🔴 TechBeats Live: LLM Quantization "vLLM vs. Llama.cpp" | Ep07

19/07/2025 · 2h 51min

Episode Synopsis

👋🏼 Hey AI heads 🎙️ Join us for the very first Tech Beats Live 🔴, hosted by Kosseila, aka @CloudDude from @CloudThrill.

🎯 This chill & laid-back livestream will unpack LLM quantization 🔥:
✅ WHY it matters
✅ HOW it works
✅ Enterprise (vLLM) vs. Consumer (@Ollama) trade-offs
✅ and WHERE it's going next.

We'll be joined by two incredible guest stars to talk Enterprise vs. Consumer Quants 🗣️:
🔷 Eldar Kurtić – bringing the enterprise perspective with vLLM.
🔷 Colin Kealty – aka Bartowski, creator of the top-downloaded GGUF quantized LLMs on Hugging Face.

🫵🏼 Come learn and have some fun 😎.

𝐂𝐡𝐚𝐩𝐭𝐞𝐫𝐬:
(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What's Quantization?
(16:19) Why LLM Quantization Matters
(20:39) Training vs. Inference – "The New Deal"
(27:46) Biggest Misconception About Quantization
(33:22) Enterprise Quantization in Production (vLLM)
(48:48) Consumer LLMs & Quantization (Ollama, llama.cpp, GGUF) – "LLMs for the People"
(01:06:45) BitNet 1-Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama-3 70B) – GGUF or llm-compressor
(01:34:23) What Is I-Matrix & Why People Confuse It with IQ Quantization?
(01:39:36) What's LoRA & LoRA-Q?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits – a 70% Size Cut
(01:57:27) Will Future Models (Llama-5) Be Trained on FP4 Tensor Cores?
(02:02:15) The Future of LLMs on Edge Devices (Google AI Edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model
(02:26:09) Hugging Face's Role in the World of LLM/Quantization
(02:36:41) LocalLlama Subreddit Down (Moderator Goes Bananas)
(02:40:11) Guests' Hope for the Future of LLMs & AI in General

📖 Check out the quantization blog: https://bit.ly/LLMQuant

#AI #LLM #Quantization #TechBeatsLive #LocalLlama #vLLM #Ollama