Listen "Deep Dive into Inference Optimization for LLMs with Philip Kiely"
Episode Synopsis
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads.
We go deep on inference optimization: choosing a model, the hype around compound AI, choosing an inference engine, and optimization techniques like quantization and speculative decoding, all the way down to your GPU choice.
More episodes of the podcast Software Huddle
Powered by Neurons with Ewelina Kurtys
16/09/2025
It's time to build Jarvis with Kent C. Dodds
13/05/2025
Software Reliability Agents with Amal Kiran
29/04/2025