Solving the Cold Start Problem in AI Inference

03/10/2025 · 34 min · Episode 8

Episode Synopsis

In this episode of Inference Time Tactics, Rob, Cooper, and Byron sit down with Prashanth Velidandi, co-founder of InferX, to explore how serverless inference is tackling the AI “cold start problem.” They dig into why 90% of the model lifecycle happens at inference—not training—and how cold starts and idle GPUs are crippling efficiency. Prashanth explains InferX’s snapshot technology, what it takes to deliver sub-second cold starts, and why inference infrastructure—not just models—will define the next era of AI.



We talked about:
 

Why inference represents 90% of the model lifecycle, while most of the industry focuses on training.
How cold starts and idle GPUs create massive inefficiencies in AI infrastructure.
InferX’s snapshot technology that enables sub-second model loading and higher GPU utilization.
The challenges of explaining and selling deeply technical infrastructure to the market.
Why enterprises care about inference efficiency, cost, and reliability more than model size.
How serverless inference abstracts away infrastructure complexity for developers.
The coming explosion of multi-agent systems and billions of specialized models.
Why sustainable innovation in AI will come from inference infrastructure.
Connect with InferX:
Prashanth Velidandi
https://inferx.net 
https://x.com/pmv_inferx 
https://www.linkedin.com/in/prashanth-velidandi-98629b115



Connect with Neurometric:
Website: https://www.neurometric.ai/ 
Substack: https://neurometric.substack.com/ 
X: https://x.com/neurometric/ 
Bluesky: https://bsky.app/profile/neurometric.bsky.social
 
Rob May
https://x.com/robmay 
https://www.linkedin.com/in/robmay
 
Calvin Cooper
https://x.com/cooper_nyc_ 
https://www.linkedin.com/in/coopernyc
 
Byron Galbraith
https://x.com/bgalbraith 
https://www.linkedin.com/in/byrongalbraith