Listen "Solving the Cold Start Problem in AI Inference"
Episode Synopsis
In this episode of Inference Time Tactics, Rob, Cooper, and Byron sit down with Prashanth Velidandi, co-founder of InferX, to explore how serverless inference is tackling the AI “cold start problem.” They dig into why 90% of the model lifecycle happens at inference—not training—and how cold starts and idle GPUs are crippling efficiency. Prashanth explains InferX’s snapshot technology, what it takes to deliver sub-second cold starts, and why inference infrastructure—not just models—will define the next era of AI.
We talked about:
Why inference represents 90% of the model lifecycle, despite the industry's prevailing focus on training.
How cold starts and idle GPUs create massive inefficiencies in AI infrastructure.
InferX’s snapshot technology that enables sub-second model loading and higher GPU utilization.
The challenges of explaining and selling deeply technical infrastructure to the market.
Why enterprises care about inference efficiency, cost, and reliability more than model size.
How serverless inference abstracts away infrastructure complexity for developers.
The coming explosion of multi-agent systems and billions of specialized models.
Why sustainable innovation in AI will come from inference infrastructure.
Connect with InferX:
Prashanth Velidandi
https://inferx.net
https://x.com/pmv_inferx
https://www.linkedin.com/in/prashanth-velidandi-98629b115
Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social
Rob May
https://x.com/robmay
https://www.linkedin.com/in/robmay
Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc
Byron Galbraith
https://x.com/bgalbraith
https://www.linkedin.com/in/byrongalbraith