Listen "#80- Layer pruning and Mixture of Depths."
Episode Synopsis
Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.
I talk about layer pruning, where we remove a block of consecutive layers from the LLM with almost no loss in model performance.
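To make the idea concrete, here is a minimal sketch of one way to pick which block of consecutive layers to drop: compare the hidden state entering the block with the one leaving it, and prune the block that changes the representation the least. The angular-distance criterion follows my reading of the linked layer pruning paper; the function names and the toy 2-dimensional hidden states are hypothetical, made up only for illustration. In practice the distances would be measured on real activations from a dataset.

```python
import math

def angular_distance(u, v):
    """Angle between two vectors, scaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos) / math.pi

def choose_block_to_prune(hidden_states, n):
    """hidden_states[i] is the input to layer i; the last entry is the final output.
    Returns the start index of the n-layer block whose input and output are closest."""
    num_layers = len(hidden_states) - 1
    best_start, best_dist = None, float("inf")
    for start in range(num_layers - n + 1):
        d = angular_distance(hidden_states[start], hidden_states[start + n])
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist

def prune_layers(layers, start, n):
    """Drop layers start .. start + n - 1 and keep the rest in order."""
    return layers[:start] + layers[start + n:]

# Toy example (made-up numbers): 6 layers, so 7 hidden states for one token.
hidden_states = [
    [1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.59, 0.81],
    [0.6, 0.8], [0.0, 1.0], [-0.6, 0.8],
]
layers = ["L0", "L1", "L2", "L3", "L4", "L5"]

start, dist = choose_block_to_prune(hidden_states, n=2)
pruned = prune_layers(layers, start, n=2)
print(start, pruned)
```

In this toy setup layers L2 and L3 barely move the representation, so they are the ones that get removed. A real pipeline would usually follow the pruning with a short fine-tuning step to recover any lost quality.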
I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens are processed by each layer of the LLM.
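The routing idea can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it assumes each token already has a scalar router weight, keeps only the top `capacity` tokens for the block, and lets the rest skip it through the residual connection. The names `mod_block`, `block_fn` and `capacity`, and all the numbers, are hypothetical.

```python
def mod_block(tokens, router_weights, block_fn, capacity):
    """Apply block_fn only to the `capacity` tokens with the highest router weight.
    Selected tokens get x + r * block_fn(x); the others pass through unchanged."""
    ranked = sorted(range(len(tokens)), key=lambda i: router_weights[i], reverse=True)
    selected = set(ranked[:capacity])
    out = []
    for i, x in enumerate(tokens):
        if i in selected:
            update = block_fn(x)
            out.append([xi + router_weights[i] * ui for xi, ui in zip(x, update)])
        else:
            out.append(list(x))  # skipped: residual path only
    return out, sorted(selected)

# Toy example: 4 tokens, and the block may only process 2 of them.
tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
router_weights = [0.1, 0.9, 0.5, 0.2]

def toy_block(x):
    # Stand-in for attention + MLP: a constant update, just for illustration.
    return [10.0 for _ in x]

out, selected = mod_block(tokens, router_weights, toy_block, capacity=2)
print(selected)
print(out)
```

Because the block only runs on the selected tokens, the compute for that layer scales with `capacity` instead of with the full sequence length, which is where the inference savings come from.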
Paper MoD: https://arxiv.org/pdf/2404.02258.pdf
Paper layer pruning: https://arxiv.org/pdf/2403.17887v1.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai
More episodes of the podcast Life with AI
#99- GraphRAG.
05/12/2024
#98- On-device AI with SmolLM.
07/11/2024
#96- Maritaca AI, the Brazilian LLM company.
24/10/2024
#95- Why Chain of Thought works?
26/09/2024
#94- OpenAI o1
19/09/2024
#93- Different types of AI.
12/09/2024
#92- Llama3 benchmarks, vision and speech.
22/08/2024
#91- Llama 3 training.
15/08/2024
#90- Llama 3 paper overview.
25/07/2024