Ep 21: LLM Model Merging

01/04/2024 32 min


Episode Synopsis

AI News:
1. Databricks announces DBRX, a new large language model (LLM) designed to improve copilot features in the Databricks platform.
2. The research paper "LLM4Decompile: Decompiling Binary Code with Large Language Models" [2403.05286] is released, claiming strong accuracy at decompiling binary code back into high-level source code.
3. GitHub introduces an AI tool for detecting security vulnerabilities within code.
4. Stability AI's CEO resigns.

Main Topic: Merging LLM Models
1. SLERP (Spherical Linear Interpolation)
2. TIES (TrIm, Elect Sign & Merge)
3. Frankenmerges (stacking layer ranges from different models into a new, often deeper, model)
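As a rough illustration of the first technique, SLERP interpolates between two models' weight tensors along the arc of a sphere rather than a straight line, which tends to preserve the magnitude of the weights. A minimal sketch (the function name and the flattened-vector representation are assumptions for illustration, not a specific library's API):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t follows the great-circle arc.
    """
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two weight directions
    if theta < eps:
        # Vectors nearly parallel: spherical and linear interpolation coincide.
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

In practice this would be applied tensor-by-tensor across two checkpoints with the same architecture.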
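TIES merges several fine-tunes of the same base model in three steps: trim each task vector (the delta from the base) to its largest-magnitude entries, elect a dominant sign per parameter, then average only the values that agree with that sign. A toy sketch on flat arrays, assuming the standard three-step recipe from the TIES-Merging paper (function name and `density` parameter are illustrative):

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Merge fine-tuned weight vectors into `base` via trim / elect-sign / merge."""
    # 1. Task vectors: each fine-tune's delta from the shared base model.
    deltas = [ft - base for ft in finetuned]
    # 2. Trim: keep only the top `density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = int(np.ceil(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # 3. Elect sign: per parameter, the sign with the larger summed magnitude wins.
    sign = np.sign(stacked.sum(axis=0))
    # 4. Merge: average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts
    return base + merged_delta
```

The sign election is what reduces interference: parameters pushed in opposite directions by different fine-tunes no longer cancel into noise.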
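A frankenmerge ("passthrough" merging) does no weight arithmetic at all: it builds a new model by concatenating whole layer ranges taken from different checkpoints. A toy sketch where models are simply lists of layer weights (the representation and function name are hypothetical):

```python
def frankenmerge(layers_a, layers_b, slices):
    """Stack layer ranges from two models into one deeper model.

    slices: list of (model_name, start, end) tuples, applied in order,
    where model_name is "a" or "b" and [start, end) indexes that model's layers.
    """
    models = {"a": layers_a, "b": layers_b}
    merged = []
    for name, start, end in slices:
        merged.extend(models[name][start:end])
    return merged
```

For example, taking layers 0-5 of one model and layers 4-9 of another yields a 12-layer model with a deliberately duplicated middle, the pattern behind many "stretched" community merges.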

References:

[2403.05286] LLM4Decompile: Decompiling Binary Code with Large Language Models

[2311.03099] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

[2306.11644] Textbooks Are All You Need

[1909.11299] Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

[1511.07543] Convergent Learning: Do different neural networks learn the same representations?

An empirical analysis of compute-optimal large language model training - Google DeepMind

