Ep48. Large Language Models Can Self-Improve in Long-context Reasoning

16/11/2024 11 min
Episode Synopsis

This research paper investigates how large language models (LLMs) can improve their own ability to reason over long contexts. The authors propose a self-improvement method called SEALONG: sample multiple reasoning outputs from an LLM, score them with Minimum Bayes Risk (MBR), which favors outputs that agree with the other samples, and then fine-tune the model either on the highest-scoring outputs or by contrasting high- and low-scoring outputs for preference optimization. Extensive experiments on several leading LLMs show that SEALONG improves long-context reasoning without relying on human annotations or stronger external models. The paper also analyzes how different prompting strategies, scoring methods, and training parameters affect SEALONG's performance.
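As a rough illustration of the sampling-and-scoring loop described above (not the paper's exact implementation), the sketch below scores each sampled output by its average embedding similarity to the other samples, a common way to approximate MBR with a consistency-style utility. The `sample_outputs` helper is hypothetical, and sentence-transformers embeddings stand in for whatever similarity measure the authors actually use.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def sample_outputs(llm, prompt: str, n: int = 8, temperature: float = 1.0) -> list[str]:
    """Hypothetical helper: draw n reasoning samples from the LLM.
    Any sampling API (vLLM, transformers .generate(), etc.) could fill this in."""
    raise NotImplementedError

def mbr_score(outputs: list[str], embedder: SentenceTransformer) -> np.ndarray:
    """Score each sampled output by its mean cosine similarity to the
    other samples: outputs consistent with the majority score higher."""
    emb = embedder.encode(outputs, normalize_embeddings=True)  # (n, d), unit-norm rows
    sim = emb @ emb.T                                          # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)                                 # exclude self-similarity
    return sim.sum(axis=1) / (len(outputs) - 1)

def build_training_examples(llm, prompt: str, embedder: SentenceTransformer):
    """Turn one prompt into self-supervised training signal."""
    outputs = sample_outputs(llm, prompt)
    scores = mbr_score(outputs, embedder)
    best = outputs[int(scores.argmax())]   # candidate supervised fine-tuning target
    worst = outputs[int(scores.argmin())]  # rejected side of a preference pair
    return best, worst
```

Following the synopsis, the highest-scoring output can serve as a supervised fine-tuning target, while the highest- and lowest-scoring pair provides the chosen and rejected sides for preference optimization, all without any human labels.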
