Ep48. Large Language Models Can Self-Improve in Long-context Reasoning
Episode Synopsis
This research paper investigates how large language models (LLMs) can improve their ability to reason over long contexts. The authors propose a self-improvement method called SEALONG: the model samples multiple reasoning outputs for each query, scores them with Minimum Bayes Risk (MBR), and is then fine-tuned either on the highest-scoring outputs alone or by contrasting high- and low-scoring outputs via preference optimization. Extensive experiments on several leading LLMs show that SEALONG improves long-context reasoning without relying on human annotations or stronger teacher models. The paper also analyzes how prompting strategies, scoring methods, and training parameters affect SEALONG's performance.
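The core scoring idea can be sketched in a few lines: under MBR, each sampled output is scored by its average similarity to the other samples, so outputs that agree with the consensus rank highest. The sketch below is a minimal illustration, not the paper's implementation; in particular, the word-overlap `similarity` function is a stand-in assumption for whatever semantic similarity measure an actual system would use.

```python
def similarity(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of word sets (a stand-in for
    a proper semantic similarity measure)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mbr_scores(outputs: list[str]) -> list[float]:
    """Score each sampled output by its mean similarity to all other
    samples; consensus-aligned outputs receive higher scores."""
    scores = []
    for i, yi in enumerate(outputs):
        others = [similarity(yi, yj) for j, yj in enumerate(outputs) if j != i]
        scores.append(sum(others) / len(others))
    return scores

# Hypothetical sampled reasoning outputs for one long-context query.
samples = [
    "The answer is 42 because the context states it",
    "The answer is 42 given the passage",
    "The answer is 17",
]
scores = mbr_scores(samples)
best = samples[max(range(len(samples)), key=scores.__getitem__)]
```

The highest-scoring sample (`best`) would serve as a fine-tuning target, while a high/low-scoring pair would form a preference-optimization example.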