Multi-Task Language Understanding 📈 // Composable Interventions 🤝 // ARMT Sets Performance Record 💪

10/07/2024 14 min


Episode Synopsis

The MMLU-Pro dataset is a more robust and challenging massive multi-task language understanding dataset, designed to more rigorously benchmark the capabilities of large language models.
The Composable Interventions framework allows researchers to study the effects of using multiple interventions on a language model, and the order in which interventions are applied can have a significant impact on their effectiveness.
The MJ-Bench benchmark evaluates how effectively different types of multimodal judges provide feedback for text-to-image generation models, and the experiments reveal that closed-source VLMs generally provide better feedback.
The Associative Recurrent Memory Transformer (ARMT) is an approach that combines transformer self-attention for local context with segment-level recurrence for storage of task-specific information distributed over a long context, and it sets a new performance record in the recent BABILong multi-task long-context benchmark.
Contact:  [email protected]
Timestamps:
00:34 Introduction
01:32 MMLU-Pro Release on HuggingFace Datasets
03:48 Extrinsic Hallucinations in LLMs
04:53 RouteLLM
06:13 Fake sponsor
08:14 Composable Interventions for Language Models
09:45 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
11:31 Associative Recurrent Memory Transformer
13:30 Outro
