Listen "Multi-Task Language Understanding 📈 // Composable Interventions 🤝 // ARMT Sets Performance Record 💪"
Episode Synopsis
MMLU-Pro is a more robust and challenging massive multi-task language understanding dataset, designed to benchmark large language models' capabilities more rigorously.
The Composable Interventions framework allows researchers to study the effects of using multiple interventions on a language model, and the order in which interventions are applied can have a significant impact on their effectiveness.
The MJ-Bench benchmark evaluates how well different types of multimodal judges provide feedback for text-to-image generation models, and the experiments reveal that closed-source VLMs generally provide better feedback.
The Associative Recurrent Memory Transformer (ARMT) is an approach that combines transformer self-attention for local context with segment-level recurrence for storage of task-specific information distributed over a long context, and it sets a new performance record on the recent BABILong multi-task long-context benchmark.
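For a concrete picture of the segment-level recurrence idea, here is a minimal PyTorch sketch: a small set of memory tokens is prepended to each segment, updated by local self-attention, and carried forward to the next segment. The class name, dimensions, and use of nn.TransformerEncoderLayer are illustrative assumptions, and the plain memory-token update stands in for ARMT's associative memory rather than reproducing the authors' implementation.

```python
# Sketch of segment-level recurrence with memory tokens (illustrative only;
# the associative memory update specific to ARMT is not modeled here).
import torch
import torch.nn as nn

class SegmentRecurrentEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_memory=8):
        super().__init__()
        # Learnable initial memory tokens carried across segments.
        self.memory = nn.Parameter(torch.randn(n_memory, d_model))
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, segments):
        # segments: list of (batch, seg_len, d_model) tensors in temporal order
        batch = segments[0].size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Local self-attention sees [memory tokens | current segment].
            x = self.layer(torch.cat([mem, seg], dim=1))
            n_mem = mem.size(1)
            mem = x[:, :n_mem]            # updated memory flows to the next segment
            outputs.append(x[:, n_mem:])  # local representations for this segment
        return torch.cat(outputs, dim=1), mem

# Usage: split a long sequence into fixed-size segments and process them in order.
model = SegmentRecurrentEncoder()
long_seq = torch.randn(2, 512, 64)
segs = list(long_seq.split(128, dim=1))
out, final_memory = model(segs)
print(out.shape, final_memory.shape)  # (2, 512, 64) and (2, 8, 64)
```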
Contact: [email protected]
Timestamps:
00:34 Introduction
01:32 MMLU-Pro Release on HuggingFace Datasets
03:48 Extrinsic Hallucinations in LLMs
04:53 RouteLLM
06:13 Fake sponsor
08:14 Composable Interventions for Language Models
09:45 MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
11:31 Associative Recurrent Memory Transformer
13:30 Outro
More episodes of the podcast GPT Reviews
OpenAI's 'Strawberry' AI 🚀 // World's Fastest AI Inference ⚡ // Photo-realistic 3D Avatars 🎨
28/08/2024
Grok-2's Speed & Accuracy 🚀 // OpenAI's Transparency Push 🗳️ // LlamaDuo for Local LLMs 🔄
27/08/2024
Amazon Cloud Chief Spicy Takes 🚀 // Zuckerberg's AI Vision 📈 // Multimodal Models for Safety 🔒
23/08/2024
Grok-2 Beta Release 🚀 // Apple's $1,000 Home Robot 🏡 // ChemVLM Breakthrough in Chemistry 🔬
15/08/2024
Gemini Live AI Assistant 📱 // OpenAI’s Coding Benchmark ✅ // LongWriter’s 10K Word Generation ✍️
14/08/2024