Ep.10 Are benchmarks broken?
Episode Synopsis
In this episode, we’re lucky to be joined by Alexandre Sallinen and Tony O’Halloran from the Laboratory for Intelligent Global Health & Humanitarian Response Technologies to discuss how large language models are assessed, including their Massive Open Online Validation & Evaluation (MOOVE) initiative.
0:25 - Technical wrap: what are agents?
13:20 - What are benchmarks?
18:20 - Automated evaluation
20:10 - Benchmarks
37:45 - Human feedback
44:50 - LLM as judge
More about the projects we discuss here:
Meditron
Learn about the MOOVE or contact our team if you'd like to be involved
Listen to the LiGHTCAST, including their recent excellent outline of the HealthBench paper
More details in the show notes on our website.
Episodes | Bluesky | [email protected]
                        
More episodes of the podcast Medical Attention
In-context: September 4, 2025 (04/09/2025)
In-context: August 18, 2025 (18/08/2025)
In-context: July 20, 2025 (20/07/2025)
In-context: June 9, 2025 (10/06/2025)
In-context: May 2025 (27/05/2025)
Ep.9 AI Mythbusting (10/05/2025)
Ep.8 Algorithmic Bias (17/01/2025)
Ep.7 Informatics Year in Review (17/12/2024)