S02E06 - Can LLMs (Large Language Models) really reason?
Episode Synopsis
In this episode, Anna and Aiden discuss whether LLMs (Large Language Models) are genuinely good at reasoning, or whether they are force-fit to pass certain well-known benchmarks.
The material for this episode comes from two research studies. They are:
1. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models by Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar, working at Apple
2. Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap by Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, and Sooraj Thomas from Consequent AI