Sam Bowman on benchmarking and AI alignment
Episode Synopsis
Lessons learned about benchmarking, adversarial testing, the dangers of over- and under-claiming, and AI alignment.
Transcript: https://web.stanford.edu/class/cs224u/podcast/bowman/
Sam's website
Sam on Twitter
NYU Linguistics
NYU Data Science
NYU Computer Science
Anthropic
SNLI paper: A large annotated corpus for learning natural language inference
SNLI leaderboard
FraCaS
SICK
A SICK cure for the evaluation of compositional distributional semantic models
SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
RTE Knowledge Resources
Richard Socher
Chris Manning
Andrew Ng
Ray Kurzweil
SQuAD
Gabor Angeli
Adina Williams
Adina Williams podcast episode
MultiNLI paper: A broad-coverage challenge corpus for sentence understanding through inference
MultiNLI leaderboards
Twitter discussion of LLMs and negation
GLUE
SuperGLUE
decaNLP
GPT-3 paper: Language Models are Few-Shot Learners
FLAN
Winograd schema challenges
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
JSALT: General-Purpose Sentence Representation Learning
Ellie Pavlick
Ellie Pavlick podcast episode
Tal Linzen
Ian Tenney
Dipanjan Das
Yoav Goldberg
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
BIG-bench
Upwork
Surge AI
Dynabench
Douwe Kiela
Douwe Kiela podcast episode
Ethan Perez
NYU Alignment Research Group
Eliezer Shlomo Yudkowsky
Alignment Research Center
Redwood Research
Percy Liang podcast episode
Richard Socher podcast episode
More episodes of the podcast CS224U
Rishi Bommasani on Foundation Models
11/04/2022
Douwe Kiela on research at Hugging Face
18/04/2022
Omar Khattab on neural information retrieval
25/04/2022
Richard Socher on conviction in research
02/05/2022
Ellie Pavlick on true language understanding
09/05/2022
Yulia Tsvetkov on ethical NLP
16/05/2022
Maria Antoniak on cultural analytics
27/06/2022