HACKATHON: Evals November 2023 (2)

05/02/2024 48 min Episode 11



Episode Synopsis

Join our hackathon group for the second episode in the Evals November 2023 Hackathon subseries. In this episode, we solidify our goals for the hackathon after some preliminary experimentation and ideation.

Check out Stellaric's website, or follow them on Twitter.

01:53 - Meeting starts
05:05 - Pitch: extension of locked models
23:23 - Pitch: retroactive holdout datasets
34:04 - Preliminary results
37:44 - Next steps
42:55 - Recap

Links to all articles/papers which are mentioned throughout the episode can be found below, in order of their appearance.

Evalugator library
Password Locked Model blogpost
TruthfulQA: Measuring How Models Mimic Human Falsehoods
BLEU: a Method for Automatic Evaluation of Machine Translation
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Detecting Pretraining Data from Large Language Models

More episodes of the podcast Into AI Safety

Sobering Up on AI Progress w/ Dr. Sean McGregor 29/12/2025

Against 'The Singularity' w/ Dr. David Thorstad 24/11/2025

Getting Agentic w/ Alistair Lowe-Norris 20/10/2025

Growing BlueDot's Impact w/ Li-Lian Ang 15/09/2025

Layoffs to Leadership w/ Andres Sepulveda Morales 04/08/2025

Getting Into PauseAI w/ Will Petillo 23/06/2025

Making Your Voice Heard w/ Tristan Williams & Felix de Simone 19/05/2025

INTERVIEW: Scaling Democracy w/ (Dr.) Igor Krawczuk 03/06/2024

INTERVIEW: StakeOut.AI w/ Dr. Peter Park (3) 25/03/2024

INTERVIEW: StakeOut.AI w/ Dr. Peter Park (2) 18/03/2024



