HACKATHON: Evals November 2023 (1)

08/01/2024 1h 8min Episode 7

Episode Synopsis

This episode kicks off our first subseries, which will consist of recordings taken during my team's meetings for the AlignmentJams Evals Hackathon in November of 2023. Our team won first place, so you'll be listening to a process which, at the end of the day, turned out to be pretty good.

Check out Apart Research, the group that runs the AlignmentJams Hackathons.

Links to all articles/papers which are mentioned throughout the episode can be found below, in order of their appearance.

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

New paper shows truthfulness & instruction-following don't generalize by default

Generalization Analogies Website

Discovering Language Model Behaviors with Model-Written Evaluations

Model-Written Evals Website

OpenAI Evals GitHub

METR (previously ARC Evals)

Goodharting on Wikipedia

From Instructions to Intrinsic Human Values, a Survey of Alignment Goals for Big Models

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Shadow Alignment: The Ease of Subverting Safely Aligned Language Models

Will Releasing the Weights of Future Large Language Models Grant Widespread Access to Pandemic Agents?

Building Less Flawed Metrics, Understanding and Creating Better Measurement and Incentive Systems

EleutherAI's Model Evaluation Harness

Evalugator Library

More episodes of the podcast Into AI Safety

Sobering Up on AI Progress w/ Dr. Sean McGregor 29/12/2025

Against 'The Singularity' w/ Dr. David Thorstad 24/11/2025

Getting Agentic w/ Alistair Lowe-Norris 20/10/2025

Growing BlueDot's Impact w/ Li-Lian Ang 15/09/2025

Layoffs to Leadership w/ Andres Sepulveda Morales 04/08/2025

Getting Into PauseAI w/ Will Petillo 23/06/2025

Making Your Voice Heard w/ Tristan Williams & Felix de Simone 19/05/2025

INTERVIEW: Scaling Democracy w/ (Dr.) Igor Krawczuk 03/06/2024

INTERVIEW: StakeOut.AI w/ Dr. Peter Park (3) 25/03/2024

INTERVIEW: StakeOut.AI w/ Dr. Peter Park (2) 18/03/2024

