213 – Are Transformer Models Aligned By Default?

29/05/2024

Episode Synopsis

Our species has begun to scrute the inscrutable shoggoth! With Matt Freeman.
LINKS
Anthropic’s latest AI safety research paper on interpretability
Anthropic is hiring
Episode 93 of The Mind Killer
Talkin’ Fallout
VibeCamp

0:00:17 – A Layman’s AI Refresher
0:21:06 – Aligned By Default
0:50:56 – Highlights from Anthropic’s Latest Interpretability Paper
1:26:47 – Guild of the Rose Update
1:29:40 – Going to VibeCamp
1:37:05 – Feedback
1:43:58 – Less Wrong Posts
1:57:30 – Thank the Patron

Our Patreon, or if you prefer, our Substack
Hey look, we have a Discord! What could possibly go wrong?
We now partner with The Guild of the Rose; check them out.



Rationality: From AI to Zombies, The Podcast

LessWrong Sequence Posts Discussed in this Episode:
If You Demand Magic, Magic Won’t Help
The Beauty of Settled Science
Next Sequence Posts:
Is Humanism A Religion-Substitute?
Scarcity