Will We Get Alignment by Default? — with Adrià Garriga-Alonso

28/11/2025 1 min

Episode Synopsis

This is a link post. Adrià recently published “Alignment will happen by default; what's next?” on LessWrong, arguing that AI alignment is turning out to be easier than expected. Simon left a lengthy comment pushing back, which sparked this spontaneous debate. Adrià argues that current models like Claude Opus 3 are genuinely good “to their core,” and that an iterative process, in which each AI generation helps align the next, could carry us safely to superintelligence. Simon counters that we may only get one shot at alignment and that current methods are too weak to scale. A conversation about where AI safety actually stands.
---
First published:
November 27th, 2025

Source:
https://www.lesswrong.com/posts/zgWhJa9cMCAE9ysoB/will-we-get-alignment-by-default-with-adria-garriga-alonso

Linkpost URL: https://simonlermen.substack.com/p/will-we-get-alignment-by-default
---
Narrated by TYPE III AUDIO.

More episodes of the podcast LessWrong (30+ Karma)

“When does competition lead to recognisable values?” by Jan_Kulveit, beren, David Duvenaud, Raymond Douglas 13/01/2026

“Lies, Damned Lies, and Proofs: Formal Methods are not Slopless” by Quinn, Max von Hippel 13/01/2026

“Brief Explorations in LLM Value Rankings” by Tim Hua, Josh Engels, Neel Nanda, Senthooran Rajamanoharan 13/01/2026

“Tensor-Transformer Variants are Surprisingly Performant” by Logan Riggs 12/01/2026

“Thinking vs Unfolding” by Chris Scammell 12/01/2026

“Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities (Research Report)” by Florian_Dietz 12/01/2026

“Announcing Inkhaven 2: April 2026” by Ben Pace 12/01/2026

“Digital intentionality is not about productivity” by mingyuan 12/01/2026

“If AI alignment is only as hard as building the steam engine, then we likely still die” by MichaelDickens 12/01/2026

“A Couple Useful LessWrong Userstyles” by Alex Vermillion 12/01/2026
