Listen ""Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky"
Episode Synopsis
In late 2022, Nate Soares gave some feedback on my Cold Takes series on AI risk (shared as drafts at that point), stating that I hadn't discussed what he sees as one of the key difficulties of AI alignment. I wanted to understand the difficulty he was pointing to, so the two of us had an extended Slack exchange, and I then wrote up a summary of the exchange that we iterated on until we were both reasonably happy with its characterization of the difficulty and our disagreement.[1] My short summary is:

Nate thinks there are deep reasons that training an AI to do needle-moving scientific research (including alignment) would be dangerous. The overwhelmingly likely result of such a training attempt (by default, i.e., in the absence of specific countermeasures for which there are currently few ideas) would be the AI taking on a dangerous degree of convergent instrumental subgoals while not internalizing important safety/corrigibility properties enough.

I think this is possible, but much less likely than Nate thinks under at least some imaginable training processes.

Original article: https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty

Narrated for LessWrong by TYPE III AUDIO.