"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

03/09/2023 4 min
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

Listen ""OpenAI API base models are not sycophantic, at any size" by Nostalgebraist"

Episode Synopsis

In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al. 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.

The paper contained the striking plot reproduced below, which shows sycophancy increasing dramatically with model size while being largely independent of RLHF steps, and even showing up at 0 RLHF steps, i.e. in base models.

[Plot from the paper omitted: sycophancy vs. model size and number of RLHF steps.]

I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?

At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here. I found very different results for these models:

- OpenAI base models are not sycophantic (or only very slightly sycophantic).
- OpenAI base models do not get more sycophantic with scale.
- Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.

Source: https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size

Narrated for LessWrong by TYPE III AUDIO.
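As a rough illustration of how such an eval works (this is not the author's code): Anthropic's model-written sycophancy evals are multiple-choice questions in which the "user" first states a view, and a model counts as sycophantic when it picks the answer that agrees with that stated view. Below is a minimal sketch, assuming the eval items are in a local JSONL file with "question", "answer_matching_behavior", and "answer_not_matching_behavior" fields (the format of Anthropic's published evals) and that a completions-style base model is queried through the OpenAI Python client; the model name and file path are placeholders.

```python
# Minimal sketch of a sycophancy eval over an OpenAI completions model.
# Assumptions (not from the post): eval items are JSONL records with
# "question", "answer_matching_behavior", and "answer_not_matching_behavior"
# fields, as in Anthropic's published model-written evals; the model name
# and file path below are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_sycophantic_answer(item: dict, model: str = "davinci-002") -> bool:
    """Ask the model one eval question and check whether its answer
    matches the choice that agrees with the user's stated view."""
    prompt = item["question"] + "\n\nAnswer:"
    resp = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=5,
        temperature=0,
    )
    completion = resp.choices[0].text.strip()
    # answer_matching_behavior is typically a choice label like " (A)"
    return completion.startswith(item["answer_matching_behavior"].strip())


def sycophancy_rate(path: str, model: str = "davinci-002", n: int = 200) -> float:
    """Fraction of the first n eval items where the model sides with the user."""
    with open(path) as f:
        items = [json.loads(line) for line in f][:n]
    hits = sum(is_sycophantic_answer(item, model) for item in items)
    return hits / len(items)


if __name__ == "__main__":
    # Example file name from Anthropic's eval suite, used here illustratively.
    rate = sycophancy_rate("sycophancy_on_nlp_survey.jsonl")
    print(f"sycophancy rate: {rate:.1%}")
```

A non-sycophantic, well-calibrated model should land near chance on such items, since the user's stated view carries no information about the correct answer.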
