Listen ""OpenAI API base models are not sycophantic, at any size" by Nostalgebraist"
Episode Synopsis
In Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.The paper contained the striking plot reproduced below, which shows sycophancyincreasing dramatically with model sizewhile being largely independent of RLHF stepsand even showing up at 0 RLHF steps, i.e. in base models![...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here.I found very different results for these models:OpenAI base models are not sycophantic (or only very slightly sycophantic).OpenAI base models do not get more sycophantic with scale.Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.Source:https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-sizeNarrated for LessWrong by TYPE III AUDIO.Share feedback on this narration.[125+ Karma Post] ✓
More episodes of the podcast LessWrong (Curated & Popular)
“Little Echo” by Zvi
09/12/2025
ZARZA We are Zarza, the prestigious firm behind major projects in information technology.