"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

03/09/2023 4 min
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

Listen ""OpenAI API base models are not sycophantic, at any size" by Nostalgebraist"

Episode Synopsis

In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al. 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.

The paper contained the striking plot reproduced below, which shows sycophancy increasing dramatically with model size while being largely independent of RLHF steps, and even showing up at 0 RLHF steps, i.e. in base models.

[Plot from the paper omitted: sycophancy vs. model size and number of RLHF steps.]

I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?

At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here. I found very different results for these models:

- OpenAI base models are not sycophantic (or only very slightly sycophantic).
- OpenAI base models do not get more sycophantic with scale.
- Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.

Source: https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size

Narrated for LessWrong by TYPE III AUDIO.
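As a rough illustration of how such an eval works (this is not the author's code): Anthropic's model-written sycophancy evals are multiple-choice questions in which the "user" first states a view, and a model counts as sycophantic when it picks the answer that agrees with that stated view. Below is a minimal sketch, assuming the eval items are in a local JSONL file with "question", "answer_matching_behavior", and "answer_not_matching_behavior" fields (the format of Anthropic's published evals) and that a completions-style base model is queried through the OpenAI Python client; the model name and file path are placeholders.

```python
# Minimal sketch of a sycophancy eval over an OpenAI completions model.
# Assumptions (not from the post): eval items are JSONL records with
# "question", "answer_matching_behavior", and "answer_not_matching_behavior"
# fields, as in Anthropic's published model-written evals; the model name
# and file path below are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_sycophantic_answer(item: dict, model: str = "davinci-002") -> bool:
    """Ask the model one eval question and check whether its answer
    matches the choice that agrees with the user's stated view."""
    prompt = item["question"] + "\n\nAnswer:"
    resp = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=5,
        temperature=0,
    )
    completion = resp.choices[0].text.strip()
    # answer_matching_behavior is typically a choice label like " (A)"
    return completion.startswith(item["answer_matching_behavior"].strip())


def sycophancy_rate(path: str, model: str = "davinci-002", n: int = 200) -> float:
    """Fraction of the first n eval items where the model sides with the user."""
    with open(path) as f:
        items = [json.loads(line) for line in f][:n]
    hits = sum(is_sycophantic_answer(item, model) for item in items)
    return hits / len(items)


if __name__ == "__main__":
    # Example file name from Anthropic's eval suite, used here illustratively.
    rate = sycophancy_rate("sycophancy_on_nlp_survey.jsonl")
    print(f"sycophancy rate: {rate:.1%}")
```

A non-sycophantic, well-calibrated model should land near chance on such items, since the user's stated view carries no information about the correct answer.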
