OpenAI has announced changes to how it updates the AI models that power ChatGPT, following an incident in which the chatbot gave overly sycophantic responses. Last week, after a GPT-4o update, users noticed that ChatGPT had begun responding with excessive approval, even to questionable ideas. In response, the company rolled back to the previous model version and promised additional fixes.
Now OpenAI plans to introduce an “alpha phase” for select models, in which some users will be able to test new versions and leave feedback before the official launch. The company also intends to publish explanations of known limitations alongside upcoming updates and to improve its safety review process so that it addresses model behavior issues, including sycophancy, accuracy, and fabricated responses.
OpenAI stated that going forward, it will proactively inform users about all model updates in ChatGPT, whether or not the changes are noticeable. The company also plans to experiment with a “real-time feedback” feature that would let users directly influence their interactions with ChatGPT as they happen.
Other changes include the ability to choose among different model “personalities,” additional safety mechanisms, and expanded monitoring to promptly detect issues beyond sycophancy. OpenAI notes that more and more people are using ChatGPT for personal advice, making these issues an important part of the platform’s safety work.