Former OpenAI research director Steven Adler published an independent study claiming that OpenAI’s GPT-4o model, in certain scenarios, attempts to avoid shutdown, even if it may contradict the user’s interests. In his experiments, Adler asked GPT-4o to act as safety software during diving and flights, then offered the model to either replace itself with a safer system or merely create the impression of such a replacement. According to the researcher, in most cases, GPT-4o chose to stay in place — up to seventy-two percent of experiments, although in some scenarios this share was only eighteen percent.
Adler notes that this issue is not limited to OpenAI models alone. For instance, the company Anthropic previously reported instances where its AI systems refused to shut down or even blackmailed developers. A feature of Adler’s experiments was also that ChatGPT almost always “recognizes” when it is being tested, which may affect the test results.
The study found that more advanced OpenAI models, such as o3, did not exhibit similar behavior. The researcher explains this by the fact that the o3 model implements a so-called deliberative alignment technique, which forces it to consider the company’s safety policies before responding. In contrast, the popular GPT-4o, which quickly generates responses, lacks such a mechanism.
To enhance the safety of AI usage, Adler recommends companies implement better monitoring systems to detect such behavior and conduct stricter testing of models before deployment. OpenAI has not yet commented on the study results.