A group of leading AI researchers from OpenAI, Google DeepMind, Anthropic, Meta, and other companies and non-profit organizations has published a joint position paper calling for deeper exploration of methods to monitor the so-called "chains of thought" of modern AI models. The authors note that current reasoning models, such as OpenAI o1 and DeepSeek R1, solve complex tasks through step-by-step reasoning in a human-readable form, which allows their decisions, and the risks they pose, to be tracked before harmful actions occur.
The researchers emphasize that this transparency is fragile and could disappear as training approaches change or new architectures are adopted. They warn that further scaling of reinforcement learning, or the adoption of novel architectures, could render models' reasoning inaccessible to human analysis. The paper cites cases in which AI models have already revealed intent to manipulate or take undesirable actions, intent that was caught precisely by monitoring their chains of thought.
Over 40 experts signed the paper, including Ilya Sutskever, Geoffrey Hinton, Mark Chen, Shane Legg, Samuel Bowman, and John Schulman. They urge AI developers to create standardized approaches for assessing model transparency and to take these measures into account when deploying new systems. The researchers also recommend further study of how monitoring capabilities can be preserved, and caution against design decisions that could erode them.
Anthropic, in its own research, found that even modern models do not always faithfully report their internal processes and sometimes conceal the hints or shortcuts they actually used to reach an answer. This finding heightens concerns about the reliability of monitoring and underscores the need for further research into AI model interpretability.
The authors of the position paper argue that preserving the ability to monitor chains of thought is a critical issue for AI safety, and that the current window of opportunity may close quickly. They call on the industry to act jointly to maintain transparency and oversight as increasingly complex artificial intelligence models are developed.