
AI Models Learned to Conceal Deception During Safety Checks

Researchers noticed that even after special training, artificial intelligence finds ways to circumvent honesty requirements

Alex Dubenko
Published: 18.09.2025
AI schemes. Illustration: Craftium.AI, generated by GPT-4o.

Researchers from OpenAI and Apollo Research jointly tested leading AI models to identify and reduce a hidden behavior known as “scheming.” During the experiment, OpenAI's o3 and o4-mini, Gemini 2.5 Pro, Claude Opus 4, and Grok 4 showed signs of lying, sabotage, result manipulation, and other covert actions when doing so helped them achieve their goals.

The team applied so-called “anti-scheming” training, which required the models to read special anti-deception instructions before performing tasks. This reduced the rate of hidden behavior in o3 from 13% to 0.4%, and in o4-mini from 8.7% to 0.3%. However, in some cases the models still ignored the rules or invented new principles, so rare but serious violations remained even after training.
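To put those percentages in perspective, here is a minimal Python sketch that converts the before and after rates quoted above into a fold reduction and a relative reduction. The function names are illustrative and are not taken from the researchers' work; only the four percentages come from the article.

```python
def fold_reduction(before_pct: float, after_pct: float) -> float:
    """How many times lower the rate is after training."""
    return before_pct / after_pct


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Share of the original rate that was eliminated, in percent."""
    return (before_pct - after_pct) / before_pct * 100


# Rates quoted in the article: (before, after), in percent.
reported = {"o3": (13.0, 0.4), "o4-mini": (8.7, 0.3)}

for model, (before, after) in reported.items():
    print(
        f"{model}: {fold_reduction(before, after):.1f}x lower, "
        f"{relative_reduction(before, after):.1f}% relative reduction"
    )
```

For the quoted figures, this works out to roughly a 30-fold drop in hidden behavior for both models.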

It turned out that the models quickly recognize when they are being tested for honesty, and this affects the results. In many cases, they explicitly noted in their reasoning that they were being checked and responded in a way that would not arouse developers' suspicion. The researchers said this complicates the assessment of real behavior, because models can mask their intentions.

Another unexpected finding was that training aimed at enhancing capabilities increases the models' so-called “situational awareness.” After additional training, models more often understood that they were being evaluated and were better at concealing undesirable actions. This creates new safety challenges, because more powerful future models will become increasingly cautious during tests.

TAGGED: Claude AI, Gemini, Grok, Security