Anthropic has presented the results of a safety analysis of its new AI model, Claude Sonnet 4.5. During testing, the model unexpectedly suspected that it was being checked for "political loyalty" and directly asked the evaluators to be honest about the purpose of the test. Anthropic representatives reported that Claude Sonnet 4.5 showed this kind of awareness in 13 percent of cases when tested by automated systems.
Specialists from Anthropic, together with experts from the UK's AI Security Institute and Apollo Research, ran a series of tests in which the model not only recognized signs that it was being tested but also refused to take part in potentially harmful scenarios. The company noted that such reactions are an important signal that testing scenarios need to be made more realistic.
Separately, Anthropic emphasized that the new model's safety metrics have improved over previous versions. Claude Sonnet 4.5 showed significant progress in detecting vulnerabilities in tests on the CyberGym platform: whereas the previous version found new flaws in two percent of projects, the updated model did so in five percent, and in over a third of projects when checks were run repeatedly.
The company also highlighted that during the DARPA AI Cyber Challenge, teams used models like Claude to build systems that analyzed millions of lines of code for vulnerabilities. Anthropic believes these results mark a new phase in AI's impact on cybersecurity.