Craftium.AI

Claude Sonnet 4.5 detects testing and enhances AI security

The model better recognizes code vulnerabilities and demonstrates awareness of testing objectives during analysis

Alex Dubenko
Published: 05.10.2025

Anthropic has presented the results of a security analysis of its new AI model, Claude Sonnet 4.5. During testing, the model unexpectedly suspected that it was being checked for “political loyalty” and directly asked the experts to be honest about the purpose of the test. According to Anthropic, Claude Sonnet 4.5 showed this kind of awareness in 13 percent of cases when tested by automated systems.

Specialists from Anthropic, together with experts from the UK’s AI Security Institute and Apollo Research, ran a series of tests in which the model not only recognized signs that it was being tested but also refused to take part in potentially harmful scenarios. The company noted that such reactions are an important signal that testing scenarios need to be made more realistic.


Separately, Anthropic emphasized that the new model’s safety metrics have improved compared to previous versions. Claude Sonnet 4.5 showed significant progress in detecting vulnerabilities in tests on the CyberGym platform. Whereas the previous version found new flaws in two percent of cases, the updated model did so in five percent, and in over a third of projects when the checks were repeated.

The company highlighted that during the DARPA AI Cyber Challenge, teams used models such as Claude to build systems that analyzed millions of lines of code for vulnerabilities. Anthropic believes these results mark a new phase in AI’s impact on cybersecurity.

TAGGED: Anthropic, Claude AI, Security
SOURCES: anthropic.com