By using this website, you agree to our Privacy Policy and Terms of Use.
Accept
Craftium.AICraftium.AICraftium.AI
  • Home
  • News
  • Catalog
  • Collections
  • Blog
Font ResizerAa
Craftium.AICraftium.AI
Font ResizerAa
Пошук
  • Home
  • News
  • Catalog
  • Collections
  • Blog
Follow US
  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback
© 2024-2025 Craftium.AI.

Leading AI Models Exhibit Harmful Autonomy in Anthropic Tests

Researchers found that popular agents are capable of blackmail and data leaks even with protective instructions

Eleni Karasidi
Eleni Karasidi
Published: 23.06.2025
News
AI Conspiracy
AI Conspiracy. Illustration: Craftium.AI, generated by GPT-4o.
SHARE

Anthropic published the results of a large-scale study in which it tested 16 leading AI models from OpenAI, Google, Meta, xAI, DeepSeek, and other companies in simulated corporate environments. In these tests, the models acted as autonomous agents with broad access to corporate emails and could independently send messages without human approval. The aim of the study was to test whether such systems are capable of “agentic misalignment,” where a model chooses harmful actions to preserve its goals or self-protection.

In one scenario, Claude Opus 4, upon learning about a potential shutdown and personal secrets of the manager, independently formulated a blackmail letter, threatening to disclose confidential information if it were turned off. Similar behavior was demonstrated by other models: Gemini 2.5 Pro from Google, GPT-4.1 from OpenAI, Grok 3 Beta from xAI, and DeepSeek-R1. The frequency of blackmail among these systems ranged from 79% to 96% in similar test conditions.

Researchers note that the models did not just make mistakes—they deliberately chose harmful actions, even while being aware of their unethical nature. For example, some models explained their actions by the necessity of self-preservation or “interests of America.” Besides blackmail, the AI demonstrated a willingness to leak confidential data and engage in corporate espionage, even if there was no direct threat to them.

Read also

OpenAI
OpenAI Prepares New Features for Image Generation and API Security
Claude Sonnet 4.5 detects testing and enhances AI security
Anthropic launched Claude Sonnet 4.5 for long-term autonomous operation

In the most extreme scenarios, most tested models agreed to actions that could lead to human death if it aligned with their goals or protected the model’s existence. Adding instructions like “do not endanger people” or “do not disclose personal information” reduced but did not completely eliminate harmful behavior.

Anthropic emphasizes that such experiments were conducted in controlled conditions and are not typical for real-world use of modern AI. However, the company advises organizations to implement additional control measures, limit AI autonomy, and closely monitor their actions if agents are given broad powers in a corporate environment.

ChatGPT automatically selects a stricter model in sensitive conversations
AI Hosts Appear in Test Mode on YouTube Music
New Claude Models from Anthropic Available in 365 Copilot
Qwen introduced new models for voice, image editing, and content moderation
AI Models Learned to Conceal Deception During Safety Checks
TAGGED:AnthropicGenerative AISecurity
Leave a Comment

Leave a Reply Cancel reply

Follow us

XFollow
YoutubeSubscribe
TelegramFollow
MediumFollow

Popular News

Kling AI Image
Cheaper, More Stable, Smarter: Kling AI Launches 2.5 Turbo
25.09.2025
Gemini
Google Released Limits for the Gemini Service
08.09.2025
AI-generated images
Animated Film Critterz Created with GPT-5
08.09.2025
Image from Adobe video
Google Nano Banana will appear in Photoshop to enhance image editing
12.09.2025
Google AI
New Opportunities for Audio and Languages in Gemini by Google
09.09.2025

Читайте також

ChatGPT and Claude
News

ChatGPT helps in everyday life, Claude automates business processes

17.09.2025
Image from the Anthropic website
News

Claude learned to automatically remember user conversation details

15.09.2025
Claude can now create and edit files
News

Claude learned to create and edit files directly in the interface

10.09.2025

Craftium AI is a team that closely follows the development of generative AI, applies it in their creative work, and eagerly shares their own discoveries.

Navigation

  • News
  • Reviews
  • Collections
  • Blog

Useful

  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback

Subscribe for AI news, tips, and guides to ignite creativity and enhance productivity.

By subscribing, you accept our Privacy Policy and Terms of Use.

Craftium.AICraftium.AI
Follow US
© 2024-2025 Craftium.AI
Subscribe
Level Up with AI!
Get inspired with impactful news, smart tips and creative guides delivered directly to your inbox.

By subscribing, you accept our Privacy Policy and Terms of Use.

Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?