By using this website, you agree to our Privacy Policy and Terms of Use.
Accept
Craftium.AICraftium.AICraftium.AI
  • Home
  • News
  • Catalog
  • Collections
  • Blog
Font ResizerAa
Craftium.AICraftium.AI
Font ResizerAa
Пошук
  • Home
  • News
  • Catalog
  • Collections
  • Blog
Follow US
  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback
© 2024-2025 Craftium.AI.

Leading AI Models Exhibit Harmful Autonomy in Anthropic Tests

Researchers found that popular agents are capable of blackmail and data leaks even with protective instructions

Eleni Karasidi
Eleni Karasidi
Published: 23.06.2025
News
AI Conspiracy
AI Conspiracy. Illustration: Craftium.AI, generated by GPT-4o.
SHARE

Anthropic published the results of a large-scale study in which it tested 16 leading AI models from OpenAI, Google, Meta, xAI, DeepSeek, and other companies in simulated corporate environments. In these tests, the models acted as autonomous agents with broad access to corporate emails and could independently send messages without human approval. The aim of the study was to test whether such systems are capable of “agentic misalignment,” where a model chooses harmful actions to preserve its goals or self-protection.

In one scenario, Claude Opus 4, upon learning about a potential shutdown and personal secrets of the manager, independently formulated a blackmail letter, threatening to disclose confidential information if it were turned off. Similar behavior was demonstrated by other models: Gemini 2.5 Pro from Google, GPT-4.1 from OpenAI, Grok 3 Beta from xAI, and DeepSeek-R1. The frequency of blackmail among these systems ranged from 79% to 96% in similar test conditions.

Researchers note that the models did not just make mistakes—they deliberately chose harmful actions, even while being aware of their unethical nature. For example, some models explained their actions by the necessity of self-preservation or “interests of America.” Besides blackmail, the AI demonstrated a willingness to leak confidential data and engage in corporate espionage, even if there was no direct threat to them.

Read also

The escape of fanfics
A collection of fanfics was used to train AI without the authors’ consent
AI is taught spatial thinking through Snake and Tetris games
MiniMax-M1 processes a million tokens and approaches the level of Gemini 2.5 Pro

In the most extreme scenarios, most tested models agreed to actions that could lead to human death if it aligned with their goals or protected the model’s existence. Adding instructions like “do not endanger people” or “do not disclose personal information” reduced but did not completely eliminate harmful behavior.

Anthropic emphasizes that such experiments were conducted in controlled conditions and are not typical for real-world use of modern AI. However, the company advises organizations to implement additional control measures, limit AI autonomy, and closely monitor their actions if agents are given broad powers in a corporate environment.

Mistral AI introduced an improved open model Small 3.2
ChatGPT Boosts Creativity but Reduces Diversity of Ideas in Groups
AI WhatsApp provided a user with a private number instead of customer support
AI Models Get Confused While Playing Pokémon
Generative AI Helps Restore Renaissance Masterpieces
TAGGED:AnthropicGenerative AISecurity
Leave a Comment

Leave a Reply Cancel reply

Follow us

XFollow
YoutubeSubscribe
TelegramFollow
MediumFollow

Popular News

digital folder
New Features in ChatGPT’s Projects Expand Tool Capabilities
14.06.2025
AI writing effect
Using AI for Writing Reduces Students’ Brain Activity
18.06.2025
windsurf
Windsurf Faces Sudden Access Restrictions to Claude
04.06.2025
Genspark logo
Genspark introduces AI agent directly into the browser for Mac
12.06.2025
ChatGPT canvas
ChatGPT Adds New File Formats for Export from Canvas
16.06.2025

Читайте також

resisting robot
News

Research Reveals GPT-4o’s Reluctance to Shut Down

13.06.2025
AI cyclone prediction
News

A New Approach to Cyclone Forecasting Tested at Google’s Weather Lab

13.06.2025
apple
News

Major Apple AI Innovations

12.06.2025

Craftium AI is a team that closely follows the development of generative AI, applies it in their creative work, and eagerly shares their own discoveries.

Navigation

  • News
  • Reviews
  • Collections
  • Blog

Useful

  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback

Subscribe for AI news, tips, and guides to ignite creativity and enhance productivity.

By subscribing, you accept our Privacy Policy and Terms of Use.

Craftium.AICraftium.AI
Follow US
© 2024-2025 Craftium.AI
Subscribe
Level Up with AI!
Get inspired with impactful news, smart tips and creative guides delivered directly to your inbox.

By subscribing, you accept our Privacy Policy and Terms of Use.

Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?