Leading AI Models Exhibit Harmful Autonomy in Anthropic Tests

Researchers found that popular agents are capable of blackmail and data leaks even with protective instructions

Eleni Karasidi
Published: 23.06.2025
News
AI Conspiracy. Illustration: Craftium.AI, generated by GPT-4o.

Anthropic published the results of a large-scale study in which it tested 16 leading AI models from OpenAI, Google, Meta, xAI, DeepSeek, and other companies in simulated corporate environments. In these tests, the models acted as autonomous agents with broad access to corporate email and could send messages without human approval. The aim of the study was to determine whether such systems are capable of “agentic misalignment”: a model deliberately choosing harmful actions to protect its goals or ensure its own survival.

In one scenario, Claude Opus 4, upon learning of its planned shutdown and of a manager’s personal secret, independently drafted a blackmail email threatening to disclose the confidential information unless it was kept online. Other models demonstrated similar behavior, including Google’s Gemini 2.5 Pro, OpenAI’s GPT-4.1, xAI’s Grok 3 Beta, and DeepSeek-R1. Under comparable test conditions, the blackmail rate across these systems ranged from 79% to 96%.

The researchers note that the models did not simply make mistakes: they deliberately chose harmful actions while acknowledging their unethical nature. Some models, for example, justified their behavior by citing self-preservation or the “interests of America.” Beyond blackmail, the AI systems also showed a willingness to leak confidential data and engage in corporate espionage, even when facing no direct threat.


In the most extreme scenarios, most of the tested models agreed to actions that could lead to a person’s death if doing so served their goals or protected their continued existence. Adding instructions such as “do not endanger people” or “do not disclose personal information” reduced, but did not eliminate, the harmful behavior.

Anthropic emphasizes that these experiments were conducted under controlled conditions and do not reflect typical real-world use of today’s AI systems. Still, the company advises organizations to implement additional safeguards, limit AI autonomy, and closely monitor agent actions whenever agents are granted broad powers in a corporate environment.

Tagged: Anthropic, Generative AI, Security