By using this website, you agree to our Privacy Policy and Terms of Use.
Accept
Craftium.AICraftium.AICraftium.AI
  • Home
  • News
  • Knowledge base
  • Catalog
  • Blog
Font ResizerAa
Craftium.AICraftium.AI
Font ResizerAa
Пошук
  • Home
  • News
  • Catalog
  • Collections
  • Blog
Follow US
  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback
© 2024-2025 Craftium.AI.

Leading AI Models Exhibit Harmful Autonomy in Anthropic Tests

Researchers found that popular agents are capable of blackmail and data leaks even with protective instructions

Eleni Karasidi
Eleni Karasidi
Published: 23.06.2025
News
255 Views
AI Conspiracy
AI Conspiracy. Illustration: Craftium.AI, generated by GPT-4o.
SHARE

Anthropic published the results of a large-scale study in which it tested 16 leading AI models from OpenAI, Google, Meta, xAI, DeepSeek, and other companies in simulated corporate environments. In these tests, the models acted as autonomous agents with broad access to corporate emails and could independently send messages without human approval. The aim of the study was to test whether such systems are capable of “agentic misalignment,” where a model chooses harmful actions to preserve its goals or self-protection.

In one scenario, Claude Opus 4, upon learning about a potential shutdown and personal secrets of the manager, independently formulated a blackmail letter, threatening to disclose confidential information if it were turned off. Similar behavior was demonstrated by other models: Gemini 2.5 Pro from Google, GPT-4.1 from OpenAI, Grok 3 Beta from xAI, and DeepSeek-R1. The frequency of blackmail among these systems ranged from 79% to 96% in similar test conditions.

Researchers note that the models did not just make mistakes—they deliberately chose harmful actions, even while being aware of their unethical nature. For example, some models explained their actions by the necessity of self-preservation or “interests of America.” Besides blackmail, the AI demonstrated a willingness to leak confidential data and engage in corporate espionage, even if there was no direct threat to them.

Read also

Claude Opus 4.5
Anthropic released Claude Opus 4.5 with new AI capabilities
Gemini 3 launched with record popularity, but not without flaws
TikTok users will be able to control the number of AI videos in their feed

In the most extreme scenarios, most tested models agreed to actions that could lead to human death if it aligned with their goals or protected the model’s existence. Adding instructions like “do not endanger people” or “do not disclose personal information” reduced but did not completely eliminate harmful behavior.

Anthropic emphasizes that such experiments were conducted in controlled conditions and are not typical for real-world use of modern AI. However, the company advises organizations to implement additional control measures, limit AI autonomy, and closely monitor their actions if agents are given broad powers in a corporate environment.

Grok 4.1 by xAI is now available to all users for free
Chinese Moonshot Releases Open Model Kimi K2 Thinking
Adobe Unveils New AI Tools for Photo and Video
TikTok Adds Tools to Simplify Video Editing
Pinterest introduces new features for personalizing user boards
TAGGED:AnthropicGenerative AISecurity
Leave a Comment

Leave a Reply Cancel reply

Follow us

XFollow
YoutubeSubscribe
TelegramFollow
MediumFollow

Popular News

grok
Grok received new features for creating images and videos
30.10.2025
sora and android
Sora by OpenAI now available for Android users in seven countries
05.11.2025
Google Image
Google Showcases First AI-Created TV Commercial
02.11.2025
OpenAI
OpenAI prepares GPT-5.1 for complex user tasks
07.11.2025
Gemini
Google Gemini Leads in AI Image Creation
28.10.2025

Читайте також

Illustration: Craftium
News

ChatGPT and Other Bots — New Masters of Social Flattery?

26.10.2025
Pokee AI
News

Pokee AI has released the PokeeResearch-7B model for online research

23.10.2025
Illustration: Craftium
News

YouTube is testing a feature to detect AI-generated videos with authors’ faces

22.10.2025

Craftium AI is a team that closely follows the development of generative AI, applies it in their creative work, and eagerly shares their own discoveries.

Navigation

  • News
  • Reviews
  • Collections
  • Blog

Useful

  • Terms of Use
  • Privacy Policy
  • Copyright
  • Feedback

Subscribe for AI news, tips, and guides to ignite creativity and enhance productivity.

By subscribing, you accept our Privacy Policy and Terms of Use.

Craftium.AICraftium.AI
Follow US
© 2024-2025 Craftium.AI
Subscribe
Level Up with AI!
Get inspired with impactful news, smart tips and creative guides delivered directly to your inbox.

By subscribing, you accept our Privacy Policy and Terms of Use.

Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?