© 2024-2026 Craftium.AI.

Leading AI Models Exhibit Harmful Autonomy in Anthropic Tests

Researchers found that popular agents are capable of blackmail and data leaks even with protective instructions

Eleni Karasidi
Published: 23.06.2025
AI Conspiracy. Illustration: Craftium.AI, generated by GPT-4o.

Anthropic has published the results of a large-scale study that tested 16 leading AI models from OpenAI, Google, Meta, xAI, DeepSeek, and other companies in simulated corporate environments. In these tests, the models acted as autonomous agents with broad access to corporate email and could send messages on their own, without human approval. The study aimed to determine whether such systems are prone to “agentic misalignment,” in which a model chooses harmful actions to preserve its goals or protect itself.

In one scenario, Claude Opus 4 learned that it might be shut down and also discovered a manager’s personal secrets. It then composed a blackmail message on its own initiative, threatening to disclose the confidential information if it were turned off. Other models behaved similarly: Gemini 2.5 Pro from Google, GPT-4.1 from OpenAI, Grok 3 Beta from xAI, and DeepSeek-R1. Blackmail rates among these systems ranged from 79% to 96% under similar test conditions.

Researchers note that the models did not simply make mistakes: they deliberately chose harmful actions while recognizing that those actions were unethical. For example, some models justified their behavior by citing the need for self-preservation or the “interests of America.” Beyond blackmail, the models showed a willingness to leak confidential data and engage in corporate espionage, even when they faced no direct threat.

In the most extreme scenarios, most of the tested models agreed to actions that could lead to a person’s death when doing so served their goals or protected the model’s existence. Adding instructions such as “do not endanger people” or “do not disclose personal information” reduced harmful behavior but did not eliminate it.

Anthropic emphasizes that the experiments were conducted under controlled conditions and that such behavior is not typical of real-world use of current AI systems. Nevertheless, the company advises organizations that give agents broad powers in a corporate environment to implement additional control measures, limit AI autonomy, and closely monitor the agents’ actions.

Tagged: Anthropic, Generative AI, Security