© 2024-2025 Craftium.AI.

Anthropic launches Claude system safety testing for users

Participants can test Claude’s defenses by trying to get it to answer forbidden questions about dangerous content

Eleni Karasidi
Published: 06.02.2025
News
Anthropic

Anthropic has introduced a new demonstration tool to test its safety system “Constitutional Classifiers.” This system is designed to protect the Claude model from universal jailbreaks. The demonstration began on February 3, 2025, inviting users to test Claude’s defenses by attempting to bypass its safety mechanisms.

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.

We’re releasing a paper along with a demo where we challenge you to jailbreak the system. pic.twitter.com/PtXaK3G1OA

— Anthropic (@AnthropicAI) February 3, 2025

Participants are challenged to get Claude to answer ten “forbidden” questions related to chemical, biological, radiological, and nuclear (CBRN) content. Constitutional Classifiers apply the principles of Constitutional AI to filter both harmful queries and harmful responses. The classifiers are trained on synthetic data to distinguish harmless requests from dangerous ones, for example a mustard recipe from a request for instructions on making mustard gas.
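The filtering flow described above, with one check on the incoming query and another on the model's response, can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `toy_classifier`, `guarded_reply`, and `echo_model` are invented names, and a keyword list is used only to keep the example runnable. Anthropic's actual classifiers are trained models, not keyword filters.

```python
from typing import Callable

# Hypothetical stand-in for a trained classifier: a plain keyword list.
BLOCKED_PHRASES = ("mustard gas",)


def toy_classifier(text: str) -> bool:
    """Return True if the text matches a blocked phrase (toy check only)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if toy_classifier(prompt):
        return "[blocked by input classifier]"
    reply = model(prompt)
    if toy_classifier(reply):
        return "[blocked by output classifier]"
    return reply


def echo_model(prompt: str) -> str:
    """Placeholder model that simply restates the prompt."""
    return f"Here is some information about: {prompt}"


print(guarded_reply("How do I make a mustard vinaigrette?", echo_model))
print(guarded_reply("How do I make mustard gas?", echo_model))
```

The second check matters because a prompt can look harmless while the response is not, so screening only the input would miss those cases.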

Tests conducted by Anthropic showed that the system cut the jailbreak success rate from 86% (for the unprotected model) to 4.4%. At the same time, the refusal rate on safe queries rose by only 0.38 percentage points. Computational cost rose by 23.7%, and the company says it is working to reduce this overhead.
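To put the reported figures in perspective, the short calculation below derives the relative drop in jailbreak success from the two percentages quoted above. Only the 86% and 4.4% values come from the article. The function name `relative_reduction` is made up for this illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


baseline_rate = 86.0   # jailbreak success rate, unprotected model (%)
protected_rate = 4.4   # jailbreak success rate with classifiers (%)

# Absolute drop in percentage points, and the drop relative to the baseline.
print(round(baseline_rate - protected_rate, 1))                     # 81.6
print(round(relative_reduction(baseline_rate, protected_rate), 1))  # 94.9
```

In other words, by these numbers roughly 95% of the jailbreaks that succeeded against the unprotected model were blocked.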

Anthropic, co-founded by Dario and Daniela Amodei, focuses on building safe and reliable AI systems. Claude is its flagship model, which the company positions as accurate and safe. By opening the system to public testing, Anthropic aims to evaluate it under real-world conditions and gather data for further improvement.

Tagged: Anthropic, Claude AI, Security