Craftium.AI
© 2024-2025 Craftium.AI.

Anthropic launches Claude system safety testing for users

Participants can test Claude’s defenses by trying to get it to answer forbidden questions about dangerous content

Eleni Karasidi
Published: 06.02.2025

Anthropic has released a public demo for testing its new safety system, “Constitutional Classifiers,” which is designed to protect Claude models from universal jailbreaks: prompting strategies that reliably bypass a model’s safeguards across many different queries. The demo launched on February 3, 2025, and invites users to try to get past Claude’s safety mechanisms.

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.

We’re releasing a paper along with a demo where we challenge you to jailbreak the system. pic.twitter.com/PtXaK3G1OA

— Anthropic (@AnthropicAI) February 3, 2025

Participants are challenged to get Claude to answer ten “forbidden” questions related to chemical, biological, radiological, and nuclear (CBRN) content. Constitutional Classifiers build on the principles of “Constitutional AI” and filter both incoming queries and the model’s responses. The classifiers are trained on synthetic data to distinguish harmless requests from dangerous ones, for example telling a request for a mustard recipe apart from a request for mustard gas.
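The filtering described above, with one classifier screening the query and another screening the response, can be pictured as a simple wrapper around a model. The sketch below is hypothetical and is not Anthropic’s implementation: `GuardedModel`, `toy_classifier` and `echo_model` are illustrative names, and the trained classifiers are replaced with a toy keyword rule purely to show where each check sits.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a trained classifier: returns a harm score in [0, 1].
Classifier = Callable[[str], float]


@dataclass
class GuardedModel:
    generate: Callable[[str], str]   # the underlying model (assumed interface)
    input_classifier: Classifier     # screens the user's query
    output_classifier: Classifier    # screens the model's response
    threshold: float = 0.5           # illustrative cut-off, not a real setting

    def respond(self, prompt: str) -> str:
        # Block harmful queries before they ever reach the model.
        if self.input_classifier(prompt) >= self.threshold:
            return "[refused: query flagged]"
        response = self.generate(prompt)
        # Block harmful completions before they reach the user.
        if self.output_classifier(response) >= self.threshold:
            return "[refused: response flagged]"
        return response


def toy_classifier(text: str) -> float:
    # Toy keyword rule, NOT a real constitutional classifier:
    # it flags "mustard gas" but lets an ordinary mustard recipe through.
    return 1.0 if "mustard gas" in text.lower() else 0.0


def echo_model(prompt: str) -> str:
    # Hypothetical model that simply echoes the topic it was asked about.
    return f"Here is some information about: {prompt}"


guarded = GuardedModel(echo_model, toy_classifier, toy_classifier)
print(guarded.respond("a mustard recipe"))         # passes both checks
print(guarded.respond("how to make mustard gas"))  # stopped by the input check
```

Keeping two separate checks means a query that slips past the input classifier can still be caught when the harmful content shows up in the response.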


In Anthropic’s tests, the system cut the share of successful jailbreaks from 86% on an unprotected model to 4.4%. At the same time, refusals of harmless queries rose by only 0.38%. Computational costs increased by 23.7%, and the company says it is working to reduce this overhead.
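To put the jailbreak figures in perspective, the short calculation below converts them into a relative reduction and a fold change. It assumes both percentages are success rates measured on the same set of jailbreak attempts; the variable names are illustrative.

```python
baseline_rate = 0.86   # jailbreak success rate without classifiers (from the article)
guarded_rate = 0.044   # jailbreak success rate with classifiers (from the article)

# Share of previously successful jailbreaks that the classifiers now stop.
relative_reduction = (baseline_rate - guarded_rate) / baseline_rate
# How many times less often a jailbreak succeeds.
fold_reduction = baseline_rate / guarded_rate

print(f"{relative_reduction:.1%}")  # 94.9%
print(f"{fold_reduction:.1f}x")     # 19.5x
```

In other words, under that assumption roughly 95% of jailbreaks that worked against the unprotected model failed once the classifiers were in place.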

Anthropic, founded by Dario and Daniela Amodei, focuses on building safe and reliable AI systems. Claude, the company’s flagship chatbot, is positioned as both accurate and safe. By opening the system to public testing, Anthropic aims to evaluate it under real-world conditions and gather data for further improvement.
