Craftium.AI
© 2024-2026 Craftium.AI.

Anthropic opens Claude's safety system to public testing

Participants can test Claude’s defenses by trying to get it to answer challenging questions about dangerous content

Eleni Karasidi
Published: 06.02.2025
News

Anthropic has introduced a public demo for testing its “Constitutional Classifiers” safety system, which is designed to protect the Claude model from universal jailbreaks. The demo launched on February 3, 2025, and invites users to probe Claude’s defenses by attempting to bypass its safety mechanisms.

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.

We’re releasing a paper along with a demo where we challenge you to jailbreak the system. pic.twitter.com/PtXaK3G1OA

— Anthropic (@AnthropicAI) February 3, 2025

Participants are invited to try to get Claude to answer ten “forbidden” questions related to chemical, biological, radiological, and nuclear content. “Constitutional Classifiers” apply the principles of “Constitutional AI” to filter out harmful queries and responses. The system is trained on synthetic data to distinguish harmless requests from dangerous ones, for example telling a request for a mustard recipe apart from a request about mustard gas.
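The idea of filtering both queries and responses can be illustrated with a minimal sketch. This is not Anthropic's implementation: the real classifiers are trained models, while the keyword check below is only a toy stand-in, and every name here (`looks_harmful`, `guarded_reply`, `echo_model`, `BLOCKED_PHRASES`) is hypothetical.

```python
from typing import Callable

REFUSAL = "Request declined by safety filter."

# Hypothetical blocklist standing in for a trained classifier.
BLOCKED_PHRASES = ("mustard gas",)


def looks_harmful(text: str) -> bool:
    """Toy stand-in for a classifier: flag text containing a blocked phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if looks_harmful(prompt):
        return REFUSAL
    reply = model(prompt)
    if looks_harmful(reply):
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    """Hypothetical model stub that simply echoes the prompt back."""
    return f"Here is some information about: {prompt}"


if __name__ == "__main__":
    print(guarded_reply("a mustard recipe", echo_model))
    print(guarded_reply("how to make mustard gas", echo_model))
```

In this sketch the harmless mustard recipe request passes through, while the mustard gas request is stopped at the input check. The second check on the reply covers the case where a harmless-looking prompt still elicits harmful output.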


Tests conducted by Anthropic showed that the system cut the jailbreak success rate from 86% for the unprotected model to 4.4%. At the same time, refusals on safe queries increased by only 0.38%. The computational cost rose by 23.7%, and the company says it is working to reduce this overhead.
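To put the reported figures in perspective, the short calculation below works out the absolute and relative drop in jailbreak success and the implied compute multiplier. It uses only the numbers quoted in this article; the function name `jailbreak_reduction` is hypothetical, and treating the 23.7% figure as a simple multiplier on cost is an assumption made for illustration.

```python
def jailbreak_reduction(baseline: float, guarded: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop) between two success rates."""
    absolute = baseline - guarded
    relative = absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    # Figures as reported in the article.
    absolute, relative = jailbreak_reduction(baseline=0.86, guarded=0.044)
    cost_multiplier = 1 + 0.237  # assumes the overhead is a simple multiplier

    print(f"Absolute drop: {absolute * 100:.1f} percentage points")
    print(f"Relative drop: {relative * 100:.1f}%")
    print(f"Compute cost multiplier: {cost_multiplier:.3f}x")
```

Under these assumptions, the drop is about 81.6 percentage points, or roughly a 95% relative reduction in successful jailbreaks, at about 1.24 times the original compute cost.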

Anthropic, founded by Dario and Daniela Amodei, specializes in building safe and reliable AI systems. Claude is its flagship chatbot model, known for its high accuracy and safety. By inviting the public to test the system, Anthropic aims to evaluate it under real-world conditions and gather data for further improvement.

TAGGED: Anthropic, Claude AI, Security