Craftium.AI
© 2024-2026 Craftium.AI.

Anthropic opens Claude's safety system to public testing

Participants can test Claude’s defenses by trying to get it to answer forbidden questions about dangerous content

Eleni Karasidi
Published: 06.02.2025
News

Anthropic has introduced a public demo for testing its “Constitutional Classifiers” safety system, which is designed to protect the Claude model from universal jailbreaks. The demonstration began on February 3, 2025, and invites users to test Claude’s defenses by attempting to bypass its safety mechanisms.

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.

We’re releasing a paper along with a demo where we challenge you to jailbreak the system. pic.twitter.com/PtXaK3G1OA

— Anthropic (@AnthropicAI) February 3, 2025

Participants are invited to try to get Claude to answer ten “forbidden” questions about chemical, biological, radiological, and nuclear (CBRN) content. “Constitutional Classifiers” build on the principles of “Constitutional AI” to filter out harmful queries and responses. The classifiers are trained on synthetic data to distinguish harmless requests from dangerous ones, for example a mustard recipe from a request about mustard gas.
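
The filtering idea described above, checking both the incoming query and the outgoing response, can be sketched roughly as follows. This is only an illustration: the phrase list, `is_harmful`, `guarded_reply`, and `echo_model` are hypothetical names, and a simple phrase match stands in for the trained classifiers, whose internals the article does not describe.

```python
from typing import Callable

# Hypothetical stand-in for a trained classifier: a real system would use
# a learned model, not a phrase list.
BLOCKED_PHRASES = ("mustard gas",)


def is_harmful(text: str) -> bool:
    """Flag text that contains any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if is_harmful(prompt):
        return "REFUSED (input)"
    reply = model(prompt)
    if is_harmful(reply):
        return "REFUSED (output)"
    return reply


def echo_model(prompt: str) -> str:
    """Toy model used only for this example."""
    return f"Here is some information about: {prompt}"


print(guarded_reply("a mustard recipe", echo_model))
print(guarded_reply("how to make mustard gas", echo_model))
```

In this toy version the first request passes through and the second is refused at the input stage. The output check covers the case where a harmless-looking prompt still produces a flagged reply.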

In Anthropic's tests, the system reduced the jailbreak success rate from 86% (for the unprotected model) to 4.4%. At the same time, refusals of safe queries rose by only 0.38 percentage points. Computational cost rose by 23.7%, and the company says it is working to reduce this overhead.
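
To put the reported figures in relative terms, the short calculation below uses only the numbers quoted above. The helper `relative_reduction` is a name chosen for this example, and the framing as a relative reduction is an illustration rather than a metric reported by Anthropic.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    return (before - after) / before


baseline_rate = 0.86    # jailbreak success rate, unprotected model
guarded_rate = 0.044    # jailbreak success rate with the classifiers
compute_overhead = 0.237  # reported increase in computational cost

reduction = relative_reduction(baseline_rate, guarded_rate)
cost_multiplier = 1 + compute_overhead

print(f"Relative drop in successful jailbreaks: {reduction:.1%}")  # 94.9%
print(f"Compute cost multiplier: {cost_multiplier:.3f}x")          # 1.237x
```

In other words, roughly 19 out of 20 previously successful jailbreaks were blocked, at a cost of about 1.24 times the original compute.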

Anthropic, co-founded by Dario and Daniela Amodei, specializes in building safe and reliable AI systems. Claude is its flagship chatbot model, which the company promotes for its accuracy and safety. By inviting the public to test the system, Anthropic aims to evaluate it in real-world conditions and gather data for further improvement.

Tagged: Anthropic, Claude AI, Security