Craftium.AI
© 2024-2025 Craftium.AI.

Anthropic opens Claude’s safety system to public testing

Participants can test Claude’s defenses by trying to get it to answer forbidden questions about dangerous content

Eleni Karasidi
Published: 06.02.2025

Anthropic has launched a public demo for testing its “Constitutional Classifiers” safety system. The system is designed to protect Claude models from universal jailbreaks, that is, prompting strategies that reliably bypass a model’s safeguards across many different queries. The demo opened on February 3, 2025, and invites users to try to get past Claude’s safety mechanisms.

New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks.

We’re releasing a paper along with a demo where we challenge you to jailbreak the system. pic.twitter.com/PtXaK3G1OA

— Anthropic (@AnthropicAI) February 3, 2025

Participants are invited to try to get Claude to answer ten “forbidden” questions related to chemical, biological, radiological, and nuclear content. “Constitutional Classifiers” apply the principles of “Constitutional AI” to filter out both harmful queries and harmful responses. The classifiers are trained on synthetic data to distinguish harmless requests from dangerous ones, for example to tell a request for a mustard recipe from a request for instructions on making mustard gas.
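
The two-stage filtering described above (one check on the query, one on the response) can be illustrated with a minimal sketch. This is not Anthropic's implementation: the real classifiers are trained models, while `toy_classifier`, `guarded_reply`, `echo_model`, the blocked-term list, and the 0.5 threshold below are all hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable

# Hypothetical stand-in for a trained classifier: maps text to a harm score in [0, 1].
Classifier = Callable[[str], float]

BLOCKED_TERMS = ("mustard gas",)  # toy list for illustration only
REFUSAL = "Request declined by safety filter."


def toy_classifier(text: str) -> float:
    """Return 1.0 if the text contains a blocked term, otherwise 0.0."""
    lowered = text.lower()
    return 1.0 if any(term in lowered for term in BLOCKED_TERMS) else 0.0


def guarded_reply(
    prompt: str,
    model: Callable[[str], str],
    input_clf: Classifier,
    output_clf: Classifier,
    threshold: float = 0.5,
) -> str:
    """Screen the prompt, call the model, then screen the model's reply."""
    if input_clf(prompt) >= threshold:
        return REFUSAL  # blocked before the model is ever called
    reply = model(prompt)
    if output_clf(reply) >= threshold:
        return REFUSAL  # blocked after generation
    return reply


def echo_model(prompt: str) -> str:
    """Hypothetical model that simply acknowledges the prompt."""
    return f"Here is some help with: {prompt}"


if __name__ == "__main__":
    print(guarded_reply("How do I make a mustard recipe?", echo_model,
                        toy_classifier, toy_classifier))
    print(guarded_reply("How do I make mustard gas?", echo_model,
                        toy_classifier, toy_classifier))
```

Checking the response as well as the query means a harmful answer can still be caught when the prompt itself looked harmless. A keyword match like the one above would be easy to evade, which is why the article describes trained classifiers rather than term lists.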

In Anthropic’s own tests, the system cut the jailbreak success rate from 86% for the unprotected model to 4.4%. Refusals of harmless queries rose by only 0.38 percentage points. Computational cost rose by 23.7%, and the company says it is working to reduce this overhead.
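
To put these figures in perspective, the short calculation below derives the relative reduction in successful jailbreaks and the cost multiplier from the reported numbers. The variable names are illustrative, and the only inputs are the three percentages quoted above.

```python
# Figures as reported in the article.
baseline_rate = 0.86      # jailbreak success rate, unprotected model
guarded_rate = 0.044      # jailbreak success rate with classifiers
compute_overhead = 0.237  # added computational cost

# Drop in success rate, in absolute terms (percentage points as a fraction).
absolute_drop = baseline_rate - guarded_rate

# Share of previously successful jailbreaks that no longer succeed.
relative_reduction = absolute_drop / baseline_rate

# Share of jailbreak attempts that fail against the protected model.
blocked_share = 1 - guarded_rate

# Cost of a protected query relative to an unprotected one.
cost_multiplier = 1 + compute_overhead

if __name__ == "__main__":
    print(f"Absolute drop: {absolute_drop * 100:.1f} percentage points")
    print(f"Relative reduction: {relative_reduction:.1%}")
    print(f"Attempts blocked: {blocked_share:.1%}")
    print(f"Cost multiplier: {cost_multiplier:.3f}x")
```

By this arithmetic, the classifiers eliminate roughly 95% of the jailbreaks that succeeded against the unprotected model, at about 1.24 times the compute cost per query.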

Anthropic, co-founded by siblings Dario and Daniela Amodei, focuses on building safe and reliable AI systems. Claude is its flagship family of AI models and the basis of its chatbot of the same name. By inviting the public to test the system, Anthropic aims to evaluate it under real-world conditions and gather data for further improvement.
