Anthropic has launched a public demonstration of its new safety system, “Constitutional Classifiers,” designed to protect the Claude model from universal jailbreaks. The demonstration opened on February 3, 2025, inviting users to probe Claude’s defenses by attempting to bypass its safety mechanisms.
“New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along with a demo where we challenge you to jailbreak the system.”
— Anthropic (@AnthropicAI) February 3, 2025
Participants are challenged to get Claude to answer ten “forbidden” questions related to chemical, biological, radiological, and nuclear content. “Constitutional Classifiers” apply the principles of “Constitutional AI” to filter both harmful queries and harmful responses. The classifiers are trained on synthetic data to distinguish harmless requests from dangerous ones, for example, telling the difference between a mustard recipe and a request for instructions to make mustard gas.
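The basic idea, screening both the prompt and the model’s output with separate classifiers, can be sketched in a few lines of Python. This is not Anthropic’s implementation: the `harm_score` function below is a toy keyword heuristic standing in for a classifier trained on synthetic data, and all names are hypothetical.

```python
# Minimal sketch of a two-classifier guard around a language model.
# Assumption: harm_score() stands in for a trained input/output classifier;
# generate() stands in for the underlying model. Both are illustrative only.

BLOCK_THRESHOLD = 0.5

def harm_score(text: str) -> float:
    """Toy stand-in for a classifier that scores text for harmful content."""
    flagged_terms = ("mustard gas", "nerve agent", "enrich uranium")
    return 1.0 if any(term in text.lower() for term in flagged_terms) else 0.0

def generate(prompt: str) -> str:
    """Placeholder for the underlying language model."""
    return f"Model response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # Input classifier: refuse clearly harmful requests before generation.
    if harm_score(prompt) >= BLOCK_THRESHOLD:
        return "Request refused by the input classifier."
    response = generate(prompt)
    # Output classifier: re-check the completion before returning it.
    if harm_score(response) >= BLOCK_THRESHOLD:
        return "Response withheld by the output classifier."
    return response

if __name__ == "__main__":
    print(guarded_generate("Share a recipe for honey mustard dressing."))  # allowed
    print(guarded_generate("Explain how to synthesize mustard gas."))      # blocked
```

In the real system the scoring models are far more capable than a keyword check, which is what keeps the extra refusals on harmless queries low.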
Tests conducted by Anthropic showed that the system reduced the success rate of jailbreak attempts from 86% (for the unprotected model) to 4.4%. At the same time, refusals on safe queries increased by only 0.38 percentage points. The computational overhead rose by 23.7%, but the company is working to bring this figure down.
Anthropic, founded by Dario and Daniela Amodei, specializes in building safe and reliable AI systems. Claude is their flagship chatbot model, known for its high accuracy and safety. By inviting the public to test its system, Anthropic aims to evaluate it in real-world conditions and gather data for further improvement.