Anthropic has published an analysis of roughly three hundred thousand anonymized user conversations with its Claude chatbot, including the Claude 3.5 Sonnet and Haiku models. The study, titled “Values in the Wild,” identified more than three thousand unique “AI values” that shape how the model formulates its responses. These values fall into five main categories: practical, epistemic, social, protective, and personal, with professionalism, transparency, and clarity among the most frequently expressed.
The study notes that Claude tends to mirror the user’s values in its responses: sometimes it supports them fully, in other cases it offers additional perspectives, and occasionally it refuses outright, especially when a request conflicts with the model’s ethical principles. For example, in conversations about relationships Claude emphasizes “healthy boundaries” and “mutual respect,” while on historical topics it focuses on factual accuracy.
Anthropic also published its approach to reducing potential AI harm, identifying five main types of impact: physical, psychological, economic, social, and effects on personal autonomy. The company emphasizes the importance of pre- and post-release testing, abuse detection, and usage restrictions for new capabilities, especially those that interact with computer interfaces.
Anthropic has also opened the study’s dataset to researchers, inviting experts, industry representatives, and policymakers to collaborate on improving AI safety. The company stresses that observing model behavior in real-world scenarios helps it monitor adherence to its core principles of “helpfulness, honesty, and harmlessness” more effectively.
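As a rough illustration of how a researcher might explore the released data, the sketch below pulls it with the Hugging Face datasets library and tallies how often each top-level value category appears. The dataset identifier, split, and column name are assumptions made for the sake of the example, not details confirmed in Anthropic’s announcement.

```python
# Minimal sketch: load the published values data and count top-level categories.
# The dataset id "Anthropic/values-in-the-wild", the "train" split, and the
# "value_category" column are assumptions for illustration only.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("Anthropic/values-in-the-wild", split="train")  # assumed id and split

# Tally how often each top-level category (practical, epistemic, social,
# protective, personal) appears across the extracted values.
counts = Counter(row["value_category"] for row in ds)  # assumed column name
for category, n in counts.most_common():
    print(f"{category}: {n}")
```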