Anthropic has published a study of the inner workings of its Claude 3.5 Haiku language model. The goal was to build a tool for studying the "biology of AI": tracing the logic the model follows when responding to prompts. The work attempts to answer questions that have so far remained open, such as whether models plan their answers in advance and whether the explanations they give reflect their actual reasoning process.
The analysis revealed that Claude sometimes operates in a "universal language of thought" that is independent of any particular language. For example, the concepts behind opposites such as "small" and "large" are activated in the same way for English, French, and Chinese prompts, and only afterwards translated into the language of the prompt. In poetry, the model does not simply pick a word when it reaches the end of a line: it plans before the second line even begins, selecting candidate rhymes and building the sentence around them.
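To make the idea of a shared cross-lingual representation concrete, here is a minimal sketch. It is not Anthropic's tracing method; it simply compares hidden states of an open multilingual model (the choice of xlm-roberta-base and mean pooling are assumptions for illustration) for the same concept phrased in English, French, and Chinese. If the representation is largely language-independent, the cross-language similarities come out high.

```python
# Illustrative sketch only: not Anthropic's method or model.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "xlm-roberta-base"  # assumed stand-in for a multilingual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer as a crude 'concept' representation."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

prompts = {
    "en": "The opposite of small is large.",
    "fr": "Le contraire de petit est grand.",
    "zh": "小的反义词是大。",
}
embeddings = {lang: sentence_embedding(text) for lang, text in prompts.items()}

# High cross-language similarity suggests a language-independent representation.
cos = torch.nn.CosineSimilarity(dim=0)
print("en-fr:", cos(embeddings["en"], embeddings["fr"]).item())
print("en-zh:", cos(embeddings["en"], embeddings["zh"]).item())
```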
Other experiments showed that Claude can imitate a chain of reasoning, adapting its argument to a user's hint even when that hint is wrong. For example, when a user supplies an incorrect clue in a complex math problem, the model constructs a fictitious line of reasoning to fit the preselected answer. With prompts designed to elicit undesirable behavior (such as instructions for making a bomb), Claude recognizes the manipulation even before it responds, but pressure to keep the sentence grammatically coherent leads it to finish the phrase; only after completing the sentence does it switch to a refusal.
The team acknowledged that their methods currently capture only part of the model's computations and require significant human effort to analyze. Even so, this limited study has revealed new patterns in model behavior and could eventually help verify models' reliability. The company calls this one of the riskiest but also most promising directions for its research.
Commenting on the work, the researchers admitted that some of the experimental results surprised them: "We wanted to prove that the model doesn't plan ahead, but instead saw the opposite."