
Gemini 3 Pro tops the model accuracy test (but continues to hallucinate)

An Artificial Analysis study found high rates of false answers, even among the top-ranked models, across six fields of knowledge

Alex Dubenko
Published: 23.11.2025
Hallucinating brain. Illustration: Craftium.AI, generated by GPT-4o.

Artificial Analysis has presented the results of its new AA-Omniscience Benchmark, which revealed striking accuracy problems in modern large language models. Of the 40 systems studied, only four achieved a positive score, and Google's Gemini 3 Pro topped the ranking with 13 points on the Omniscience Index. For comparison, its closest competitor, Claude 4.1 Opus, scored 4.8 points, and Grok 4, previously considered the most accurate, trailed by 14 points.

For the first time, Gemini 3 Pro showed a significant advantage in accuracy, answering 53 percent of questions correctly. However, the researchers noted that even the leaders of the ranking have an extremely high rate of “hallucinations”, meaning the share of confident but incorrect answers. For Gemini 3 Pro this figure reached 88 percent, on par with previous versions, and it also remains high for Grok 4 and GPT‑5.1, at 64 and 81 percent respectively.
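
The two figures for Gemini 3 Pro (53 percent correct, 88 percent hallucination rate) cannot both be shares of all answers, so the hallucination rate is presumably measured only over the questions a model did not answer correctly. Below is a minimal Python sketch under that assumption. The function name and the counts are hypothetical and are not taken from the benchmark's own code or data.

```python
def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of not-correct responses that were wrong answers
    rather than abstentions.

    Assumed definition (not confirmed by the article):
    incorrect / (incorrect + abstained).
    """
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0  # nothing was missed, so nothing was hallucinated
    return incorrect / not_correct


# Hypothetical counts for illustration only (not benchmark data):
# out of 100 questions, 50 were answered correctly, 40 were answered
# wrongly, and 10 were declined.
rate = hallucination_rate(incorrect=40, abstained=10)
print(f"{rate:.0%}")  # 80%
```

Under this reading, a model can be the most accurate in the ranking and still have a high hallucination rate, as long as it rarely declines to answer the questions it cannot get right.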

The AA-Omniscience Benchmark comprises 6,000 questions from 42 categories in six key areas: business, humanities and social sciences, medicine, law, software engineering, and science and mathematics. The questions are based on authoritative sources and are generated automatically by an AI agent. The new index penalizes mistakes as heavily as it rewards correct answers, which encourages models to avoid guessing and discourages false confidence.
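
A scoring rule that rewards and penalizes equally can be sketched as follows. The sketch assumes, since the article does not spell out the formula, that each correct answer counts +1, each incorrect answer counts -1, and a declined answer counts 0, with the result scaled to a range of -100 to 100. The function name and the counts are hypothetical.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed scoring rule: +1 per correct answer, -1 per incorrect
    answer, 0 per abstention, scaled to the range [-100, 100]."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("no questions were scored")
    return 100 * (correct - incorrect) / total


# Hypothetical models for illustration only (not benchmark data).
# One answers everything and is right half the time; the other
# answers only half the questions but is usually right.
guesser = omniscience_index(correct=50, incorrect=50, abstained=0)
cautious = omniscience_index(correct=40, incorrect=10, abstained=50)
print(guesser, cautious)  # 0.0 30.0
```

Under such a rule, the model that guesses scores zero despite its higher raw accuracy, while the cautious one scores higher. A positive score requires more correct answers than incorrect ones, which fits the finding that only four of the 40 models scored above zero.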


The study showed that no model delivers consistent accuracy across all six areas. Claude 4.1 Opus leads in law and software engineering, GPT‑5.1 performs best on business questions, and Grok 4 excels in medicine and science. At the same time, even large models such as Gemini 3 Pro show high “hallucination” rates.

Artificial Analysis emphasized that although model size often correlates with accuracy, it does not guarantee fewer confident false answers. Several compact models, including Nemotron Nano 9B V2, outperformed larger competitors thanks to greater reliability. To support further research, the team has made 10 percent of the questions publicly available and kept the rest private.
