Craftium.AI
© 2024-2025 Craftium.AI.

Limitations of Generative AI in Historical Questions Revealed by Study

The GPT-4, Llama, and Gemini models showed low accuracy on complex historical questions, especially regarding underrepresented regions

Alex Dubenko
Published: 21.01.2025
Illustrative image: a futuristic AI in despair (DALL-E 3)

A new study has exposed weaknesses in generative AI when it answers complex historical questions. The researchers tested three leading models (OpenAI's GPT-4, Meta's Llama, and Google's Gemini) on the new Hist-LLM benchmark, which is built on data from the Seshat Global History Databank. The results, presented at the NeurIPS conference, showed that even the best-performing model, GPT-4 Turbo, achieved only 46% accuracy.

Researchers from the Complexity Science Hub in Austria noted that AI models handle basic facts well but lack the depth needed for more complex questions that require a detailed understanding of history. For example, when asked whether scale armor was present in ancient Egypt during a specific period, GPT-4 Turbo incorrectly answered yes, although the technology appeared there only 1,500 years later. Such mistakes may arise because AI models lean on widely documented historical data and struggle to retrieve more obscure facts.

The study also found that the OpenAI and Llama models performed worse on questions about certain regions, such as sub-Saharan Africa, which may point to biases in the training data. Nevertheless, the researchers hope that such models could become useful to historians in the future, especially if the benchmark is improved by adding data from underrepresented regions and making the questions more challenging.

Tagged: Science, Testing