New research has uncovered weaknesses in generative AI when answering complex historical questions. A team of researchers tested three leading models (OpenAI's GPT-4, Meta's Llama, and Google's Gemini) on historical questions using the new Hist-LLM benchmark, which is built on data from the Seshat global historical database. The results, presented at the NeurIPS conference, showed that even the best-performing model, GPT-4 Turbo, achieved only 46% accuracy.
Researchers from the Complexity Science Hub in Austria noted that AI models handle basic facts well but lack the depth needed to answer more complex questions that require a nuanced understanding of history. For example, GPT-4 Turbo incorrectly claimed that scale armor existed in Ancient Egypt, even though that technology appeared there roughly 1,500 years later. Such mistakes may stem from the models' tendency to rely on well-known data while overlooking more obscure facts.
The study also found that the OpenAI and Llama models performed worse on questions about certain regions, such as sub-Saharan Africa, which may point to biases in the training data. Nevertheless, the researchers hope that such models could eventually prove useful to historians, especially if the benchmark is improved by including data from underrepresented regions and by making the questions more challenging.