A study conducted by the BBC has uncovered serious accuracy issues with generative AI when creating news summaries. The study tested how accurately OpenAI's ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity could summarize news. The results showed that more than half of the generated responses had "significant issues in one form or another."
As part of the study, the BBC asked these models to summarize one hundred articles published on its website. Journalists thoroughly examined the responses and found that nineteen percent contained incorrect statements, figures, or dates. Additionally, thirteen percent of the quotes were either altered or entirely absent from the corresponding articles.
One example was an error by Google Gemini, which incorrectly claimed that the UK's National Health Service advises against starting vaping. In reality, the NHS recommends vaping as a way to quit smoking. Another example came from ChatGPT, which in December 2024 mistakenly stated that Ismail Haniyeh was part of the Hamas leadership, even though he had been killed in July 2024.
These inaccuracies have raised concerns at the BBC, and the company has reached out to the tech giants, urging them to address the issues. Deborah Turness, CEO of BBC News and Current Affairs, called for cooperation between the news industry, tech companies, and the government to head off the potential harms of inaccurate AI-generated headlines.