Wikipedia, one of the largest online knowledge resources, is facing a new challenge: inaccurate content generated by AI. With the emergence of large language models such as OpenAI’s GPT, editors are forced to spend more and more time removing text that appears plausible but often contains errors or rests on unreliable sources. The response is “WikiProject AI Cleanup”, an initiative by editors aimed at protecting the platform from such material.
The group identifies suspect content by looking for phrases and stylistic tics characteristic of generative models.
“We found that some articles showed signs of AI-generated text, so we decided to systematize our efforts and develop methods for detecting them,” says Ilyas Lebleu, one of the project’s founders.
The problem lies not only in grammatical errors or an artificial style, but also in incorrect sources and even entirely fabricated facts, which threatens the quality of the information users rely on every day. One striking example was an article about the Chester Mental Health Center, which included the telltale phrase “As of the last data update in January 2022…”.
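To make the phrase-based approach concrete, here is a minimal Python sketch of how such screening could work; the phrase list and function name are illustrative assumptions, not the project’s actual criteria or tooling.

```python
# Minimal sketch of phrase-based screening for chatbot boilerplate.
# The phrase list is a hypothetical sample, not the project's real criteria.
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as of the last data update",
    "as an ai language model",
    "i don't have access to real-time",
]

def find_ai_phrases(article_text: str) -> list[str]:
    """Return every telltale phrase that appears in the article text."""
    lowered = article_text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

# Example: the kind of phrase that exposed the Chester Mental Health Center article.
snippet = "As of the last data update in January 2022, the facility operated..."
print(find_ai_phrases(snippet))  # ['as of the last data update']
```

Real screening would of course need more than a static list (the project also weighs sourcing and stylistic patterns), but a simple string match is enough to catch the most blatant copy-paste cases.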
Editors are trying to combat the phenomenon using Wikipedia’s existing rules, which require a reliable source for every fact. However, the new challenge is pushing them to look for fresh approaches to detect such articles and limit their impact on readers.