A new approach to evaluating AI models has emerged in the world of generative AI: using the game Minecraft. The Minecraft Benchmark (MC-Bench) website invites users to judge how well AI models build virtual objects in the game. Users vote for the best result and find out which model created it only after voting.
The idea of using Minecraft to test AI came from Adi Singh, a twelfth-grade student. He notes that familiarity with the game makes it easier for people to gauge progress in AI development. Minecraft is the best-selling video game of all time, and even those who have never played it can judge the quality of the objects the models create.
The MC-Bench project is supported by companies such as Anthropic, Google, OpenAI, and Alibaba, which provide their products for testing, although they are not formally part of the project. According to Singh, the project is currently focused on simple tasks, but in the future it may expand to more complex and targeted challenges.
Other games, such as Pokémon Red and Street Fighter, are also used to evaluate AI, since traditional benchmarks often give models an advantage by testing the narrow kinds of problems they handle well. MC-Bench stands out because its evaluation is based on the visual quality of the objects, which makes the project appealing to a broader audience and allows it to collect more data on model performance.