Hao AI Lab, a research group at the University of California, San Diego, ran an experiment that dropped generative AI models into the world of the classic game Super Mario Bros. The game, long a symbol of retro gaming, proved a real challenge for the models: Anthropic's Claude 3.7 performed best, ahead of Claude 3.5, while Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.

Notably, the game was specially adapted for the experiment. Running it in an emulator through the lab's own framework, GamingAgent, the researchers fed each model basic instructions along with screenshots from the game. The models responded with commands in the form of Python code that steered Mario in real time, a task that forced them to plan complex maneuvers and work out game strategies on the fly.
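To make the setup concrete, here is a minimal sketch of what such a screenshot-in, command-out agent loop could look like. The `Emulator` wrapper, the `query_model` stub, and the prompt text are all hypothetical placeholders introduced for illustration; GamingAgent's actual interfaces are not documented here and may well differ.

```python
import time

class Emulator:
    """Hypothetical wrapper around a NES emulator (placeholder only)."""

    def screenshot(self) -> bytes:
        """Return the current frame as PNG bytes."""
        raise NotImplementedError

    def press(self, button: str, frames: int = 1) -> None:
        """Hold a controller button (e.g. 'right', 'A') for N frames."""
        raise NotImplementedError

def query_model(instructions: str, frame_png: bytes) -> str:
    """Send the instructions plus a screenshot to a vision-capable model
    and return its text reply. The call shape varies by provider, so it
    is left as a stub rather than tied to any vendor's API."""
    raise NotImplementedError

INSTRUCTIONS = (
    "You control Mario. Look at the screenshot and reply with exactly one "
    "line of Python calling emulator.press(button, frames), for example "
    "emulator.press('right', 10) to run or emulator.press('A', 5) to jump."
)

def agent_loop(emulator: Emulator, steps: int = 200) -> None:
    for _ in range(steps):
        frame = emulator.screenshot()               # observe the game state
        command = query_model(INSTRUCTIONS, frame)  # ask the model for a move
        # The model's reply is executed as Python; the namespace is
        # restricted so only the emulator object is reachable.
        exec(command, {"emulator": emulator}, {})
        time.sleep(0.05)  # the game keeps running while the model deliberates
```

The key property of such a loop is that the game does not pause while the model thinks: any latency between capturing the screenshot and executing the command costs Mario time, which is exactly the pressure point the researchers observed.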
The researchers noted that models usually regarded as more "thoughtful," those capable of complex multi-step reasoning, failed to deliver better results in real time. The likely explanation is that such models take longer to reach a decision, and in a game where every second counts that delay is costly.
Although games have long been used to benchmark AI, some experts question the validity of such comparisons: game environments are abstract and far simpler than reality, and they supply abundant data for training. Still, the Super Mario Bros. experiment underscored once again how demanding real-time control is for AI models, leaving open questions about their effectiveness and capabilities across different conditions.