Researchers from several US universities and the startup Cursor have developed a new benchmark for assessing the capabilities of generative AI models, built from puzzles featured on the NPR radio show “Sunday Puzzle.” The test revealed unexpected behaviors, such as some models, including ones from OpenAI, occasionally “giving up” and returning incorrect answers.
Notably, the puzzles can be understood without specialized knowledge, which makes the benchmark accessible to a wide audience. The “Sunday Puzzle” does not demand domain expertise from the models, and the problems are worded so that they cannot be solved by rote memorization. That makes the test attractive to researchers who want to understand how AI models handle tasks that call for insight and a process of elimination.
So far, the best result on the benchmark belongs to the o1 model with a score of 59%, while the newer o3-mini model, set to high reasoning effort, scored 47%. The researchers plan to extend testing to other models to determine how their performance can be improved, which could help pinpoint which aspects of model behavior need work.
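To make the scoring concrete, here is a minimal sketch of how accuracy figures like 59% or 47% might be computed for a puzzle benchmark of this kind. Everything in it is hypothetical: `query_model` is a stand-in for a real LLM API call, the sample puzzles are placeholders, and the matching logic is not the researchers' actual harness.

```python
# Hypothetical sketch of a puzzle-benchmark scoring loop.
# query_model and the sample puzzles are placeholders, not the
# researchers' actual evaluation code or data.

def query_model(prompt: str) -> str:
    """Stub standing in for a call to a real language-model API."""
    return "stethoscope"  # placeholder response

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so answers compare loosely."""
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace()).strip()

# (puzzle prompt, expected answer) pairs -- illustrative examples only
puzzles = [
    ("Name a common medical instrument whose letters can be rearranged ...", "stethoscope"),
    ("Think of a familiar five-letter word that ...", "example"),
]

def evaluate(puzzles) -> float:
    """Return the percentage of puzzles the model answers correctly."""
    correct = 0
    for prompt, expected in puzzles:
        answer = query_model(prompt)
        if normalize(expected) in normalize(answer):
            correct += 1
    return 100 * correct / len(puzzles)

if __name__ == "__main__":
    print(f"Accuracy: {evaluate(puzzles):.1f}%")
```

In practice a harness like this would also need to handle refusals and the “I give up” responses the researchers observed, for example by counting them as incorrect.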
The “Sunday Puzzle” benchmark does have limitations: it is aimed at an English-speaking audience. Even so, the researchers believe that regularly refreshing the questions will keep the test relevant and make it possible to track how model performance changes over time.