A new AI test revealed unexpected model features

Researchers used radio show puzzles to evaluate the intuitive capabilities of artificial intelligence without specialized knowledge

Alex Dubenko
Published: 06.02.2025

Researchers from several US universities and the startup Cursor have developed a new benchmark for generative AI models, built from puzzles on the NPR radio show “Sunday Puzzle.” The test surfaced unexpected model behavior: some models, including those from OpenAI, sometimes “give up” and return incorrect answers.

The puzzles can be understood without specialized knowledge, so the test does not reward narrow expertise, and the problems are phrased so that models cannot rely on rote memorization. That makes the benchmark appealing to researchers who want to understand how AI models handle tasks that call for intuition and the process of elimination.

So far, the best result on the test belongs to the o1 model, which scored 59%, while the newer o3-mini model, set to high reasoning effort, scored 47%. The researchers plan to extend testing to other models, which could help identify where model performance needs improvement.

The “Sunday Puzzle” test has its limitations: it is aimed at an English-speaking audience. Even so, the researchers believe that regularly updating the questions will keep the test relevant and let them track how model performance changes over time.

Tagged: OpenAI, Testing
Source: techcrunch.com