On the final day of its 12-day "Shipmas" event, OpenAI introduced a new AI model for reasoning tasks: o3, the successor to the o1 model. Alongside it, the company presented a compact version, o3-mini, fine-tuned for specific tasks. The release marks a notable step forward in the models' reasoning capabilities.
o3, our latest reasoning model, is a breakthrough, with a step function improvement on our hardest benchmarks. we are starting safety testing & red teaming now. https://t.co/4XlK1iHxFK
— Greg Brockman (@gdb) December 20, 2024
OpenAI states that, under certain conditions, o3 approaches AGI, which the company defines as a system capable of performing most economically valuable work typically done by humans. Although OpenAI emphasizes that this is not yet a definitive breakthrough, o3's test results significantly surpass those of its previous models. On ARC-AGI, a benchmark that evaluates an AI's ability to acquire new skills beyond its training data, o3 scored 87.5% in high-compute mode, roughly tripling o1's score on the benchmark's lowest compute setting.
The model achieved outstanding results across a range of tests: 96.7% on AIME 2024 (the American Invitational Mathematics Examination), 87.7% on GPQA Diamond, which poses graduate-level questions in biology, physics, and chemistry, and a new record of 25.2% on the FrontierMath benchmark from Epoch AI. Despite these achievements, experts such as ARC-AGI co-creator François Chollet caution against overestimating the results, pointing to o3's struggles with some simple tasks and the high cost of its advanced compute modes.
Another significant improvement in o3 is the ability to adjust computation time, allowing users to choose low, medium, or high modes depending on task complexity. The model uses a “private chain of thought” process, enabling it to internally analyze tasks, explain its reasoning, and provide more reliable results in fields such as physics, mathematics, and programming.
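As a rough sketch of how this mode selection might surface to developers, the helper below builds request parameters in the style of the OpenAI Python SDK's `reasoning_effort` setting. The parameter name and the `o3-mini` model identifier here are assumptions for illustration; OpenAI's actual API surface for o3 was not final at the time of the announcement.

```python
# Illustrative sketch only: selecting a low/medium/high reasoning mode
# when building a request for an o-series model. The "reasoning_effort"
# parameter and "o3-mini" model name are assumptions, not confirmed API.

EFFORT_LEVELS = ("low", "medium", "high")

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build keyword arguments for a chat-completion-style API call."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # trades compute (and cost) for accuracy
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage would then look roughly like (requires an API key, not run here):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     **build_request("Prove this lemma step by step.", effort="high")
# )
```

The point of the tradeoff is that "high" mode spends more inference-time compute on the private chain of thought, which is what drives the benchmark gains and the costs mentioned above.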
OpenAI acknowledges potential risks associated with o3, given issues found in the previous model. Its teams are now applying a technique called "deliberative alignment" to ensure o3's compliance with safety principles. To minimize risks, OpenAI will first make o3-mini available for testing by safety researchers, with o3 to follow later in 2025. CEO Sam Altman has also advocated for a federal testing framework to assess the potential impact of such models.