OpenAI says a non-public, general-purpose reasoning model scored 35 out of 42 points on this year’s International Mathematical Olympiad (IMO) problems, enough to place in the top medal tier alongside some of the world’s best young mathematicians.
Why the IMO matters
The IMO is the Olympics of school-age mathematics: two four-and-a-half-hour sessions, six notoriously inventive problems demanding full written proofs (each worth 7 points, for a 42-point maximum), and no calculators, internet, or collaboration allowed. Gold usually requires roughly 30 points.
A language model clearing that bar hints at progress on long-horizon, creative reasoning, not just pattern-matching multiple-choice answers.
How the model performed
- Solved Problems 1-5 outright (classic geometry, combinatorics, number theory, and inequality fare).
- Left Problem 6 blank, mirroring the way many human medalists triage the hardest question.
- Grading process: three former IMO gold medalists independently marked each proof; only unanimous scores were kept.
- Total: 35/42 points, inside the official gold-medal band.
The solutions, posted in a public GitHub repo, read cleanly but are unmistakably machine-generated.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold-medal level performance on the world’s most prestigious math competition — the International Math Olympiad (IMO). pic.twitter.com/SG3k6EknaC
— Alexander Wei (@alexwei_) July 19, 2025
“Not a specialised theorem prover”
Research lead Alexander Wei stresses that the system wasn’t tuned for Olympiad math; instead, it uses a general-purpose reinforcement-learning recipe plus “test-time compute scaling” (spending extra compute at evaluation time by running many roll-outs; see the sketch below). The team deliberately avoided narrow, task-specific methods and symbolic helpers such as formal theorem provers.
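OpenAI hasn’t published the mechanics of its recipe, but one widely used form of test-time compute scaling is best-of-n sampling with self-consistency voting: draw many independent roll-outs and keep the answer they agree on most. The minimal Python sketch below illustrates the idea only; `sample_rollout` is a hypothetical stand-in for a call to any stochastic reasoning model, not an OpenAI API.

```python
import random
from collections import Counter

def sample_rollout(prompt: str) -> str:
    """Hypothetical stand-in for one stochastic model roll-out.

    A real system would call the reasoning model here; this stub just
    simulates an answer distribution so the example runs on its own.
    """
    return random.choices(
        ["proof sketch A", "proof sketch B", "proof sketch C"],
        weights=[0.6, 0.3, 0.1],
    )[0]

def best_of_n(prompt: str, n: int = 64) -> str:
    """Test-time compute scaling via self-consistency voting.

    Draw n independent roll-outs and return the most common final
    answer. More samples cost more inference-time compute but tend
    to raise accuracy on hard reasoning tasks.
    """
    answers = [sample_rollout(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    print(best_of_n("IMO 2025, Problem 1", n=64))
```

In practice the selection step may be a learned verifier rather than a plain majority vote, and full proofs are harder to “vote” on than short answers; the sketch only shows why extra inference-time compute can buy accuracy.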
Altman’s take and a GPT-5 tease
OpenAI CEO Sam Altman called the result “a dream that once felt unrealistic” and framed it as a stepping-stone toward broader general intelligence.
He also tried to temper hype: GPT-5 is “coming soon,” but the IMO-level brain won’t ship for “many months” because it’s still experimental.
What happens next?
| Open question | Why it matters |
| --- | --- |
| Independent replication | External mathematicians will want to pore over every line to confirm there are no hidden gaps. |
| Robustness | Can the same model ace next year’s brand-new problems, or other proof-heavy contests such as the Putnam? |
| Release strategy | How will OpenAI release future models with this level of reasoning? |
The bigger picture
- Agentic era meets deep reasoning: OpenAI’s recently announced ChatGPT Agent mode showed practical autonomy, picking the right tools from its kit and running them on its own virtual computer to complete prompted tasks. This latest achievement shows raw intellectual muscle. Together they sketch an AI that can both think and act across long time horizons.
- Safety remains a watchword: Both Wei and Altman emphasised the tight evaluation chain and the decision to withhold release, signalling that raw capability will be gated behind careful roll-outs.
For now, the proofs live on GitHub. But the question on many people’s minds is how long it will be before this level of AI rolls out to the public.