OpenAI experimental LLM scores Gold on IMO math competition

Written by Joseph Nordqvist

Published: 15:47, July 19, 2025

OpenAI says a non-public general-purpose reasoning model has racked up 35 out of 42 points on this year’s International Mathematical Olympiad (IMO), enough to place in the top medal tier alongside some of the world’s best young mathematicians.

Why the IMO matters

The IMO is the Olympics of school-age mathematics: two four-and-a-half-hour sessions, six notoriously inventive problems demanding full written proofs, and no calculators, internet access, or collaboration allowed. Gold usually requires roughly 30 points.

A language model clearing that bar hints at progress on long-horizon, creative reasoning, not just pattern-matching multiple-choice answers.

How the model performed

  • Solved Problems 1–5 outright (classic geometry, combinatorics, number theory, and inequality fare).
  • Left Problem 6 blank, mirroring the way many human medalists triage the hardest question.
  • Grading process: three former IMO gold medalists independently marked each proof; only unanimous scores were kept.
  • Total: 35/42 points, comfortably inside the official gold band.

The solutions, posted in a public GitHub repo, read cleanly but are unmistakably machine-generated.

“Not a specialised theorem prover”

Research lead Alexander Wei stresses that the system wasn’t tuned for Olympiad math; instead, it uses a general reinforcement-learning recipe plus “test-time compute scaling” (running many roll-outs at evaluation time). The team deliberately avoided bespoke domains or symbolic helpers.
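OpenAI has not published details of its method, but the general idea behind test-time compute scaling can be illustrated with a toy best-of-n sketch: sample many independent roll-outs for the same problem and keep the majority answer, trading extra inference compute for reliability. Everything below (the stand-in `sample_answer` model and its 0.6 accuracy) is a hypothetical illustration, not OpenAI's actual recipe.

```python
import random
from collections import Counter


def sample_answer(problem: str, rng: random.Random) -> str:
    """Stand-in for one stochastic model roll-out.

    A real system would query an LLM here; this toy 'model' returns
    the correct answer "42" with probability 0.6, otherwise a wrong one.
    """
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 41))


def best_of_n(problem: str, n: int, seed: int = 0) -> str:
    """Test-time compute scaling via majority vote over n roll-outs.

    Each roll-out is sampled independently; the most common answer wins.
    More roll-outs (more compute at evaluation time) make the majority
    answer more likely to be correct, with no extra training.
    """
    rng = random.Random(seed)
    answers = [sample_answer(problem, rng) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# With enough roll-outs the vote concentrates on the correct answer.
print(best_of_n("toy problem", n=64))
```

For proof-based contests like the IMO, a simple vote over final answers is not enough; published approaches in this space typically add a verification or self-checking stage, but the compute-for-accuracy trade-off is the same.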

Altman’s take and a GPT-5 tease

OpenAI CEO Sam Altman called the result “a dream that once felt unrealistic” and framed it as a stepping-stone toward broader general intelligence.

He also tried to temper hype: GPT-5 is “coming soon,” but the IMO-level brain won’t ship for “many months” because it’s still experimental.

What happens next?

  • Independent replication: external mathematicians will want to pore over every line to confirm there are no hidden gaps.
  • Robustness: can the same model ace next year’s brand-new problems or other proof-heavy contests such as the Putnam?
  • Release strategy: how will OpenAI release future models with this level of reasoning?

The bigger picture

  • Agentic era meets deep reasoning: OpenAI’s recently announced ChatGPT Agent mode showed practical autonomy, picking the right agent-style tools from its kit and running them on its own virtual computer to get prompted tasks done. This latest achievement shows raw intellectual muscle. Together they sketch an AI that can both think and act across long time horizons.
  • Safety remains a watchword: Both Wei and Altman emphasised tight evaluation chains and withheld release, signalling that raw capability will be gated behind careful roll-outs.

For now, the proofs live on GitHub. But the question on many people’s minds is how long it will be before this level of AI rolls out to the public.
