An AI Just Disproved a Conjecture That 80 Years of Mathematicians Couldn't. The Proof Uses Math Most of Them Don't Know.

Zero point zero one four.

That is the exponent, the tiny, stubborn, irrefutable number, by which an OpenAI reasoning model just shattered an 80-year-old conjecture in discrete geometry, and with it a quiet assumption that had shaped an entire subfield of mathematics since 1946: that Paul Erdős was right about how points cluster at unit distances on a flat plane.

He wasn't.

On May 20, OpenAI announced that an internal model had autonomously disproved the Erdős unit distance conjecture, one of the most famous open problems in combinatorial geometry. It didn't assist a human. It didn't verify steps in someone else's argument. Given only a machine-written statement of the problem, it independently constructed a new infinite family of point configurations that produce more unit-distance pairs than anyone believed possible, then proved its construction works using tools from algebraic number theory that most geometers have never touched. A companion paper by nine external mathematicians (including Fields Medalist Tim Gowers, combinatorialist Noga Alon, and Princeton's Will Sawin) verified the result, refined it, and wrote what amounts to a collective endorsement that this proof is the real thing.

Seven months ago, OpenAI tried to plant this exact flag and humiliated itself. In October 2025, then-VP Kevin Weil posted on X that GPT-5 had solved ten previously unsolved Erdős problems. Thomas Bloom, who maintains the Erdős Problems database, dismantled the claim within days: the model had merely found solutions already published in the literature and presented them as discoveries. Bloom called it "a dramatic misrepresentation." Yann LeCun piled on. Demis Hassabis piled on. Weil deleted the post and left OpenAI by April 2026.

This time Bloom signed the companion paper.

Erdős's Conjecture, in 30 Seconds

Scatter n points on a flat surface and count how many pairs sit exactly one unit apart. Call the largest count achievable by any arrangement of n points v(n). Erdős showed in 1946 that a suitably rescaled square grid gets you roughly n^{1+c/log log n} pairs, barely superlinear growth that creeps upward so slowly it effectively looks like n itself for any n you could actually draw on paper. He conjectured this was essentially optimal: for any fixed positive ε, v(n) stays below C·n^{1+ε} for all sufficiently large n, where the constant C may depend on ε. The best-known upper bound, established in 1984 by Spencer, Szemerédi, and Trotter, is O(n^{4/3}). That bound hasn't budged in 42 years.
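
To make the grid construction concrete, here is a minimal Python sketch (an illustration, not Erdős's proof) that tallies how often each squared distance occurs in a side × side integer grid and picks the most frequent one. Rescaling the grid by 1/√d then turns that distance into a unit distance. The function names are hypothetical, and the approach is only meant for small grids.

```python
from collections import Counter


def distance_counts(side):
    """Map each squared distance d to the number of point pairs in a
    side x side integer grid that lie exactly sqrt(d) apart."""
    counts = Counter()
    for dx in range(0, side):
        for dy in range(-(side - 1), side):
            # Count each unordered pair once: keep offsets with dx > 0,
            # or dx == 0 and dy > 0.
            if dx == 0 and dy <= 0:
                continue
            # Number of grid positions where the offset (dx, dy) fits.
            counts[dx * dx + dy * dy] += (side - dx) * (side - abs(dy))
    return counts


def best_rescaled_grid(side):
    """Return (d, pairs): the most frequent squared distance and its count.
    Scaling the grid by 1/sqrt(d) makes those pairs unit-distance pairs."""
    return max(distance_counts(side).items(), key=lambda item: item[1])


if __name__ == "__main__":
    d, pairs = best_rescaled_grid(5)
    print(f"25 points: squared distance {d} occurs {pairs} times")
```

For a 5 × 5 grid (n = 25) the most frequent squared distance is 5, which occurs 48 times, already more than the 40 pairs of adjacent points. Larger grids favor distances that can be written as a sum of two squares in many ways.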

Here is what the AI proved: for infinitely many values of n, v(n) ≥ n^{1+δ} for a fixed positive δ. Sawin's follow-up refinement pins δ at 0.014. That means, roughly, that each doubling of the point count yields about one percent more unit-distance pairs than the near-linear growth of the old grid would. It sounds microscopic. But Erdős's conjecture said exactly this kind of fixed-exponent improvement was impossible. Put differently, 0.014 is the difference between "Erdős was right" and "Erdős was wrong," and in mathematics, wrong is wrong regardless of the margin.
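
The "one percent per doubling" figure can be checked with a few lines of arithmetic. The sketch below compares n^{1+δ} against a plain linear baseline n; that baseline is a simplifying assumption, since it ignores constants and the grid's own slowly vanishing extra exponent. The function names are illustrative.

```python
DELTA = 0.014  # exponent gain reported in the article (Sawin's refinement)


def gain_per_doubling(delta: float = DELTA) -> float:
    """Extra factor gained by n**(1 + delta) over linear growth
    each time n doubles: 2**(1 + delta) / 2 = 2**delta."""
    return 2.0 ** delta


def gain_over_linear(n: float, delta: float = DELTA) -> float:
    """Ratio n**(1 + delta) / n, which simplifies to n**delta."""
    return n ** delta


if __name__ == "__main__":
    print(f"per doubling: {gain_per_doubling():.4f}")
    for n in (10**3, 10**6, 10**12):
        print(f"n = {n:.0e}: n^delta = {gain_over_linear(n):.3f}")
```

Each doubling contributes a factor of about 1.0098, just under one percent, and at a million points the accumulated factor over linear growth is still only about 1.21. The gain is small at any practical scale but grows without bound, which is all the disproof needs.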

A Cross-Domain Leap Nobody Made

Here is what makes the proof genuinely surprising, even to the mathematicians who checked it.

Consider where each piece lives. Plane geometry houses the unit distance problem. Algebraic number theory houses the proof. Erdős's original construction used Gaussian integers (numbers of the form a + bi, where a and b are ordinary integers and i is the square root of −1). Instead of Gaussian integers, the AI substituted richer algebraic number fields, ring extensions whose internal symmetries produce denser patterns of unit-distance pairs. To show that number fields with the necessary properties actually exist in infinite supply, the proof reaches for Golod–Shafarevich theory and infinite class field towers, deep specialized tools from a corner of mathematics that combinatorial geometers almost never enter.
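
Why Gaussian integers are useful here can be seen in a small sketch. A Gaussian integer a + bi has norm a² + b², so every Gaussian integer of norm d is a grid vector of length √d; the more of them there are, the more neighbors each grid point has at that single distance. The helper name below is hypothetical, and the code says nothing about the richer number fields used in the new proof.

```python
import math


def gaussian_integers_of_norm(d):
    """List all pairs (a, b) with a*a + b*b == d, i.e. the Gaussian
    integers a + bi whose norm is d."""
    found = []
    limit = math.isqrt(d)
    for a in range(-limit, limit + 1):
        remainder = d - a * a
        b = math.isqrt(remainder)
        if b * b == remainder:
            found.append((a, b))
            if b != 0:
                found.append((a, -b))  # the mirror image below the real axis
    return found


if __name__ == "__main__":
    for d in (1, 5, 25, 65, 325):
        print(d, len(gaussian_integers_of_norm(d)))
```

Norm 1 gives only the four units 1, −1, i, −i, while norms 5, 25, 65, and 325 give 8, 12, 16, and 24 vectors respectively. Choosing a norm with many representations is what lets the grid pack in extra unit distances after rescaling.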

Sawin, who refined the result, explained why the obvious generalization fails: if you take one extended number system and simply look at bigger chunks of it, you just recover Erdős's original bound. The model's key insight was counterintuitive: keep the scale fixed within each number system but switch to progressively richer systems at every step, exploiting the growth of available unit-length differences as algebraic degree increases. "Why that particular switch works wasn't obvious to any human," Sawin wrote.

Bloom offered a blunt accounting of why no human got there first. Four conditions had to hold simultaneously: you had to spend serious time on the problem, you had to bet against Erdős and actually attempt a disproof instead of trying to prove the conjecture, you had to think of translating the original construction into general number fields, and you had to know enough class field theory to pull it off. "The AI met all of these criteria," Bloom wrote. "It combines superhuman levels of patience with familiarity with a vast array of technical machinery."

A Number Humans Can Count On Their Fingers

Bloom's four-condition framework invites a rough calculation nobody seems to have done. How many living mathematicians could plausibly have met all four conditions simultaneously?

Combinatorial geometry is a small field, perhaps 200 to 300 active researchers worldwide who work regularly on Erdős-type distance problems, based on publication counts in Discrete & Computational Geometry and the Journal of Combinatorial Theory over the last decade. Of those, how many would have bet against Erdős? Bloom himself noted that "most of the human efforts spent on this problem have been on trying to prove the upper bound, rather than spending serious time on trying to disprove it." Let's be generous and say 10 percent were open to attempting a disproof: 20 to 30 people. How many of those also possess working knowledge of class field towers and Golod–Shafarevich theory? These tools live in algebraic number theory, a field whose active practitioners overlap with combinatorial geometry by perhaps a handful of individuals at the faculty level. That intersection is almost certainly in the single digits.
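
The funnel above is simple enough to write down as a back-of-envelope calculation. The sketch uses only the article's own rough figures (200 to 300 researchers, 10 percent open to a disproof) and then asks how large the final overlap with class field theory could be while the result stays in single digits. The function names are hypothetical, and none of the inputs are measured data.

```python
def scale(bounds, percent):
    """Apply a percentage to a (low, high) head-count range."""
    low, high = bounds
    return (low * percent / 100, high * percent / 100)


def max_percent_for_single_digits(bounds, ceiling=9):
    """Largest percentage that keeps even the high end of the range
    at or below `ceiling` people."""
    return 100 * ceiling / bounds[1]


active_researchers = (200, 300)                   # the article's estimate
open_to_disproof = scale(active_researchers, 10)  # the "generous" 10 percent
threshold = max_percent_for_single_digits(open_to_disproof)

if __name__ == "__main__":
    print(f"open to a disproof: {open_to_disproof}")
    print(f"single digits if at most {threshold:.0f}% also know the number theory")
```

Under these assumptions, the single-digit conclusion holds as long as no more than about 30 percent of the disproof-minded group also has working knowledge of class field towers, a bar the article's "handful of individuals" clears easily.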

In this particular case, the AI's advantage wasn't raw intelligence or even computational brute force in the traditional sense. It was what Daniel Litt, one of the nine companion authors, called the cost of specialization: "incentives towards specialization and silo-ing, though understandable, have cost us some high-quality science." The model had no silos. It had no tenure-track incentive to publish within a single subfield. It had no fear of wasting a year chasing a disproof that might not exist. It just tried everything it knew, including ideas from fields that the geometers never searched, with the patience of a machine that does not worry about grant renewal deadlines or reputational risk.

Why This Might Be Less Than It Looks

Melanie Matchett Wood, one of the nine verifying mathematicians, delivered the most pointed qualification in the companion paper: "I believe if the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago… the mathematicians would have found a counterexample."

That's a remarkable statement worth sitting with, because it reframes the achievement entirely. Consider who wrote that companion paper: Alon, Bloom, Gowers, Litt, Sawin, Shankar, Tsimerman, Wang, and Wood. If you put that team in a room with the specific instruction to disprove the unit distance conjecture using number-theoretic tools, Wood is saying, they would have cracked it without an AI.

And inside OpenAI, the model wasn't built in a vacuum. Its team included Mehtaab Sawhney (Morgan Prize winner, one of the most prolific young combinatorialists alive), Mark Sellke (Putnam Fellow, IMO Gold Medalist), Lijie Chen (theoretical computer science prodigy), and Sébastien Bubeck. This is not a random software engineering team handing a math problem to a chatbot. This is a mathematical dream team calibrating the scaffolding, the prompts, the grading pipeline, and the failure modes of a system that generated a 125-page chain of thought before arriving at the proof. OpenAI claims the final proof was generated "in one shot," but that terminology, as critical analyses have noted, likely refers to the final successful inference trace, not the months of development that produced the system capable of generating it.

The 125-page chain of thought, a "rewritten summary" of the model's reasoning, reads less like a mathematician's eureka moment and more like an exhaustive depth-first search through the space of known mathematical concepts: polynomials, finite fields, elliptic curves, Dirichlet units, each attempted and abandoned before the Golod–Shafarevich connection emerged. If the rewritten summary runs 125 pages, the raw output was almost certainly orders of magnitude larger: millions of tokens of dead ends, backtracking, and failed approaches filtered through an AI grading pipeline before a human ever looked at the result.

What We Don't Know

OpenAI has not disclosed the model's name, its architecture, its parameter count, the number of inference runs attempted before the successful trace, the total compute cost, or the design of the grading pipeline that flagged the proof as worth human review. We don't know how much of the scaffolding was hand-tuned for this specific problem versus being a general research tool. We don't know whether other Erdős problems were attempted simultaneously and failed. Without these denominators, we cannot distinguish between a model that reliably solves hard math problems and one that expensively generates a vast cloud of reasoning traces until one happens to be correct. That distinction matters enormously for assessing whether this generalizes beyond a single spectacular result.

The proof itself hasn't undergone formal peer review through a journal process. Nine world-class verifiers provide strong social evidence, but they are not a substitute for the adversarial scrutiny of anonymous referees with months to look for errors. Mathematical proofs are uniquely verifiable (either the construction works or it doesn't), but the history of claimed breakthroughs that collapsed under sustained review (Mochizuki's inter-universal Teichmüller theory being the most prominent recent example) counsels patience even when early verification is enthusiastic.

How Fast This Is Moving

Noam Brown, an OpenAI researcher, posted: "Less than 1 year ago frontier AI models were at IMO gold-level performance. I expect this pace of progress to continue."

That timeline is the number that should command attention beyond mathematics. In roughly eight months, the frontier moved from solving competition problems designed for 18-year-olds (hard competition problems, to be fair; IMO gold is elite) to disproving a research-level conjecture that professional mathematicians had attacked without success for eight decades, a result that Gowers says he would accept for the Annals of Mathematics, the field's most prestigious journal, "without any hesitation."

Between those two goalposts lie entire categories of intellectual work: graduate-level problem sets, qualifying exam questions, publishable-but-incremental research, novel lemmas, cross-domain connections, and original conjectures. If the eight-month pace holds (a massive if, since capability curves rarely extend linearly), then a meaningful fraction of professional mathematics becomes AI-tractable within two to three years. Gowers himself acknowledged this possibility: "we have still probably entered an era where it will become very difficult for humans to compete with AI at solving mathematical problems."

What You Can Do

If you are a working mathematician, the single most actionable takeaway from this result is to look at your own unsolved problems through the lens of Bloom's four-condition test. Ask: is this problem stuck because of a genuine mathematical barrier, or because of a sociological one: everyone assumes the conjecture is true, nobody's trying to disprove it, or the relevant tools live in a subfield that specialists in the problem's home domain don't know? Erdős's conjecture wasn't stuck because the mathematics was impossibly hard. It was stuck because the people who knew the geometry didn't know the number theory, and the people who knew the number theory weren't looking at the geometry. If your problem has a similar structure (and Litt's companion remarks suggest many do), running it through a frontier AI model with broad training may be worth your time today, not in five years.

If you invest in AI companies, track the "Gowers line": would a Fields Medalist accept this result for a top journal without hesitation? That standard just got crossed for the first time. When it starts getting crossed routinely (multiple results per quarter, across different mathematical subfields), the productivity implications for scientific R&D become concrete and investable. Watch the companion-paper pipeline: results validated by domain experts, not just benchmark scores.

If you are a student deciding whether to pursue a PhD in pure mathematics, the honest answer is that the landscape is shifting under your feet, but it is shifting in a way that makes expertise more valuable, not less. Nine human experts were still needed to verify, refine, and contextualize the model's output. Terence Tao's concept of "proof indigestion" (AI generates proofs faster than humans can understand and build on them) means that the bottleneck is moving from proof generation to proof comprehension, and that is a human skill with no current AI substitute.

The Bottom Line

An AI model used tools from algebraic number theory that almost no combinatorial geometer knew to apply, disproved a conjecture that almost no one had tried to disprove, and produced a proof that one of the most decorated mathematicians alive says he would accept for the field's most prestigious journal without any hesitation. The numerical improvement (an exponent of 0.014) is tiny. Structurally, something else happened entirely: an AI independently connected two distant mathematical fields to solve a problem trapped in the gap between them, the kind of cognitive act that, until May 20, we assumed required a human brain. Whether you call that autonomous reasoning, sophisticated search, or centaur mathematics with the human contribution hidden behind a corporate curtain, the mathematical fact stands: Erdős was wrong, and a machine found the proof. Everything that follows from that sentence is a conversation about the future of intellectual work itself.