Will any AI model achieve > 40% on Frontier Math before 2026?


104

Ṁ83k

Resolved YES on Dec 13


The model need not be released

Update 2025-09-19 (PST) (AI summary of creator comment): Resolution will be based on Epoch's reported Frontier Math scores. Other sources (e.g., AI Digest or lab-only reports) will not determine resolution.

This question is managed and resolved by Manifold.

#Technology

#AI

#Technical AI Timelines

#OpenAI

#AI Impacts


25 Comments

85 Holders

480 Trades



sold Ṁ207 NO

Not verified/posted on official page but GPT-5.2 high is showing 40.3 here https://epoch.ai/benchmarks/use-this-data

@TimDuffy Yeah, I see it on the official dashboard now.

bought Ṁ2,839 YES

@JaundicedBaboon Resolves YES.

bought Ṁ500 NO

Epoch just posted evals (https://epoch.ai/frontiermath) and 5.2 only got 26.6%. I'll leave it unresolved for now in case that was the non-thinking version or the results are amended. It seems shockingly low.


bought Ṁ150 NO

@JaundicedBaboon I'd wait to resolve, since there's some small chance Epoch will evaluate Gemini 3 Deep Think. They haven't yet, and I bet it would exceed 40% if they did. I'm also surprised at the low score!

bought Ṁ100 NO

The 26% is for 5.2 low, high could be much higher actually!

sold Ṁ106 NO

5.1 scored: 17.3% low, 26.9% med, 31.0% high.
If 5.2 has the same low/high gap, it will be right at 40.
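The extrapolation in the comment above can be checked in a few lines. This is a minimal sketch that assumes the low-to-high gap is additive (in percentage points) and carries over unchanged from 5.1 to 5.2; the 26.6% starting point is the 5.2 low score Epoch posted, as discussed in this thread. The function name `project_high` is purely illustrative.

```python
# Hypothetical projection: assumes the low->high gap (in percentage points)
# observed for GPT-5.1 carries over unchanged to GPT-5.2.
gpt51 = {"low": 17.3, "medium": 26.9, "high": 31.0}
gpt52_low = 26.6  # Epoch's reported score for 5.2 at low effort


def project_high(low_score: float, reference: dict[str, float]) -> float:
    """Add the reference model's low->high gap to a new low score."""
    gap = reference["high"] - reference["low"]  # 13.7 points for 5.1
    return round(low_score + gap, 1)


projected = project_high(gpt52_low, gpt51)
print(f"Projected GPT-5.2 high: {projected}%")  # Projected GPT-5.2 high: 40.3%
```

Under this additive assumption the projection lands at 40.3%, right at the threshold. Scaling the low score by a ratio instead of adding a fixed gap would give a different number, so the result depends on that modeling choice.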

@TimDuffy Plus, they will test it at extra high rather than high, since that's new for 5.2-thinking.

bought Ṁ25 NO

This will likely resolve YES, but note that this market is based on Epoch's evaluation; I think the 40.3 we've seen is OpenAI's.

bought Ṁ100 NO

Previously OpenAI evaluated o3 and scored 25.3, Epoch evaluated it and scored it 18.7.

bought Ṁ75 NO

IIRC Epoch hasn't evaluated Gemini 3 Deep Think though, if they do before EOY I think that model is likely to exceed 40%.

bought Ṁ949 YES

Well fuck me 😅

https://epoch.ai/blog/deep-think-math

bought Ṁ200 YES

Epoch reported long ago that Agent 1 scored 49% on the original FrontierMath (now Tiers 1-3) with pass@16.

https://x.com/EpochAIResearch/status/1945905802998423867

Does this count?

@qumeric Pass@16 should definitely not count... If it did, why not pass@32 or pass@64? It's clear that this market is about pass@1.
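To show why a pass@16 score isn't comparable to pass@1, here is a minimal sketch of the standard unbiased pass@k estimator (popularized by OpenAI's Codex/HumanEval evaluation work). Whether Epoch computed its pass@16 figure exactly this way is an assumption, and the per-problem attempt counts below are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n sampled attempts of which c were correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (attempts, correct) counts for four problems; illustrative only.
results = [(16, 0), (16, 2), (16, 16), (16, 1)]

for k in (1, 16):
    # Benchmark score = mean of the per-problem estimates.
    score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    print(f"pass@{k}: {score:.3f}")
# pass@1: 0.297
# pass@16: 0.750
```

With these toy numbers, the same set of attempts scores about 30% at pass@1 but 75% at pass@16, because a single lucky success out of 16 counts as fully solved under pass@16.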

Why is this so different from this market? Are both based on FrontierMath Tiers 1-3? https://manifold.markets/SG/top-frontiermath-score-in-2025

Resolution will be based on Epoch's reported Frontier Math scores.

Historically, OpenAI reported 32% for o3-mini with Python (which counts for the purpose of that other market, afaict), but Epoch, testing it with its general/minimal scaffold, got 11.03%. That likely isn't because OpenAI is making up numbers or whatever; they demonstrably use a different setup.

@JaundicedBaboon does this resolve according to AI Digest (which includes e.g. lab-reported scores) or according to Epoch’s evaluation?

@bh I’ll go by what Epoch reports

opened a Ṁ500 NO at 45% order

@Bayesian Limit up at 45% ;)

@BrunoJ i can uh... get a better price if i wait... 😭

opened a Ṁ3,000 YES at 51% order

All it would take is running the IMO model on Frontier Math.

bought Ṁ900 NO

bought Ṁ500 NO

@VinceVatter FrontierMath is orders of magnitude harder than IMO.

@traders 116 days until 2026! Is a breakthrough expected over the next 4 months? Given the size of the jump from GPT-4 to GPT-5, I'm not sure why this is at 55%. I'm going to keep buying a little more NO every day.
