Will LLMs be the best reasoning models on these dates?
March 31, 2025: 86%
June 30, 2025: 76%
September 30, 2025: 76%
December 31, 2025: 76%

On each of these dates, I will examine whether the best models on reasoning benchmarks such as FrontierMath, Humanity's Last Exam, and ARC-AGI (or any similar future benchmarks, should these become saturated) are generally considered to be LLMs. For the avoidance of doubt, o3 and R1 are LLMs. However, I would not consider AlphaProof to be substantially an LLM, even if one is used as a component, because its "reasoning" is done via search.

I recognize that this is inherently somewhat subjective, so I will not bet on this market. I may resolve 50/50 if both LLMs and non-LLMs are competitive for the top spots.

  • Update 2025-02-08 (PST) (AI summary of creator comment): Multimodal models:

    • Multimodal models (e.g., those similar to GPT-4) are to be considered as LLMs for the purposes of this market.


Are multimodal models like 4o LLMs?

@Fay42 yes

© Manifold Markets, Inc.