
Will o3's score on the Last Exam be above 30%?
Ṁ4384 Feb 12
22% chance
This question is managed and resolved by Manifold.
Is tool use allowed?
@Frankas Hmm, good question. I'm inclined to resolve NO, because deep research with tool use should do strictly better than o3 without tool use, and deep research got 26%?
The Last Exam appears to be primarily a knowledge benchmark, rather than a problem-solving benchmark. All frontier models score very highly on other knowledge benchmarks, but score poorly on The Last Exam. o3 is unlikely to be significantly more knowledgeable than other frontier models.
@Haiku I don’t fully agree. The benchmark was created mostly by filtering for questions that none of the frontier models (at that time) could answer.
In math, a lot of these questions involve problem solving. I assume o3 is very good at problem solving.
Related questions
Will any AI score 30% or more on Humanity's Last Exam benchmark before Ramadan 2025?
10% chance
Will I do well on an upcoming exam?
61% chance
Will o1 score ≥60% on the REBUS benchmark?
89% chance
What will be o3's score on Humanity's Last Exam?
What will Grok 3's score be on Humanity's Last Exam?
Will there be a score of 80% or higher on Humanity's Last Exam before April 1, 2025?
4% chance
What will be o3's score on FrontierMath?
Will at least 90% of students get a B or Higher in my ECON-189 3:30 class?
40% chance
Will I pass all my exams this semester?
74% chance
Will OpenAI's o4 get above 50% on humanity's last exam?
53% chance