Will o3's score on the Last Exam be above 30%?
30
Ṁ4384
Feb 12
22% chance
filled a Ṁ250 YES at 27% order

@Frankas hmmmm good question, i'm inclined to resolve NO because deep research with tool use should do strictly better than o3 without tool use, and deep research got 26%?

closing the question for now. I'm inclined to resolve no but i'm not sure. tool use is ok if the tool is running python imo, but not looking up the answer on the internet? hmmm

bought Ṁ10 NO

The Last Exam appears to be primarily a knowledge benchmark, rather than a problem-solving benchmark. All frontier models score very highly on other knowledge benchmarks, but score poorly on The Last Exam. o3 is unlikely to be significantly more knowledgeable than other frontier models.

@Haiku I don’t fully agree. The benchmark was created mostly by filtering for questions that none of the frontier models (at that time) could answer.

In math, a lot of these questions are problem solving. I assume o3 is very good at problem solving.

@mathvc sounds like you should bet on the market about this very topic then

@Ziddletwix i disagree on the nature of the benchmark, not on the probability in this market 😜
