Will o3's score on the Last Exam be above 30%?


Ṁ4384

resolved Mar 4


https://lastexam.ai/

/Bayesian/which-of-frontiermath-and-humanitys

This question is managed and resolved by Manifold.


10 Comments

28 Holders

57 Trades


@Bayesian What is this resolution based on? o3 isn't out, and there are no official HLE results for it anywhere that I am aware of. Am I missing something?

filled a Ṁ250 YES at 27% order

Is tool use allowed?

https://openai.com/index/introducing-deep-research/

@Frankas hmmmm good question, i'm inclined to resolve NO because deep research with tool use should do strictly better than o3 without tool use, and deep research got 26%?

closing the question for now. I'm inclined to resolve NO but i'm not sure. tool use is ok if the tool is running Python imo, but not if it's looking up the answer on the internet? hmmm

@moozooh ^

@Bayesian Closing is fine (though I don't see the need, what's the hurry?), but you're also resolving it with a definite answer for which you don't provide the grounds. If you intend to resolve, it should at least be N/A then, citing the lack of hard facts.

bought Ṁ10 NO

The Last Exam appears to be primarily a knowledge benchmark, rather than a problem-solving benchmark. All frontier models score very highly on other knowledge benchmarks, but score poorly on The Last Exam. o3 is unlikely to be significantly more knowledgeable than other frontier models.

@Haiku I don’t fully agree. The benchmark was created mostly by filtering for questions that none of the frontier models (at that time) could answer.

In math, a lot of these questions are problem-solving questions. I assume o3 is very good at problem solving.

@mathvc sounds like you should bet on the market about this very topic then

@Ziddletwix i disagree on the nature of the benchmark, not on the probability in this market 😜
