Will o1 score ≥60% on the REBUS benchmark? | Manifold

89% chance · Ṁ1905 · Feb 28

Update 2024-12-22 (PST): This market refers to the REBUS benchmark as described in the paper "REBUS: A Robust Evaluation Benchmark of Understanding Symbols" (AI summary of creator comment)

This question is managed and resolved by Manifold.

I'll probably try running the benchmark this week if I can automate the web interaction (unless the API comes out before then).

Referring, of course, to the famous https://arxiv.org/abs/2401.05604

bought Ṁ100 YES

For reference, the release version of 4o scored 42%, and the human baseline is 83%.

@derikk After looking at the examples, getting none of them right, and then seeing 83% as the human baseline, I felt really bad, until I read that the humans were allowed to use Google and reverse image search.

Related questions

Will any AI score 30% or more on Humanity's Last Exam benchmark before Ramadan 2025?

Before what year will AI achieve 95% or higher score on the Humanity’s Last Exam benchmark?

Will any AI get a score of at least 45% on Humanity’s Last Exam benchmark before March 11, 2025?

Will there be a score of 80% or higher on Humanity's Last Exam before April 1, 2025?

Will any model get above human level on the Simple Bench benchmark before September 1st, 2025.

Will an AI score over 80% on FrontierMath Benchmark in 2025

What will be the best normalized score achieved on the original 7 RE-Bench tasks by December 31st 2025?

Will o3's score on the Last Exam be above 30%?

What will be o3's score on Humanity's Last Exam?

Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2030?

© Manifold Markets, Inc.•Terms + Mana-only Terms•Privacy•Rules