Will Gemini-1.5-Pro-Exp-0801 Score Above 1165 in Scale AI's Math Evaluation
48% chance
Context:
Gemini-1.5-Pro-Exp-0801 is currently the leading model on the LMSYS Chatbot Arena leaderboard (https://arena.lmsys.org/).
This market concerns its potential evaluation by Scale AI (https://scale.com/leaderboard).
Resolution Criteria:
The market resolves as "Yes" if the model is evaluated by Scale AI and it receives a score strictly greater than 96.60 in the Math category.
The market resolves as "No" if the model is evaluated by Scale AI and it receives a score of 96.60 or less in the Math category.
The market resolves as "N/A" if either
Scale AI does not evaluate the model and add it to the leaderboard before October 1, 2024, or
the evaluation methodology changes before the model is evaluated.
This question is managed and resolved by Manifold.
Related questions
Will Gemini-1.5-Pro-Exp-0801 Score Above 1165 in Scale AI's Coding Evaluation
28% chance
Will Gemini-1.5-Pro-Exp-0801 Score Above 90.35 (current #1) in Scale AI's Instruction Following Evaluation
53% chance
Will Gemini achieve a higher score on the SAT compared to GPT-4?
70% chance
Will Gemini-1.5-Pro-Exp-0801 Score Lower Than 8 (current best) in Scale AI's Adversarial Robustness
56% chance
Will Gemini exceed the performance of GPT-4 on the 2022 AMC 10 and AMC 12 exams?
72% chance
Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on LiveBench?
55% chance
Will Gemini outperform GPT-4 at mathematical theorem-proving?
58% chance
Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on Simple Bench?
55% chance
Will "Gemini [Ultra, 1.0] smash GPT-4 by 5x"?
18% chance
Will Gemini 1.5 Pro seem to be as good as Gemini 1.0 Ultra for common use cases? [Poll]
70% chance