Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?
Ṁ577 · 2040 · expected: 1.6

For the purposes of this question, major RL benchmarks are ALE, Minecraft, chess, go, and Starcraft II.

Sample efficiency: the number of frames/games/amount of time required to achieve a given level of performance. For this market I will use average human performance: the algorithm must achieve average human performance (measured by score, Elo, time, etc.) given the same amount of data.
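As a rough illustration of the criterion above, here is a minimal sketch in Python. All names (`meets_human_sample_efficiency`, `score_curve`, the specific numbers) are illustrative assumptions, not part of the market's official resolution procedure.

```python
# Hypothetical sketch of the sample-efficiency test described above.
# Names and numbers are illustrative assumptions, not an official spec.

def meets_human_sample_efficiency(algo_score_at, human_avg_score, human_data_budget):
    """Return True if the algorithm reaches average human performance
    (score/Elo/time/etc.) using no more data than the average human.

    algo_score_at: function mapping an amount of data (frames/games/time)
                   to the algorithm's achieved performance at that budget.
    human_avg_score: average human performance.
    human_data_budget: the amount of data the average human used.
    """
    return algo_score_at(human_data_budget) >= human_avg_score

# Example: a toy algorithm whose score grows linearly with frames seen.
score_curve = lambda frames: 0.01 * frames
print(meets_human_sample_efficiency(score_curve, 500.0, 100_000))  # True
```

The key point the sketch captures: the comparison fixes the data budget at the human level and then compares performance, rather than fixing performance and comparing budgets.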

Video game tasks could include: maximizing score, speedruns, challenge runs, or competing against human players.

I'm restricting the resolution to AAA video games to avoid possibilities like an indie developer making a Turing test video game.

"Essentially all":

  • Can complete >=90% of AAA video games in <= mean human completion time

  • Can achieve a top-100 speedrun (according to whatever the largest speedrun website is at the time) on >=90% of AAA video games, given approximately the same amount of practice time as human speedrunners

  • Can complete popular challenge runs on >=90% of AAA video games
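The three bullets above can be summarized as a simple threshold check. The sketch below is a hedged illustration with assumed names (`resolves_yes` and the per-game boolean lists are my own shorthand, not the market creator's):

```python
# Illustrative sketch of the "essentially all" check: each criterion must
# hold on at least 90% of AAA video games. Names are assumptions.

def resolves_yes(completion_ok, speedrun_ok, challenge_ok):
    """Each argument is a list of booleans, one per AAA game, recording
    whether the algorithm met that criterion on that game."""
    frac = lambda flags: sum(flags) / len(flags)
    return all(frac(flags) >= 0.90
               for flags in (completion_ok, speedrun_ok, challenge_ok))

print(resolves_yes([True] * 9 + [False],    # 90% completed in <= mean human time
                   [True] * 19 + [False],   # 95% with top-100 speedruns
                   [True] * 10))            # 100% of popular challenge runs
# → True
```

Note that the criteria are conjunctive: failing any one of the three 90% thresholds is enough to block a YES resolution.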

The models used can include pretraining, as long as the training data does not include frames from the video games. Instructions/manuals/guides can also be used, as long as they are available to human players (e.g. the contents of a speedrunning forum or a YouTube video explaining a trick can be part of the input).

Note: this question is about algorithms rather than models. There is no requirement that a single model be able to play multiple video games. In cases where a single model is trained to play multiple video games, I will use its average sample efficiency across all those games.
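One plausible way to operationalize the averaging in the note above is sketched below. The per-game ratio of human data budget to algorithm data budget is my assumption about how "sample efficiency" would be quantified for averaging; the market description does not pin down a formula.

```python
# Hedged sketch: for a single model trained on several games, average its
# per-game sample efficiency. Here efficiency is taken (by assumption) as
# the ratio human_data_needed / algo_data_needed to reach average human
# performance, so a value >= 1.0 means human-level or better.

def avg_sample_efficiency(per_game):
    """per_game: list of (human_data_needed, algo_data_needed) pairs."""
    ratios = [human / algo for human, algo in per_game]
    return sum(ratios) / len(ratios)

games = [(1e6, 5e5), (2e6, 4e6), (3e5, 3e5)]  # hypothetical data budgets
print(avg_sample_efficiency(games))  # ratios 2.0, 0.5, 1.0 -> ~1.1667
```

Under this reading, a multi-game model can be worse than humans on some games and still count, as long as its average efficiency clears the bar.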



Related questions

  • Benchmark Gap #6: Once we have a transfer model that achieves human-level sample efficiency on many major RL environments, how many months will it be before we have a non-transfer model that achieves the same? (12)

  • Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills? (4.3)

  • By 2026 will any RL agent with learned causal models of its environment achieve superhuman performance on >=10 Atari environments? (81% chance)

  • Will an AI be able to play a type of video game that it wasn't trained on before 2026? (39% chance)

  • In 2028, will an AI be able to play randomly selected computer games at human level without getting to practice? (51% chance)

  • Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use? (73% chance)

  • Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers? (67% chance)

  • Will an AI system beat humans in the GAIA benchmark before the end of 2025? (65% chance)

  • When will an AI be able to speedrun a popular video game faster than the human WR?

  • Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 5 years before there are "entry level" AI programmers in industry use? (92% chance)