Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?
Ṁ577 · 2040 · expected: 1.6

For the purposes of this question, major RL benchmarks are ALE, Minecraft, chess, go, and Starcraft II.

Sample efficiency: the number of frames/games/amount of time required to achieve a given level of performance. For this market I will use average human performance: the algorithm must achieve average human performance (measured by score, Elo, time, etc.) given the same amount of data.
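As a rough illustration of the criterion above, here is a minimal sketch in Python. All names (`meets_human_sample_efficiency`, `score_curve`, the specific numbers) are illustrative assumptions, not part of the market's official resolution procedure.

```python
# Hypothetical sketch of the sample-efficiency test described above.
# Names and numbers are illustrative assumptions, not an official spec.

def meets_human_sample_efficiency(algo_score_at, human_avg_score, human_data_budget):
    """Return True if the algorithm reaches average human performance
    (score/Elo/time/etc.) using no more data than the average human.

    algo_score_at: function mapping an amount of data (frames/games/time)
                   to the algorithm's achieved performance at that budget.
    human_avg_score: average human performance.
    human_data_budget: the amount of data the average human used.
    """
    return algo_score_at(human_data_budget) >= human_avg_score

# Example: a toy algorithm whose score grows linearly with frames seen.
score_curve = lambda frames: 0.01 * frames
print(meets_human_sample_efficiency(score_curve, 500.0, 100_000))  # True
```

The key point the sketch captures: the comparison fixes the data budget at the human level and then compares performance, rather than fixing performance and comparing budgets.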

Video game tasks could include: maximizing score, speedruns, challenge runs, or competing against human players.

I'm restricting the resolution to AAA video games to avoid possibilities like an indie developer making a Turing test video game.

"Essentially all":

  • Can complete >=90% of AAA video games in <= mean human completion time

  • Can achieve a top-100 speedrun (according to whatever the largest speedrun website is at the time) on >=90% of AAA video games, given approximately the same amount of practice time as human speedrunners

  • Can complete popular challenge runs on >=90% of AAA video games
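The three bullets above can be summarized as a simple threshold check. The sketch below is a hedged illustration with assumed names (`resolves_yes` and the per-game boolean lists are my own shorthand, not the market creator's):

```python
# Illustrative sketch of the "essentially all" check: each criterion must
# hold on at least 90% of AAA video games. Names are assumptions.

def resolves_yes(completion_ok, speedrun_ok, challenge_ok):
    """Each argument is a list of booleans, one per AAA game, recording
    whether the algorithm met that criterion on that game."""
    frac = lambda flags: sum(flags) / len(flags)
    return all(frac(flags) >= 0.90
               for flags in (completion_ok, speedrun_ok, challenge_ok))

print(resolves_yes([True] * 9 + [False],    # 90% completed in <= mean human time
                   [True] * 19 + [False],   # 95% with top-100 speedruns
                   [True] * 10))            # 100% of popular challenge runs
# → True
```

Note that the criteria are conjunctive: failing any one of the three 90% thresholds is enough to block a YES resolution.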

The models used can include pretraining, as long as the training data does not include frames from the video games. Instructions/manuals/guides can also be used, as long as they are available to human players (e.g. the contents of a speedrunning forum or a YouTube video explaining a trick can be part of the input).

Note: this question is about algorithms rather than models. There is no requirement that a single model be able to play multiple video games. In cases where a single model is trained to play multiple video games, I will use its average sample efficiency across all those games.
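One plausible way to operationalize the averaging in the note above is sketched below. The per-game ratio of human data budget to algorithm data budget is my assumption about how "sample efficiency" would be quantified for averaging; the market description does not pin down a formula.

```python
# Hedged sketch: for a single model trained on several games, average its
# per-game sample efficiency. Here efficiency is taken (by assumption) as
# the ratio human_data_needed / algo_data_needed to reach average human
# performance, so a value >= 1.0 means human-level or better.

def avg_sample_efficiency(per_game):
    """per_game: list of (human_data_needed, algo_data_needed) pairs."""
    ratios = [human / algo for human, algo in per_game]
    return sum(ratios) / len(ratios)

games = [(1e6, 5e5), (2e6, 4e6), (3e5, 3e5)]  # hypothetical data budgets
print(avg_sample_efficiency(games))  # ratios 2.0, 0.5, 1.0 -> ~1.1667
```

Under this reading, a multi-game model can be worse than humans on some games and still count, as long as its average efficiency clears the bar.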



Related questions

  • Benchmark Gap #6: Once we have a transfer model that achieves human-level sample efficiency on many major RL environments, how many months will it be before we have a non-transfer model that achieves the same? (12)

  • Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills? (4.3)

  • By 2026 will any RL agent with learned causal models of its environment achieve superhuman performance on >=10 Atari environments? (81% chance)

  • Will an AI be able to play a type of video game that it wasn't trained on before 2026? (39% chance)

  • In 2028, will an AI be able to play randomly selected computer games at human level without getting to practice? (51% chance)

  • Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use? (73% chance)

  • Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers? (67% chance)

  • Will an AI system beat humans in the GAIA benchmark before the end of 2025? (65% chance)

  • When will an AI be able to speedrun a popular video game faster than the human WR?

  • Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 5 years before there are "entry level" AI programmers in industry use? (92% chance)