
Benchmark Gap #6: Once we have a transfer model that achieves human-level sample efficiency on many major RL environments, how many months will it be before we have a non-transfer model that achieves the same?
Expected: 12 months
Transfer model criteria:
The model can include pretrained non-RL components, e.g. a language or image model. Effort should have been made to keep states from the RL environments out of the training set for any pretrained components, but this doesn't have to be perfect.
The model can train for any amount of time on the training set of RL environments.
Once transferred, it must achieve mean human performance with human-level sample efficiency on >= 75% of the test environments.
Non-transfer model:
It can include pretrained non-RL components under the same conditions as the transfer model.
It must achieve mean human performance with human-level sample efficiency on >= 75% of all the environments (there is no split between training and test environments).
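The 75% threshold in both sets of criteria can be sketched as a simple check. This is only an illustration under stated assumptions: the question does not define how "human-level sample efficiency" is measured, so the sketch assumes it means reaching the mean human score using no more environment interactions than a human would. All names here (`EnvResult`, `passes`, `meets_criterion`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EnvResult:
    """Outcome on one environment (all fields are illustrative assumptions)."""
    name: str
    agent_score: float       # agent's score within its sample budget
    human_mean_score: float  # mean human score on the same environment
    agent_samples: int       # environment interactions the agent used
    human_samples: int       # interactions a human used to reach that score


def passes(result: EnvResult) -> bool:
    # Assumed reading: match mean human performance without using more samples.
    return (
        result.agent_score >= result.human_mean_score
        and result.agent_samples <= result.human_samples
    )


def meets_criterion(results: Sequence[EnvResult], threshold: float = 0.75) -> bool:
    # True when the fraction of passed environments is at least the threshold.
    if not results:
        return False
    passed = sum(passes(r) for r in results)
    return passed / len(results) >= threshold
```

For a transfer model, `results` would cover only the held-out test environments; for a non-transfer model, it would cover every environment in the benchmark.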
This question is managed and resolved by Manifold.
Related questions
Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?
1.6
Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills?
4.3
Will any transfer learning model, trained for any amount of time on one Atari environment, outperform the median human learning curve on most other Atari environments when transferred by 2026?
45% chance
Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?
67% chance
Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?
73% chance
By 2026 will any RL agent with learned causal models of its environment achieve superhuman performance on >=10 Atari environments?
81% chance
Benchmark Gap #4: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, how many months will it be before an AI is listed as a (co) first author on a published math paper?
37
Will any model get above human level on the Simple Bench benchmark before September 1st, 2025?
68% chance
Benchmark Gap #7: Once 10% of the medical Grand Challenges are "solved", how many months will it be before AI are in common use in hospitals for analyzing medical images with minimal human oversight?
64
Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?
42% chance