Will an opensource LLM on huggingface beat an average human at the most common LLM benchmarks by July 1, 2024?

Plus

Ṁ1204

Jul 2

74%

chance

ALL

Open source models are measured against ARC, HellaSwag, MMLU, and TruthfulQA on https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard . I've added the following plot to this huggingface space so we can see the progress of open source models over time:

I'm wondering when the average human baseline will be passed. A linear trend indicates July 2024 with a 0.89 pearson coef:

But this trend might not be linear. This questions will resolve Yes if the average human baseline on Open LLM Leaderboard on huggingface is surpassed before July 1, 2024.

Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4

Discover amazing ML apps made by the community

This question is managed and resolved by Manifold.

#AI

#Technical AI Timelines

#LLMs

#AI Safety

#️ AI Alignment

Get

1,000

and

3.00

3 Comments

22 Holders

33 Trades

Sort by:

@ChrisCanal resolve?

"an average human" in the title is misleading given that "average human baseline" consists of MMLU human baseline at ~90% which is expert performance instead of average human performance.

from: D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020
https://arxiv.org/pdf/2009.03300.pdf

predictedYES

@Metastable yeah it's so hard to measure these things. Don't worry I have an idea to fix this.

Related questions

Related questions