Who will have the best LLM at the end of 2024 (as decided by ChatBot Arena)?

Premium

803

Ṁ810k

resolved Dec 31

Google: 100% (98.4%)
OpenAI: 1.3%
Anthropic: 0.0%
Mistral: 0.0%
Inflection: 0.0%
xAI: 0.1%
Meta: 0.0%
Apple: 0.0%
Cohere: 0.0%
Microsoft: 0.0%
Other: 0.0%

I was browsing Twitter and saw a post by Karpathy speaking positively about ChatBot Arena, a platform that ranks LLMs based on human ratings. As expected, OpenAI is holding positions 1, 2, and 3. I wonder which company will be #1 at the end of 2024.

[Image: screenshot of the rankings table taken on the 13th of December]

This question is managed and resolved by Manifold.

25 Comments

752 Holders

4.6k Trades

@traders Based on the comments below, I think it makes sense to resolve this question based on the Elo rating in case of a tie in "rank." When I created this question, a tie was not an option, so I doubt anyone even traded based on this assumption.
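A rough sketch of that tie-break rule, in case it helps: take the best rank, and if several entries share it, pick the one with the highest rating. The `Entry` fields, the `resolve_winner` helper, and the sample rows are all hypothetical and only illustrate the rule; they are not real leaderboard data or an actual Manifold or Chatbot Arena API.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    organization: str
    rank: int       # leaderboard rank, 1 is best; ties are possible
    rating: float   # Arena rating, used only to break rank ties


def resolve_winner(entries: list[Entry]) -> str:
    """Return the organization with the best rank, breaking ties by rating."""
    if not entries:
        raise ValueError("leaderboard is empty")
    best_rank = min(e.rank for e in entries)
    tied = [e for e in entries if e.rank == best_rank]
    return max(tied, key=lambda e: e.rating).organization


# Hypothetical rows: two organizations tied at rank 1.
board = [
    Entry("OrgA", rank=1, rating=1365.0),
    Entry("OrgB", rank=1, rating=1374.0),
    Entry("OrgC", rank=3, rating=1340.0),
]
winner = resolve_winner(board)
print(winner)
```

With rank alone, OrgA and OrgB would be tied; the rating breaks the tie.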

I created a similar question that only uses the rank. Feel free to trade on it.

seems rigged https://www.reddit.com/r/MachineLearning/comments/1i83mhj/lm_arena_public_voting_is_not_objective_for_llm/?share_id=S-rS3xcvQLLs9G7uUPnRV

@jim https://x.com/lmarena_ai/status/1882485590798819656

@Soli yes thanks for posting, it is important clarification

@traders same market for 2025

😭

little fanfare for Google's great victory

buying other because it would be funny

https://x.com/deepseek_ai/status/1872242657348710721

@jim not quite, but so impressive

Big E just called, interesting

How come o1 isn't on the list on the chatbot arena?

Gemini flash 2.0 strawberry in the api

https://ai.google.dev/gemini-api/docs/thinking-mode

10k limit order @75% for anyone feeling brave

bought Ṁ500 YES

@WillSorenson it is slightly short of exp 1206. Are you assuming a thinking 1206 will be added?

@Usaar33 It appears more pleasant than o1 to me so it makes it unlikely o1 will top the charts. The following all have to go right for OAI to win:

1. They have to release a new model today
2. It has to actually be better in the dimensions that chatbot arena evaluates
3. Chatbot arena has to update it in time.

Possible! Not more than a 20% chance.
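To make the arithmetic behind a bound like that concrete: when several things all have to go right, the overall chance is the product of each step's probability given the previous ones. The numbers below are hypothetical placeholders, not estimates from this thread; they only show how three individually plausible steps can multiply out to under 20%.

```python
import math

# Hypothetical conditional probabilities for illustration only.
steps = {
    "new model released in time": 0.5,
    "model is better on what Chatbot Arena evaluates, given release": 0.6,
    "leaderboard updated in time, given both of the above": 0.6,
}

# Chance that every step goes right: product of the conditional probabilities.
joint = math.prod(steps.values())
print(f"{joint:.0%}")
```

Lowering any single step pulls the whole product down, which is why a chain of three "probably" events can still be a long shot.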

@WillSorenson

betting against openai

and

betting against elon

Brave.

bought Ṁ50 YES

@jim I’m also the second biggest xAI YES holder! Until December I was very bearish on Google and thought the relative lack of censorship of Grok would win out when chatbots were broadly good enough to answer most questions. I changed my mind when events turned against me

Google deepmind was and is severely underrated by this market. The odds are looking more reasonable now though

@AJama The rumor is that OpenAI will release GPT-4.5 soon.

@NeuralBets i would give it an 80-90% chance that OpenAI releases a new model as part of their 12 Days of Christmas, but I am not sure they will make it available to LMSYS before end of the year - i am too deep at this point anyways so 🤷‍♂️

i am too deep at this point anyways so 🤷‍♂️

hah same 😅

opened a Ṁ10,000 YES at 40% order

@Bayesian right now this position represents ~80% of my mana net worth but i am doubling down and put a large limit order at 40% on openai

@JasonDavies @EliLifland FYI

@Soli it should be said that a new model doesn’t mean that it will become No. 1. Reason 1: Google may have fine-tuned to perform way better on LMSYS. Reason 2: Google may have another fine-tuned model ready to answer any score release from OAI. Maybe Google's CEO and PMs have their compensation tied to end-of-year performance on LMSYS

@mathvc true, openai released the new preview model over the api yesterday (which is still not ranked in LMSYS) and I expect another major announcement sooon so we shalll seee how it goes

Gemini 1206 is now the top model in all categories by a small margin, yet people think OAI will be better at the end of the year (59% at the moment). Do people believe in a new release? GPT-4.5?

@mathvc I think it's more a question of how often the leader board is updated.

I agree with your stance, I just don't know if I want more exposure to this market with my novice level of understanding of the subject.
