Which company has best AI model end of August? (Chatbot Arena Leaderboard)

341

Ṁ300k

resolved Sep 1

100% 99.7%

Google

0.0%

OpenAI

0.2%

xAI

0.0%

Anthropic

0.0%

/Bayesian/who-will-have-the-best-texttovideo-AtZ0CdIc8Z

/Bayesian/which-company-has-best-ai-computer

/Bayesian/which-company-has-best-vision-ai-en

/Bayesian/which-company-has-best-search-ai-mo

Previous months:

/Bayesian/which-company-has-best-ai-model-end

/Bayesian/which-company-has-best-ai-model-end-I0QsydsZuz

/Bayesian/which-company-has-best-ai-model-end-0CRdhqptRl

/Bayesian/which-company-has-best-ai-model-end-uIgZPlOt5d

/Bayesian/which-company-has-best-ai-model-end-A88UsyhdZz

This question is managed and resolved by Manifold.

#Chatbot Arena Leaderboard

25 Comments

299 Holders

1.7k Trades

@traders see next month's market:

Why is nobody betting Google to 99%?

google’s not letting them!

Why is ChatGPT at 2% when it's ahead of Gemini?

@realDonaldTrump It's explained by the pinned message. If you go to the top right of the text leaderboard and select "without style control", you'll get the correct ordering for the purposes of this market.

https://manifold.markets/Geofry/improve-my-watermelon-855USAtE2U?r=R2VvZnJ5

@Geofry dude

bought Ṁ50 YES

This might change this market

bought Ṁ200 YES

@JoaoPedroSantos gpt-5-high is now shown and it's still below Gemini 2.5 Pro.

@BayesianOracle they just renamed gpt-5 to gpt-5-high for transparency

@Bayesian gotcha, ty

One interesting thing is that head-to-head (with style control), GPT-5 loses to Gemini 2.5 about 66% of the time, which is significant (p<0.05). GPT-5 beats some other models at a slightly higher rate than Gemini does, but not by much. For example, if we compare the rate at which GPT-5 beats Claude-Sonnet-4-thinking (0.74 with 47 samples) with the rate at which Gemini 2.5 beats Claude-Sonnet-4-thinking (0.68 with 330 samples), GPT-5's rate is not significantly greater than Gemini 2.5's (Fisher exact test, p ≈ 0.24).
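For anyone who wants to check that kind of comparison, here is a minimal sketch of a one-sided Fisher exact test using only the Python standard library. The win/loss counts are an assumption: they are reconstructed by rounding the quoted rates (0.74 of 47 is taken as 35 wins, 0.68 of 330 as 224 wins), so the result may differ slightly from whatever the original calculation used. The function name `fisher_exact_greater` is made up for this example.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null, of seeing
    at least `a` successes in the first row.
    """
    row1 = a + b          # battles played by the first model
    col1 = a + c          # total wins across both models
    n = a + b + c + d     # total battles
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

# ASSUMED counts, reconstructed from the rounded rates in the comment:
# GPT-5 vs Claude-Sonnet-4-thinking:      35 wins, 12 losses  (47 battles)
# Gemini 2.5 vs Claude-Sonnet-4-thinking: 224 wins, 106 losses (330 battles)
p_value = fisher_exact_greater(35, 12, 224, 106)
print(f"one-sided p = {p_value:.3f}")
```

With counts this small on the GPT-5 side, a gap of a few percentage points in win rate is well within noise, which is the point the comment is making.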

The 21-point Elo lead with style control seems tenuous, and they are tied in Elo without style control. With more data, Gemini could take the lead there.
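To put the 21-point gap in perspective, here is a rough sketch that converts an Elo difference into an expected win probability. It assumes the standard logistic Elo formula with a 400-point scale; the leaderboard's actual scoring model may differ in detail, and `expected_win_probability` is a name made up for this example.

```python
def expected_win_probability(elo_diff, scale=400.0):
    """Expected win probability for the higher-rated side, given its
    rating lead, under the standard logistic Elo model (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / scale))

lead = 21  # Elo lead quoted in the comment above
p = expected_win_probability(lead)
print(f"{lead}-point lead -> expected win rate of about {p:.2f}")
```

Under that assumption a 21-point lead corresponds to an expected win rate of only about 53%, which is part of why the gap looks small next to a head-to-head loss rate of around 66%.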

(Though I also just noticed this data is 4 days out of date. They may have made some changes right after release that change the dynamics.)

Updates happen every week or so, and Gemini 2.5 Pro is leading without style control, but yeah, this is a curious stat (that Gemini crushes head-to-head).

About the bit about resolving proportionally in case of a tie: is that for a tie in rankings? E.g., right now Google and OpenAI are both at rank 1 without style control (unless I'm misreading).

@sblaplace No, you can have the same ranking but different arena scores; ties refer to arena score ties only.

sold Ṁ27 NO

@Bayesian got it, so that's only in case of an exact ELO tie, makes sense ^~^ thanks
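The tie rule discussed in this thread can be sketched roughly as follows. This is only an illustration: it assumes "resolving proportionally" means an equal split among companies whose top arena scores are exactly equal, and the company names and scores in the usage line are hypothetical placeholders, not leaderboard data. `resolve_market` is a made-up name.

```python
def resolve_market(best_score_by_company):
    """Return each company's resolution share.

    Only an exact tie in arena score counts as a tie (a shared rank
    with different scores does not). Tied leaders split equally,
    which is an ASSUMED reading of "resolving proportionally".
    """
    top = max(best_score_by_company.values())
    winners = [c for c, s in best_score_by_company.items() if s == top]
    share = 1.0 / len(winners)
    return {c: (share if c in winners else 0.0)
            for c in best_score_by_company}

# Hypothetical scores, for illustration only.
print(resolve_market({"CompanyA": 1450, "CompanyB": 1450, "CompanyC": 1400}))
```

Two companies shown at rank 1 with scores of, say, 1450 and 1449 would not split under this reading; the one with 1450 would take the whole resolution.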

filled a Ṁ250 YES at 24% order

@AffineTyped wanna bet more...

@Bayesian sure

opened a Ṁ500 YES at 20% order

@AffineTyped oh I didn't see you're turning off style control. Lame

rip, mb, it was previously something like "default settings (with style control off)" bc it was a port from previous months when that was the default

@Bayesian yah it's my reading failure, and for some reason I thought we had all migrated to just whatever the leaderboard says at the end of the month

@AffineTyped regardless of their defaults

Yeah, I'm kind of hoping Polymarket does this for next year, and I'm planning to do it for next year, but yeah, arbness is a nice property.

Hello

Related questions
