Top Multi-SWE-bench score in 2025?
Dec 31

SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench is the same thing with multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, Rust.

A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score will be rounded ("rounding half up", to be exact; see Rounding).
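
To make the rounding rule concrete, here is a minimal sketch in Python. It assumes the score is rounded to a whole percent (the description does not state the precision explicitly), and the function name `round_half_up` is hypothetical, chosen for illustration only. Note that Python's built-in `round()` rounds halves to the nearest even number, which is why the sketch uses `decimal` instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(score_percent: str) -> int:
    """Round a percentage score to a whole number, with .5 rounding up.

    Assumption: scores are non-negative and rounded to a whole percent.
    The score is passed as a string so Decimal sees the exact digits
    rather than a binary floating-point approximation.
    """
    # decimal's ROUND_HALF_UP rounds ties away from zero, which is the
    # same as "half up" for the non-negative scores considered here.
    rounded = Decimal(score_percent).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(rounded)


# A leaderboard score of 21.62% would count as 22 under this rule.
print(round_half_up("21.62"))
# A tie rounds up here, whereas the built-in round(20.5) would give 20.
print(round_half_up("20.5"))
```

Under this reading, a leaderboard entry of exactly 19.5% would count as 20, while 19.49% would count as 19.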

The market will resolve primarily according to the official leaderboard, but other announcements from reputable organizations will also be considered.

See also /SG/top-swebench-verified-score-in-2025


Have you tried Gemini 2.5 Pro Experimental on it yet?

@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:

https://multi-swe-bench.github.io/#/

(Not sure what Mopenhands is...)
