Top Multi-SWE-bench score in 2025?
Ṁ26k · Dec 31
SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench applies the same idea to multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, and Rust.
A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score will be rounded ("rounding half up", to be exact; see Rounding).
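For anyone unsure how "rounding half up" differs from what most languages do by default, here is a minimal Python sketch. It assumes the score is rounded to a whole percent (the 19% figure suggests this, but the question text does not spell it out), and the helper name `round_half_up` is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(score_percent: str) -> int:
    """Round a percentage to the nearest whole number, with ties going up.

    Assumption: resolution uses whole percents. The score is passed as a
    string so that Decimal sees the exact digits from the leaderboard
    rather than a binary floating-point approximation.
    """
    rounded = Decimal(score_percent).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(rounded)


print(round_half_up("21.62"))  # 22
print(round_half_up("20.5"))   # 21 (the tie goes up)
print(round(20.5))             # 20 (Python's built-in round sends ties to the even number)
```

The built-in `round` uses "round half to even", so it would give a different answer exactly at the .5 boundary, which is the only case where the rounding rule matters.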
Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.
See also /SG/top-swebench-verified-score-in-2025
This question is managed and resolved by Manifold.
@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:
https://multi-swe-bench.github.io/#/
(Not sure what Mopenhands is...)