Will Claude Opus 4.5 exceed 80% on SWE-Bench verified?

Ṁ2453

2027

99.1% chance

Update 2025-11-05 (PST) (AI summary of creator comment): Resolution will be based on:
- Minimal agent configuration (as described on SWE-bench verified's website)
- No parallel test time compute
- Anthropic's official reporting of the score

This question is managed and resolved by Manifold.

8 Comments

23 Holders

57 Trades

bought Ṁ394 YES

80.9% https://www.anthropic.com/news/claude-opus-4-5

Surprised this is 80%. Is everyone thinking that Anthropic will manipulate the evals due to facing pressure from Google?

Made a similar market about swe-rebench to test this: https://manifold.markets/JaundicedBaboon/will-claude-opus-45-achieve-a-sota

Will Claude Opus 4.5 achieve a SOTA score on SWE-rebench when it is first evaluated?

50% chance. Resolves when Claude Opus 4.5 is evaluated and its score is visible on https://swe-rebench.com/

@JaundicedBaboon Yeah, I don't get it. If anything, rumors of an earlier Opus 4.5 release should lower expectations for this score, yet the market is moving in the opposite direction.

A 2.8% jump in 2 months is somewhat faster than the rate of progress over the second half of this year (~1.2% a month). On top of that, a YOLO-type release would be expected to show less progress than a well-timed one (Opus 4.1 managed only 74.5%, for under 1% a month of progress).

My expectation is ~79% for a release this week.
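
The rate comparison above can be checked with a quick calculation. This is only a sketch: it assumes the 2.8% jump is measured from Sonnet 4.5's 77.2% (the figure quoted further down the thread) to the 80% threshold, and the helper name `monthly_rate` is made up for illustration.

```python
def monthly_rate(start_score: float, end_score: float, months: float) -> float:
    """Average gain in percentage points per month between two scores."""
    return (end_score - start_score) / months


# Assumed inputs, taken from figures quoted in this thread.
baseline = 77.2        # Sonnet 4.5 under the minimal-agent standard
threshold = 80.0       # score the market asks about
months = 2             # window used in the comment
recent_rate = 1.2      # approximate recent progress, points per month

# Rate needed to reach the threshold within the window.
required = monthly_rate(baseline, threshold, months)

# Score implied if the recent rate simply continued over the same window.
projected = baseline + recent_rate * months

print(f"required rate: {required:.1f} points/month")   # 1.4
print(f"projected score: {projected:.1f}%")            # 79.6
```

At the recent rate the projection lands just under 80%, which is the gap the comment is pointing at.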

@Usaar33 Keep in mind Claude Opus 4 scored lower on SWE-bench than Sonnet 4. I wouldn't be surprised if Opus doesn't even get 78%.

What sources will you use for resolution? Will the score with parallel test-time compute be evaluated, or something more like a minimal agent as described on the SWE-bench Verified website?

@BenAybar minimal agent, no parallel compute. Will resolve per Anthropic’s reporting

@JaundicedBaboon So Sonnet 4.5's score under this standard would have been 77.2%, just to be sure I understand the resolution criteria.

@BenAybar yup
