SWE-bench Verified
Human-verified subset of SWE-bench, drawn from real-world GitHub issues. A widely used benchmark for coding agents.
Top 25 models
| Rank | Model | Score | Captured |
|---|---|---|---|
| 1 | Claude Opus 4.5 | 79.2% | 2025-12-15 |
| 2 | Doubao Seed Code | 78.8% | 2025-09-28 |
| 3 | Gemini 3 Pro | 77.4% | 2025-11-20 |
| 4 | Claude Sonnet 4 | 76.8% | 2025-08-04 |
| 5 | Gemini 3 Flash | 75.8% | 2026-02-17 |
| 6 | MiniMax M2.5 | 75.8% | 2026-02-17 |
| 7 | Claude Opus 4.6 | 75.6% | 2026-02-17 |
| 8 | Claude Sonnet 4.5 | 74.8% | 2025-11-03 |
| 9 | GPT-5 | 74.4% | 2025-10-15 |
| 10 | Claude Opus 4 | 73.2% | 2025-05-22 |
| 11 | GPT-5.2 | 72.8% | 2026-02-19 |
| 12 | GLM-5 | 72.8% | 2026-02-17 |
| 13 | Kimi K2 | 71.2% | 2025-10-14 |
| 14 | DeepSeek V3 | 70.0% | 2026-02-17 |
| 15 | Qwen 3 Coder | 69.6% | 2025-08-05 |
| 16 | GLM-4.6 | 68.2% | 2025-09-30 |
| 17 | Claude Haiku 4.5 | 66.6% | 2026-02-17 |
| 18 | Claude 3.7 Sonnet | 66.4% | 2025-05-14 |
| 19 | GPT-5.1 | 66.0% | 2025-11-24 |
| 20 | Claude 3.5 Sonnet | 62.8% | 2025-02-28 |
| 21 | MiniMax M2 | 61.0% | 2025-11-24 |
| 22 | Gemini 2.5 Pro | 53.6% | 2025-07-26 |
| 23 | Gemini 2.0 Flash | 52.2% | 2024-12-12 |
| 24 | o4-mini | 45.0% | 2025-07-26 |
| 25 | o3-mini | 42.4% | 2025-02-14 |
Upstream leaderboard: https://www.swebench.com/