Why we no longer evaluate SWE-bench Verified

OpenAI Blog · Feb 23, 2026

An OpenAI analysis finds that SWE-bench Verified is increasingly contaminated by training leakage and recommends SWE-bench Pro for future coding agent evaluation.

Categories: Research

Excerpt

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.

Read at source: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified

Discussions

reddit · 135 points · 35 comments
reddit · 161 points · 41 comments
reddit · 180 points · 49 comments
reddit · 201 points · 53 comments
reddit · 221 points · 55 comments
reddit · 234 points · 60 comments
reddit · 252 points · 67 comments
reddit · 262 points · 68 comments
reddit · 271 points · 71 comments
reddit · 282 points · 72 comments
reddit · 302 points · 74 comments
reddit · 314 points · 79 comments
reddit · 327 points · 84 comments
reddit · 332 points · 85 comments
reddit · 342 points · 87 comments
reddit · 357 points · 88 comments
reddit · 367 points · 88 comments
reddit · 373 points · 89 comments
reddit · 378 points · 89 comments
reddit · 384 points · 90 comments
reddit · 386 points · 92 comments
reddit · 392 points · 93 comments
reddit · 396 points · 93 comments
reddit · 411 points · 93 comments
reddit · 409 points · 94 comments
reddit · 408 points · 96 comments
reddit · 411 points · 96 comments
reddit · 417 points · 96 comments
reddit · 421 points · 99 comments
reddit · 415 points · 100 comments
reddit · 422 points · 100 comments
reddit · 423 points · 101 comments
reddit · 425 points · 101 comments
reddit · 430 points · 102 comments
reddit · 433 points · 102 comments
reddit · 435 points · 102 comments
reddit · 446 points · 102 comments