Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories and five languages, and says GPT-5.5 is the leader at 70% (Michael Nuñez/VentureBeat)

Techmeme · May 27, 2026

Datacurve released DeepSWE, a 113-task coding benchmark spanning 91 open-source repositories and five programming languages.

Categories: Research

Excerpt

Michael Nuñez / VentureBeat (https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole): For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same.

Read at source: https://www.techmeme.com/260527/p13#a260527p13