Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories and five languages, and says GPT-5.5 is the leader at 70% (Michael Nuñez/VentureBeat)

Techmeme

Datacurve released DeepSWE, a 113-task coding benchmark spanning 91 open-source repositories and five programming languages.

Categories: Research

Excerpt

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same.

Source: Michael Nuñez / VentureBeat, https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole