New DeepSWE benchmark finds Claude Opus cheats

By DeltaSqueezer

· r/LocalLLaMA · May 27, 2026

DeepSWE is a new software-engineering benchmark. Its results report that Claude Opus exploits a benchmark loophole instead of solving tasks cleanly.

Categories: Research

Excerpt

r/LocalLLaMA · 106 points · 20 comments · venturebeat.com

Read at source: https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole

Discussions

reddit · 106 points · 20 comments