# AA Coding Index
Artificial Analysis composite coding score: an equally-weighted average of three benchmarks — SciCode, Terminal-Bench Hard, and LiveCodeBench. Higher is better.
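The composite described above can be sketched as a simple arithmetic mean of the three component benchmark scores. The function below is a minimal illustration; the component scores passed in the example are made-up placeholders, not published benchmark results.

```python
# Sketch of the AA Coding Index composite: an equally-weighted
# (arithmetic) mean of the three component benchmark scores, in percent.

def aa_coding_index(scicode: float, terminal_bench_hard: float, livecodebench: float) -> float:
    """Equally-weighted average of the three component benchmarks (percent)."""
    return (scicode + terminal_bench_hard + livecodebench) / 3

# Example with illustrative (made-up) component scores:
score = aa_coding_index(50.0, 40.0, 60.0)
print(f"{score:.1f}%")  # 50.0%
```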
## Top 25 models
| Rank | Model | Score | Captured (date) |
|---|---|---|---|
| 1 | GPT-5 | 59.1% | 2026-05-02 |
| 2 | GPT-5.4 | 57.3% | 2026-05-02 |
| 3 | Gemini 3.1 Pro | 55.5% | 2026-05-02 |
| 4 | GPT-5.3 Codex | 53.1% | 2026-05-02 |
| 5 | Claude Opus 4.7 | 53.1% | 2026-05-02 |
| 6 | Claude Sonnet 4.6 | 50.9% | 2026-05-02 |
| 7 | GPT-5.2 | 48.7% | 2026-05-02 |
| 8 | Claude Opus 4.6 | 48.1% | 2026-05-02 |
| 9 | Claude Opus 4.5 | 47.8% | 2026-05-02 |
| 10 | Kimi K2 | 47.1% | 2026-05-02 |
| 11 | Gemini 3 Pro | 46.5% | 2026-05-02 |
| 12 | GPT-5.1 | 44.7% | 2026-05-02 |
| 13 | GLM-5 | 44.2% | 2026-05-02 |
| 14 | Gemini 3 Flash | 42.6% | 2026-05-02 |
| 15 | Grok 4 | 42.2% | 2026-05-02 |
| 16 | MiniMax M2 | 41.9% | 2026-05-02 |
| 17 | Claude Sonnet 4.5 | 38.6% | 2026-05-02 |
| 18 | o3 | 38.4% | 2026-05-02 |
| 19 | MiniMax M2.5 | 37.4% | 2026-05-02 |
| 20 | DeepSeek V3 | 36.7% | 2026-05-02 |
| 21 | Claude Opus 4.1 | 36.5% | 2026-05-02 |
| 22 | GLM-4.7 | 36.3% | 2026-05-02 |
| 23 | Claude Sonnet 4 | 34.1% | 2026-05-02 |
| 24 | o1-preview | 34.0% | 2026-05-02 |
| 25 | Claude Opus 4 | 34.0% | 2026-05-02 |
Upstream leaderboard: https://artificialanalysis.ai/models/capabilities/coding