METR Time Horizon
Length of task, measured by how long it takes a human expert, that an AI can complete with ~50% success, from METR's Time Horizon 1.1 suite (HCAST + SWAA). Scores are shown in hours and minutes. Longer = more capable.
Top 20 models
| Rank | Model | Score | Captured |
|---|---|---|---|
| 1 | Claude Opus 4.6 | 11h59m | 2026-02-05 |
| 2 | Gemini 3.1 Pro | 6h24m | 2026-02-19 |
| 3 | GPT-5.2 | 5h52m | 2025-12-11 |
| 4 | GPT-5.3 Codex | 5h50m | 2026-02-05 |
| 5 | GPT-5.4 | 5h42m | 2026-03-05 |
| 6 | Claude Opus 4.5 | 4h53m | 2025-11-24 |
| 7 | Gemini 3 Pro | 3h44m | 2025-11-18 |
| 8 | GPT-5.1 Codex Max | 3h44m | 2025-11-19 |
| 9 | GPT-5 | 3h23m | 2025-08-07 |
| 10 | o3 | 2h | 2025-04-16 |
| 11 | Claude Opus 4.1 | 1h40m | 2025-08-05 |
| 12 | Claude Opus 4 | 1h40m | 2025-05-22 |
| 13 | Claude 3.7 Sonnet | 1h | 2025-02-24 |
| 14 | o1 | 39m | 2024-12-05 |
| 15 | Claude 3.5 Sonnet | 21m | 2024-10-22 |
| 16 | o1-preview | 20m | 2024-09-12 |
| 17 | GPT-4o | 7m | 2024-05-13 |
| 18 | GPT-4 Turbo | 4m | 2023-11-06 |
| 19 | GPT-4 | 4m | 2023-03-14 |
| 20 | Claude 3 Opus | 4m | 2024-03-04 |
Upstream leaderboard: https://metr.org/time-horizons/
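
To compare entries numerically, the Score strings in the table can be converted to plain minutes. The following is a minimal sketch, assuming scores always take the form `<hours>h<minutes>m`, `<hours>h`, or `<minutes>m` as they do in the table; the helper name `score_to_minutes` is hypothetical and not part of any METR tooling.

```python
import re

# Assumed score format, based only on the table above: optional "<N>h"
# followed by optional "<N>m", e.g. "11h59m", "1h", "39m".
_SCORE = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?$")


def score_to_minutes(score: str) -> int:
    """Convert a leaderboard score string to a total number of minutes."""
    match = _SCORE.match(score.strip())
    # Reject strings that match neither an hours part nor a minutes part.
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized score: {score!r}")
    hours, minutes = (int(part) if part else 0 for part in match.groups())
    return hours * 60 + minutes
```

For example, `score_to_minutes("11h59m")` returns 719 and `score_to_minutes("39m")` returns 39, so the ratio between two models' horizons can be computed directly.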