METR Time Horizon

Category: agents · Unit: min · Last refreshed

Length of task (in minutes of skilled-human working time) that an AI model can complete with ~50% success, from METR's Time Horizon 1.1 suite (HCAST + SWAA). Longer = more capable.
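METR's published methodology estimates a model's time horizon by fitting a logistic curve of success probability against the logarithm of human task duration, then reading off where that curve crosses 50%. The sketch below illustrates the idea with a two-parameter Newton-Raphson fit. The function names, the toy `TOY_RUNS` data, and the standard-library solver are illustrative assumptions, not METR's code, and METR's actual pipeline differs in details not modeled here.

```python
import math

# Hypothetical toy data, NOT METR results: (human task length in minutes,
# 1 = success, 0 = failure), with four attempts per task length.
_COUNTS = {1: 4, 2: 3, 4: 3, 8: 3, 16: 2, 32: 1, 64: 1, 128: 1, 256: 0}
TOY_RUNS = [(mins, 1 if i < k else 0) for mins, k in _COUNTS.items() for i in range(4)]


def fit_logistic(runs, iters=50, tol=1e-12):
    """Fit p(success) = sigmoid(a + b * log2(minutes)) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for minutes, y in runs:
            x = math.log2(minutes)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Fisher information (negative Hessian) entries.
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve information @ delta = gradient for delta.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def horizon_minutes(a, b):
    """Task length at which the fitted success probability equals 0.5."""
    # sigmoid(a + b * x) = 0.5  <=>  a + b * x = 0  <=>  x = -a / b
    return 2.0 ** (-a / b)


if __name__ == "__main__":
    a, b = fit_logistic(TOY_RUNS)
    print(f"slope per doubling: {b:.3f}, 50% horizon: {horizon_minutes(a, b):.1f} min")
```

The slope `b` comes out negative because success falls as tasks get longer. Since the toy outcomes are symmetric around 16 minutes on the log scale, the fit places the 50% point at 16 minutes.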

Top 20 models

Rank Model Score Captured
1 Claude Opus 4.6 11h59m 2026-02-05
2 Gemini 3.1 Pro 6h24m 2026-02-19
3 GPT-5.2 5h52m 2025-12-11
4 GPT-5.3 Codex 5h50m 2026-02-05
5 GPT-5.4 5h42m 2026-03-05
6 Claude Opus 4.5 4h53m 2025-11-24
7 Gemini 3 Pro 3h44m 2025-11-18
8 GPT-5.1 Codex Max 3h44m 2025-11-19
9 GPT-5 3h23m 2025-08-07
10 o3 2h 2025-04-16
11 Claude Opus 4.1 1h40m 2025-08-05
12 Claude Opus 4 1h40m 2025-05-22
13 Claude 3.7 Sonnet 1h 2025-02-24
14 o1 39m 2024-12-05
15 Claude 3.5 Sonnet 21m 2024-10-22
16 o1-preview 20m 2024-09-12
17 GPT-4o 7m 2024-05-13
18 GPT-4 Turbo 4m 2023-11-06
19 GPT-4 4m 2023-03-14
20 Claude 3 Opus 4m 2024-03-04
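The page gives its unit as minutes but displays scores in hours and minutes. Below is a minimal sketch for converting the displayed scores back to minutes. It assumes the format is always an optional `<H>h` followed by an optional `<M>m`. `parse_duration` and `ROWS` are hypothetical names, and `ROWS` copies a few entries from the table above.

```python
import re

# Optional "<hours>h" followed by optional "<minutes>m", e.g. "11h59m", "1h", "39m".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?")


def parse_duration(text):
    """Convert a displayed score such as '11h59m' to whole minutes."""
    m = _DURATION.fullmatch(text.strip())
    # The pattern also matches the empty string, so require at least one field.
    if m is None or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(m.group(1) or 0) * 60 + int(m.group(2) or 0)


# A few (model, displayed score) pairs copied from the leaderboard.
ROWS = [
    ("Claude Opus 4.6", "11h59m"),
    ("GPT-5", "3h23m"),
    ("Claude 3.7 Sonnet", "1h"),
    ("o1", "39m"),
    ("GPT-4", "4m"),
]

if __name__ == "__main__":
    for model, score in ROWS:
        print(f"{model:<20} {parse_duration(score):>4} min")
```

The minute field is not capped at 59, so a display artifact such as `1h60m` still converts to 120 minutes. That makes the parsed integers safer to sort and compare than the display strings.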