Artificial Analysis composite coding score: equally-weighted average of SciCode, Terminal-Bench Hard, and LiveCodeBench. Higher = better.
latest 2026-06-09 · upstream leaderboard
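As a rough illustration of the composite described above, the sketch below averages three component scores with equal weight. The function name and the sample values are hypothetical, and it assumes all three benchmarks are reported on the same scale; it is not Artificial Analysis's own code.

```python
from statistics import mean


def composite_coding_score(scicode: float,
                           terminal_bench_hard: float,
                           livecodebench: float) -> float:
    """Equally weighted average of the three component scores.

    Assumes all three inputs share one scale (e.g. 0-100).
    """
    return mean([scicode, terminal_bench_hard, livecodebench])


# Hypothetical scores for illustration only, not real leaderboard values.
example_score = composite_coding_score(40.0, 30.0, 80.0)
print(example_score)
```

With equal weights, a model that is weak on one component is pulled down by a full third of that gap, so the composite rewards balanced performance.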
Crowdsourced human-preference Elo ratings across all models.
latest 2026-06-05 · upstream leaderboard
LMArena Elo filtered to models with non-proprietary licenses. Where OSS actually stands.
latest 2026-06-05 · upstream leaderboard
Length of task (in minutes) an AI can complete at ~50% success, from METR's Time Horizon 1.1 suite (HCAST + SWAA). Longer = more capable.
latest 2026-03-05 · upstream leaderboard
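To make the "~50% success" idea concrete, here is a minimal sketch that assumes success probability follows a logistic curve in the base-2 log of task length, which is the general approach METR describes. The function names and coefficients are hypothetical; real values would come from fitting to task results.

```python
import math


def success_probability(task_minutes: float, intercept: float, slope: float) -> float:
    """Logistic model of success as a function of log2(task length in minutes)."""
    x = math.log2(task_minutes)
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def horizon_minutes(intercept: float, slope: float) -> float:
    """Task length at which the modeled success probability is exactly 50%.

    p = 0.5 when intercept + slope * log2(t) = 0, so t = 2 ** (-intercept / slope).
    """
    return 2.0 ** (-intercept / slope)


# Hypothetical coefficients: a negative slope means longer tasks succeed less often.
INTERCEPT, SLOPE = 6.0, -1.0
print(horizon_minutes(INTERCEPT, SLOPE))
```

A longer horizon means the curve crosses 50% at a larger task length, which is why the leaderboard reads longer as more capable.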
Where developers are actually spending tokens this week. OpenRouter's top-weekly ranking. Real-world adoption signal, not capability.
latest 2026-06-09 · upstream leaderboard
Real-world GitHub issues, restricted to a human-verified subset. Widely treated as the reference benchmark for coding agents.
latest 2026-02-17 · upstream leaderboard
Agents completing real tasks in an actual terminal/shell environment.
latest 2026-05-01 · upstream leaderboard