Introducing the SWE-Lancer benchmark

OpenAI Blog

OpenAI releases the SWE-Lancer benchmark, which evaluates frontier LLMs on real-world freelance software engineering tasks with up to $1M in potential earnings, establishing a new economic evaluation framework.

Categories: Research

Excerpt

Can frontier LLMs earn $1 million from real-world freelance software engineering?