Introducing the SWE-Lancer benchmark
OpenAI has released SWE-Lancer, a benchmark that evaluates frontier LLMs on real-world freelance software engineering tasks worth up to $1M in potential earnings, measuring model performance in economic terms.
Excerpt
Can frontier LLMs earn $1 million from real-world freelance software engineering?
Read at source: https://openai.com/index/swe-lancer