BrowseComp: a benchmark for browsing agents

OpenAI Blog

OpenAI releases BrowseComp, a benchmark for evaluating how well AI agents perform complex, multi-step web-browsing tasks.

Categories: Research
