BrowseComp: a benchmark for browsing agents
OpenAI has released BrowseComp, a benchmark for evaluating how well AI agents perform complex, multi-step web browsing tasks.
Read at source: https://openai.com/index/browsecomp