OpenAI and Anthropic share findings from a joint safety evaluation

OpenAI Blog · Aug 27, 2025

OpenAI and Anthropic published findings from their first joint safety evaluation testing each other's models for misalignment, instruction following, and jailbreaking vulnerabilities.

Categories: Research

Excerpt

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.

Read at source: https://openai.com/index/openai-anthropic-safety-evaluation