OpenAI and Anthropic share findings from a joint safety evaluation

OpenAI Blog

OpenAI and Anthropic published findings from their first joint safety evaluation, in which each lab tested the other's models for misalignment, instruction following, hallucinations, and jailbreaking vulnerabilities.

Categories: Research

Excerpt

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.