OpenAI and Anthropic share findings from a joint safety evaluation
OpenAI and Anthropic published findings from their first joint safety evaluation, in which each lab tested the other's models for misalignment, instruction following, hallucinations, and jailbreaking vulnerabilities.
Excerpt
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
Read at source: https://openai.com/index/openai-anthropic-safety-evaluation