AI safety via debate

OpenAI Blog ·

OpenAI proposed using competitive debate between AI agents, judged by humans, as a technique for scalable AI safety alignment.

Categories: Research

Excerpt

We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.