Weak-to-strong generalization

OpenAI Blog · Dec 14, 2023

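OpenAI introduces weak-to-strong generalization, a research direction for superalignment that asks whether a weak supervisor can control and elicit the capabilities of a much stronger model, as an analogy for humans overseeing models more capable than themselves. The post reports promising initial results.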

Categories: Research

Excerpt

We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?

Read at source: https://openai.com/index/weak-to-strong-generalization