Weak-to-strong generalization

OpenAI Blog

OpenAI introduces weak-to-strong generalization, a research direction for superalignment that asks whether weak supervisors can control much stronger models, and reports promising initial results.

Categories: Research

Excerpt

We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?