Improving Model Safety Behavior with Rule-Based Rewards

OpenAI Blog

OpenAI introduces Rule-Based Rewards (RBRs), a method that aligns models with safe behavior without extensive human data collection and that can be applied to frontier model training.

Categories: Research

Excerpt

We've developed and applied a new method leveraging Rule-Based Rewards (RBRs) that aligns models to behave safely without extensive human data collection.