Improving Model Safety Behavior with Rule-Based Rewards

OpenAI Blog · Jul 24, 2024

OpenAI introduces Rule-Based Rewards (RBRs), a method for aligning models with safe behavior without extensive human data collection. The method is applicable to frontier model training.

Categories: Research

Excerpt

We've developed and applied a new method leveraging Rule-Based Rewards (RBRs) that aligns models to behave safely without extensive human data collection.

Read at source: https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards