Improving Model Safety Behavior with Rule-Based Rewards
OpenAI introduces Rule-Based Rewards (RBRs), a method that aligns models to behave safely without extensive human data collection and that is applicable to frontier model training.
Excerpt
We've developed and applied a new method leveraging Rule-Based Rewards (RBRs) that aligns models to behave safely without extensive human data collection.
Read at source: https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards