Learning to summarize with human feedback
OpenAI applied reinforcement learning from human feedback (RLHF) to train language models for summarization, demonstrating that training on human feedback produces more faithful and coherent summaries than supervised fine-tuning alone.
Excerpt
We’ve applied reinforcement learning from human feedback to train language models that are better at summarization.
Read at source: https://openai.com/index/learning-to-summarize-with-human-feedback