The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI Blog · Apr 19, 2024

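OpenAI published its Instruction Hierarchy research, a training approach that teaches LLMs to prioritize higher-privileged instructions (such as system messages) over lower-privileged ones (such as user messages and third-party content), with the goal of making models more robust to prompt injection.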

Categories: Research

Excerpt

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.

Read at source: https://openai.com/index/the-instruction-hierarchy