Extracting Concepts from GPT-4

OpenAI Blog · Jun 6, 2024

OpenAI applies scaled-up sparse autoencoders to GPT-4, automatically extracting 16 million patterns (features) from the model's computations to advance mechanistic interpretability research.

Categories: Research

Excerpt

Using new techniques for scaling sparse autoencoders, we automatically identified 16 million patterns in GPT-4's computations.
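
For readers unfamiliar with the term, the following is a minimal sketch of what a sparse autoencoder computes. It is an illustration under stated assumptions, not OpenAI's implementation: the excerpt does not specify the architecture, so the sketch assumes a top-k sparsity rule (keep only the k largest latent activations per input), and the function name, shapes, and random weights are all hypothetical. A real autoencoder would be trained to minimize reconstruction error on model activations; here the weights are untrained.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep the k largest latent pre-activations per row, decode.

    Assumed shapes (hypothetical): x is (batch, d_model),
    W_enc is (d_model, n_latents), W_dec is (n_latents, d_model).
    """
    pre = (x - b_dec) @ W_enc + b_enc                 # latent pre-activations
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]    # top-k indices per row
    rows = np.arange(pre.shape[0])[:, None]
    latents = np.zeros_like(pre)
    # Keep only the top-k entries and clamp negatives to zero.
    latents[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    recon = latents @ W_dec + b_dec                   # reconstruction of x
    return latents, recon

# Toy sizes for illustration only; real models use far larger dimensions.
rng = np.random.default_rng(0)
d_model, n_latents, k = 8, 32, 4
x = rng.normal(size=(5, d_model))                     # stand-in activations
W_enc = rng.normal(size=(d_model, n_latents)) * 0.1   # untrained weights
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

latents, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
print("active latents per input:", (latents != 0).sum(axis=1))
print("reconstruction MSE:", float(np.mean((x - recon) ** 2)))
```

Each column of the latent vector plays the role of one candidate "pattern": because at most k latents are active for any input, each input is described by a small set of them, which is what makes the individual latents candidates for human inspection.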

Read at source: https://openai.com/index/extracting-concepts-from-gpt-4