Extracting Concepts from GPT-4

OpenAI Blog

OpenAI scales sparse autoencoders trained on GPT-4's internal activations to automatically extract 16 million features, or candidate interpretable concepts, advancing mechanistic interpretability research.

Categories: Research

Excerpt

Using new techniques for scaling sparse autoencoders, we automatically identified 16 million patterns in GPT-4's computations.
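The excerpt names sparse autoencoders without showing what one computes. As a rough illustration only, the sketch below encodes a single activation vector into many latents, keeps at most k of them active, and decodes them back into a reconstruction. The TopK sparsity rule, the tied initialization, the toy sizes, and the class name `ToySparseAutoencoder` are assumptions for this example. The weights are random rather than trained, and nothing here reproduces the setup OpenAI used on GPT-4.

```python
import math
import random


class ToySparseAutoencoder:
    """Toy TopK sparse autoencoder over one activation vector (untrained, illustrative)."""

    def __init__(self, d_model: int, n_latents: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        # Decoder: one unit-norm direction in activation space per latent.
        self.W_dec = []
        for _ in range(n_latents):
            row = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
            norm = math.sqrt(sum(v * v for v in row))
            self.W_dec.append([v / norm for v in row])
        # Encoder initialized to the same directions (tied init; an assumption here).
        self.W_enc = [list(row) for row in self.W_dec]
        self.b_enc = [0.0] * n_latents
        # Bias subtracted before encoding and added back after decoding.
        self.b_pre = [0.0] * d_model

    def encode(self, x: list[float]) -> list[float]:
        centered = [xi - bi for xi, bi in zip(x, self.b_pre)]
        pre = [sum(w * c for w, c in zip(row, centered)) + b
               for row, b in zip(self.W_enc, self.b_enc)]
        # TopK sparsity: keep only the k largest pre-activations, then apply ReLU.
        top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[: self.k]
        z = [0.0] * len(pre)
        for i in top:
            z[i] = max(pre[i], 0.0)
        return z

    def decode(self, z: list[float]) -> list[float]:
        # Reconstruction = b_pre + sum of active latents times their decoder directions.
        x_hat = list(self.b_pre)
        for zi, row in zip(z, self.W_dec):
            if zi != 0.0:
                for j, v in enumerate(row):
                    x_hat[j] += zi * v
        return x_hat

    def reconstruction_mse(self, x: list[float]) -> float:
        # Mean squared error between the input and its sparse reconstruction.
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


sae = ToySparseAutoencoder(d_model=8, n_latents=64, k=4)
x = [0.5, -1.2, 0.3, 0.9, -0.4, 1.1, 0.0, -0.7]
z = sae.encode(x)
active = [i for i, v in enumerate(z) if v > 0.0]
print("active latents:", active)
print("reconstruction MSE:", round(sae.reconstruction_mse(x), 4))
```

In a trained autoencoder, the weights would be fit to minimize reconstruction error over many model activations, and a latent that fires on a coherent set of inputs is a candidate "pattern" of the kind the excerpt counts. With random weights, as here, the reconstruction error is large and the active latents carry no meaning.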