Generative modeling with sparse transformers

OpenAI Blog ·

Sparse Transformer introduces algorithmic improvements to attention, enabling sequence modeling 30x longer than previous transformers across text, images, and audio.

Categories: Research

Excerpt

We’ve developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously.