Generative modeling with sparse transformers
Sparse Transformer introduces algorithmic improvements to attention, enabling sequence modeling 30x longer than previous transformers across text, images, and audio.
Excerpt
We’ve developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30x longer than possible previously.
Read at source: https://openai.com/index/sparse-transformer