Block-sparse GPU kernels

OpenAI Blog

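OpenAI released highly optimized GPU kernels for neural networks with block-sparse weights. Depending on the chosen sparsity, the kernels can run orders of magnitude faster than cuBLAS or cuSPARSE, and OpenAI used them to reach state-of-the-art results in text sentiment analysis and generative modeling of text and images.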

Categories: OSS & Tools, Research

Excerpt

We’re releasing highly-optimized GPU kernels for an underexplored class of neural network architectures: networks with block-sparse weights. Depending on the chosen sparsity, these kernels can run orders of magnitude faster than cuBLAS or cuSPARSE. We’ve used them to attain state-of-the-art results in text sentiment analysis and generative modeling of text and images.
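
To make "block-sparse weights" concrete, here is a minimal pure-Python sketch of a matrix multiply where the weight matrix is stored only as its nonzero blocks. It is not the released kernels or their API: the names `block_sparse_matmul`, `to_dense`, `dense_matmul`, `layout`, and `blocks` are hypothetical and chosen for illustration, and the code runs on the CPU. It only shows the idea that blocks marked zero in the layout cost neither storage nor arithmetic, so work scales with the number of nonzero blocks rather than with the full matrix size.

```python
def block_sparse_matmul(x, blocks, layout, bs):
    """Compute y = x @ W, where W is given only by its nonzero blocks.

    x      : list of rows, shape (batch, n_in)
    layout : 0/1 grid of shape (n_in // bs, n_out // bs); 1 marks a stored block
    blocks : dict mapping (block_row, block_col) -> bs x bs list of lists
    bs     : block size (assumed to divide n_in and n_out)
    """
    n_out = len(layout[0]) * bs
    y = [[0.0] * n_out for _ in x]
    for bi, layout_row in enumerate(layout):
        for bj, present in enumerate(layout_row):
            if not present:
                continue  # zero block: nothing stored, nothing computed
            w = blocks[(bi, bj)]
            for b, x_row in enumerate(x):
                for i in range(bs):
                    xi = x_row[bi * bs + i]
                    for j in range(bs):
                        y[b][bj * bs + j] += xi * w[i][j]
    return y


def to_dense(blocks, layout, bs):
    """Expand the block-sparse weights into a full matrix (reference only)."""
    n_in, n_out = len(layout) * bs, len(layout[0]) * bs
    W = [[0.0] * n_out for _ in range(n_in)]
    for (bi, bj), w in blocks.items():
        for i in range(bs):
            for j in range(bs):
                W[bi * bs + i][bj * bs + j] = w[i][j]
    return W


def dense_matmul(x, W):
    """Plain dense matrix multiply used to check the sparse result."""
    return [[sum(xr[k] * W[k][j] for k in range(len(W)))
             for j in range(len(W[0]))] for xr in x]


# Illustrative data: a 6x4 weight matrix split into 2x2 blocks,
# with 4 of the 6 blocks present.
bs = 2
layout = [[1, 0],
          [0, 1],
          [1, 1]]
blocks = {
    (bi, bj): [[float(bi + bj + i - j) for j in range(bs)] for i in range(bs)]
    for bi, row in enumerate(layout)
    for bj, present in enumerate(row)
    if present
}
x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     [0.5, -1.0, 0.0, 2.0, -3.0, 1.0]]

y_sparse = block_sparse_matmul(x, blocks, layout, bs)
y_dense = dense_matmul(x, to_dense(blocks, layout, bs))
```

In this sketch the sparse and dense paths agree, but the sparse path touches only the four stored blocks. A GPU kernel would additionally map each block onto hardware-friendly tiles, which this example does not attempt.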