Sequential Attention: Making AI models leaner and faster without sacrificing accuracy
Google Research introduced Sequential Attention, a method for reducing model compute while preserving accuracy.
Algorithms & Theory
Read at source: https://research.google/blog/sequential-attention-making-ai-models-leaner-and-faster-without-sacrificing-accuracy/