Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference (Ryan Whitwam/Ars Technica)
Google released Multi-Token Prediction drafters for Gemma 4 models, using speculative decoding to predict future tokens and achieve up to 3x faster inference.
Excerpt
<a href="https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/"><img align="RIGHT" border="0" hspace="4" src="http://www.techmeme.com/260506/i38.jpg" vspace="4" /></a>
<p><a href="https://www.techmeme.com/260506/p38#a260506p38" title="Techmeme permalink"><img height="12" src="http://www.techmeme.com/img/pml.png" style="border: none; padding: 0; margin: 0;" width="11" /></a> Ryan Whitwam / <a href="http://arstechnica.com/">Ars Technica</a>:<br />
<span style="font-size: 1.3em;"><b><a href="https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/">Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference</a></b></span> — Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI.</p>
Read at source: https://www.techmeme.com/260506/p38#a260506p38
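
To make the speculative decoding idea in the headline concrete, here is a minimal toy sketch: a cheap drafter proposes several tokens ahead, and the target model verifies them, accepting the longest agreeing prefix. This is an illustration of the general technique only, not Google's actual Gemma 4 Multi-Token Prediction drafter; both model functions below are hypothetical stand-ins.

```python
# Toy sketch of greedy speculative decoding. The "models" here are
# trivial arithmetic rules, used only to show the accept/verify loop.

def draft_model(context, k):
    """Cheap drafter: guesses the next k tokens (toy rule)."""
    return [(context[-1] + 1 + i) % 100 for i in range(k)]

def target_model_next(context):
    """Expensive target model's greedy next token (toy rule)."""
    return (context[-1] + 1) % 100

def speculative_step(context, k=4):
    """One speculative decoding step.

    The drafter proposes k tokens; the target model verifies them
    left to right (in a real system, in one batched forward pass).
    The longest agreeing prefix is accepted, plus one token from the
    target model, so each step emits between 1 and k+1 tokens instead
    of the usual 1 -- which is where the speedup comes from.
    """
    draft = draft_model(context, k)
    accepted = []
    ctx = list(context)
    for guess in draft:
        expected = target_model_next(ctx)
        if guess == expected:
            accepted.append(guess)       # drafter was right: keep it
            ctx.append(guess)
        else:
            accepted.append(expected)    # mismatch: take the correction
            ctx.append(expected)
            break
    else:
        bonus = target_model_next(ctx)   # all k accepted: free extra token
        accepted.append(bonus)
        ctx.append(bonus)
    return accepted

print(speculative_step([1], k=4))  # toy drafter agrees, so 5 tokens emitted
```

Because the toy drafter here always agrees with the toy target model, every step emits k+1 tokens; with a real drafter, the acceptance rate determines how close to the "up to 3x" speedup one gets.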