Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference (Ryan Whitwam/Ars Technica)
Google released Multi-Token Prediction drafters for Gemma 4 models, using speculative decoding to predict future tokens and achieve up to 3x faster inference.
Excerpt
<a href="https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/"><img align="RIGHT" border="0" hspace="4" src="http://www.techmeme.com/260506/i38.jpg" vspace="4" /></a>
<p><a href="https://www.techmeme.com/260506/p38#a260506p38" title="Techmeme permalink"><img height="12" src="http://www.techmeme.com/img/pml.png" style="border: none; padding: 0; margin: 0;" width="11" /></a> Ryan Whitwam / <a href="http://arstechnica.com/">Ars Technica</a>:<br />
<span style="font-size: 1.3em;"><b><a href="https://arstechnica.com/ai/2026/05/googles-gemma-4-open-ai-models-use-speculative-decoding-to-get-up-to-3x-faster/">Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference</a></b></span> — Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI.</p>
Read at source: https://www.techmeme.com/260506/p38#a260506p38
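
To make the speculative decoding idea in the headline concrete, here is a minimal toy sketch: a cheap drafter proposes several tokens ahead, and the target model verifies them, accepting the longest agreeing prefix. This is an illustration of the general technique only, not Google's actual Gemma 4 Multi-Token Prediction drafter; both model functions below are hypothetical stand-ins.

```python
# Toy sketch of greedy speculative decoding. The "models" here are
# trivial arithmetic rules, used only to show the accept/verify loop.

def draft_model(context, k):
    """Cheap drafter: guesses the next k tokens (toy rule)."""
    return [(context[-1] + 1 + i) % 100 for i in range(k)]

def target_model_next(context):
    """Expensive target model's greedy next token (toy rule)."""
    return (context[-1] + 1) % 100

def speculative_step(context, k=4):
    """One speculative decoding step.

    The drafter proposes k tokens; the target model verifies them
    left to right (in a real system, in one batched forward pass).
    The longest agreeing prefix is accepted, plus one token from the
    target model, so each step emits between 1 and k+1 tokens instead
    of the usual 1 -- which is where the speedup comes from.
    """
    draft = draft_model(context, k)
    accepted = []
    ctx = list(context)
    for guess in draft:
        expected = target_model_next(ctx)
        if guess == expected:
            accepted.append(guess)       # drafter was right: keep it
            ctx.append(guess)
        else:
            accepted.append(expected)    # mismatch: take the correction
            ctx.append(expected)
            break
    else:
        bonus = target_model_next(ctx)   # all k accepted: free extra token
        accepted.append(bonus)
        ctx.append(bonus)
    return accepted

print(speculative_step([1], k=4))  # toy drafter agrees, so 5 tokens emitted
```

Because the toy drafter here always agrees with the toy target model, every step emits k+1 tokens; with a real drafter, the acceptance rate determines how close to the "up to 3x" speedup one gets.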