Google made Gemma 4 models 3x faster with MTP Drafters
Google adds MTP (Multi-Token Prediction) Drafters to Gemma 4, delivering a 3x speedup and enabling the models to run on consumer GPUs and edge devices.
Excerpt
What's new? Speculative decoding pairs a heavy main model with a light drafter that pre-generates tokens for the main model to verify. Gemma 4 models now run on consumer GPUs and edge devices.
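
To make the drafter idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Google's MTP implementation and does not touch Gemma 4. The `target` and `drafter` functions are hypothetical toy stand-ins (simple arithmetic rules over integer tokens), and the names `speculative_decode` and `greedy_decode` are made up for illustration. The sketch assumes greedy (argmax) decoding, where a drafted token is accepted only if it matches what the main model would have picked.

```python
from typing import Callable, List

# A "model" here is just a function from the tokens so far to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Draft up to k tokens with the light model, then verify with the heavy one."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The drafter cheaply proposes a short run of tokens.
        draft: List[int] = []
        for _ in range(min(k, goal - len(out))):
            draft.append(drafter(out + draft))

        # 2. The target checks each proposal. A real system scores all
        #    positions in one batched forward pass; this toy loops instead.
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != tok:
                break  # first mismatch: discard the rest of the draft
        out.extend(accepted)
    return out


# Hypothetical toy models: the target counts upward modulo 10;
# the drafter agrees except right after token 4, where it guesses wrong.
def target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def drafter(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


result = speculative_decode(target, drafter, [0], max_new=12, k=4)
baseline = greedy_decode(target, [0], max_new=12)
```

Under these assumptions `result` is identical to `baseline`: the drafter never changes what is generated, only how many tokens can be confirmed per verification step. Any speedup depends on how often the drafter's guesses match the main model.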
Read at source: https://www.testingcatalog.com/google-made-gemma-4-models-3x-faster-with-mtp-drafters/