Accelerating Gemma 4: faster inference with multi-token prediction drafters

Google DeepMind describes how multi-token prediction drafters accelerate Gemma 4 inference, demonstrating a way to speed up decoding for existing open models.

Categories: Research

HN · 101 points · 34 comments
