Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%

By gladkos · r/LocalLLaMA · May 8, 2026

A Multi-Token Prediction (MTP) optimization merged into LLaMA.cpp speeds up Gemma 4 inference by roughly 40%, raising throughput for local runs.

Categories: OSS & Tools

Excerpt

r/LocalLLaMA · 104 points · 20 comments · v.redd.it

Read at source: https://v.redd.it/ccxn81zo5tzg1

Discussions

reddit · 104 points · 20 comments