Llama.cpp MTP support now in beta!
llama.cpp adds beta support for multi-token prediction (MTP), enabling more efficient inference for MTP-capable model architectures on one of the most widely used open-source inference runtimes.
r/LocalLLaMA · 123 points · 64 comments · github.com
Read at source: https://github.com/ggml-org/llama.cpp/pull/22673
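For readers unfamiliar with the technique, the sketch below illustrates the general idea behind MTP-based decoding: the model's extra prediction heads draft a few tokens ahead of the main head, and the main model then verifies those drafts in a single batched pass, committing every token it agrees with. This is a minimal conceptual sketch in Python under greedy decoding; `model.forward`, the `mtp_logits` shape, and the indexing convention are illustrative assumptions, not the llama.cpp implementation in the linked PR.

```python
# Hypothetical sketch of MTP-style self-speculative decoding with greedy
# acceptance. `model.forward` is assumed to return per-position logits from
# the main head plus one logits tensor per MTP head; none of these names
# come from llama.cpp.

def generate_with_mtp(model, prompt_ids, max_new_tokens):
    tokens = list(prompt_ids)
    produced = 0
    while produced < max_new_tokens:
        # One forward pass scores the current sequence and lets the MTP
        # heads draft the tokens that would follow the next one.
        main_logits, mtp_logits = model.forward(tokens)

        # The main head commits the next token as in ordinary decoding.
        tokens.append(int(main_logits[-1].argmax()))
        produced += 1

        # Each MTP head proposes one additional token further ahead.
        drafts = [int(head[-1].argmax()) for head in mtp_logits]

        # Verify the drafts with the main head in a single batched pass and
        # accept the longest prefix it agrees with.
        verify_logits, _ = model.forward(tokens + drafts)
        for i, draft in enumerate(drafts):
            # Main-head prediction for the position where draft i sits.
            predicted = int(verify_logits[len(tokens) - 1 + i].argmax())
            if predicted != draft or produced >= max_new_tokens:
                # On a mismatch the main head's own prediction could be
                # committed as a bonus token; omitted here for brevity.
                break
            tokens.append(draft)
            produced += 1
    return tokens
```

A real implementation would fuse the draft and verify passes and reuse the KV cache rather than re-scoring the whole sequence, so the sketch above overstates the per-step compute; it is only meant to show why accepted drafts let several tokens be committed per decoding step.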