Llama.cpp MTP support now in beta!

· r/LocalLLaMA ·

llama.cpp adds beta support for multi-token prediction (MTP), enabling more efficient inference for MTP-capable model architectures on the dominant open-source inference runtime.
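The speedup from MTP comes from a draft-and-verify loop: the MTP head guesses several tokens per step, and the main model verifies them left to right, keeping the longest agreeing prefix. The sketch below illustrates that acceptance logic only; the function names and toy models are illustrative assumptions, not llama.cpp APIs.

```python
# Minimal sketch of the accept/verify loop behind multi-token prediction (MTP)
# speculative decoding. All names are illustrative, not llama.cpp APIs:
# `draft_tokens` stands in for the MTP head's k-token guess, and `main_model`
# for the full model's one-step prediction.

def main_model(context):
    # Toy "ground truth" model: the next token is last token + 1
    # (placeholder for a real language model forward pass).
    return context[-1] + 1

def draft_tokens(context, k):
    # Toy MTP head: guesses the next k tokens, deliberately wrong past
    # position 2 to show a rejected draft.
    out = []
    ctx = list(context)
    for i in range(k):
        guess = ctx[-1] + 1 if i < 2 else 0  # diverges after 2 tokens
        out.append(guess)
        ctx.append(guess)
    return out

def mtp_step(context, k=4):
    """One decoding step: draft k tokens, verify left-to-right against the
    main model, keep the longest verified prefix plus one corrected token."""
    draft = draft_tokens(context, k)
    accepted = []
    ctx = list(context)
    for tok in draft:
        target = main_model(ctx)
        if tok == target:
            accepted.append(tok)     # draft agrees: token accepted "for free"
            ctx.append(tok)
        else:
            accepted.append(target)  # disagree: take the main model's token, stop
            ctx.append(target)
            break
    return accepted

print(mtp_step([10, 11, 12]))  # → [13, 14, 15]: two accepted, one corrected
```

Because every accepted draft token skips a full sequential forward pass, a step like this can emit several tokens for roughly the cost of one, which is where the "efficient inference" claim comes from.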

Categories: OSS & Tools

Excerpt

123 points · 64 comments · github.com
