llama.cpp ngram-based speculative decoding was merged

· r/LocalLLaMA ·

llama.cpp merged ngram-based speculative decoding, which drafts tokens from n-gram matches in the existing context, yielding 0–50% speedups on coding tasks depending on draft acceptance rates.

Categories: OSS & Tools

Excerpt

[https://github.com/ggml-org/llama.cpp/pull/19493](https://github.com/ggml-org/llama.cpp/pull/19493) Some prompts get a speedup, others don't (cases with low draft-acceptance streaks). Good working params depend on the task type and repetition patterns. For coding, I got roughly 0%–50% speedups with these params: `--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64`
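The core idea behind ngram drafting can be sketched in a few lines: find an earlier occurrence of the last *n* tokens in the context and copy what followed it as a draft, which the target model then verifies. Below is a minimal toy sketch of that scheme, assuming a simplified integer-token model and a stand-in `target_next_token` callback; it is not llama.cpp's actual implementation, and the names are illustrative only.

```python
def ngram_draft(context, n, draft_max):
    """Propose up to draft_max draft tokens by finding the most recent
    earlier occurrence of the last n tokens and copying what followed it."""
    if len(context) < n:
        return []
    key = tuple(context[-n:])
    # Scan backwards for an earlier match of the trailing n-gram.
    for i in range(len(context) - n - 1, -1, -1):
        if tuple(context[i:i + n]) == key:
            start = i + n
            return context[start:start + draft_max]
    return []  # no repetition found: nothing to draft

def speculative_step(context, n, draft_max, target_next_token):
    """Verify a draft against the target model: keep the longest accepted
    prefix, then append one token sampled from the target itself."""
    draft = ngram_draft(context, n, draft_max)
    accepted = []
    for tok in draft:
        if tok == target_next_token(context + accepted):
            accepted.append(tok)  # draft token matches the target's choice
        else:
            break  # first mismatch: discard the rest of the draft
    # The target always contributes at least one token per step.
    accepted.append(target_next_token(context + accepted))
    return accepted
```

On repetitive text the draft is accepted wholesale and one verification pass yields many tokens; when nothing in the context repeats, the draft is empty and the step degrades to ordinary one-token decoding, which is why speedups range from 0% upward depending on repetition patterns.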

Discussions