llama.cpp speculative checkpointing was merged
llama.cpp has merged speculative checkpointing via ngram drafts, yielding 0-50% speedups on coding tasks depending on draft acceptance rates.
Excerpt
[https://github.com/ggml-org/llama.cpp/pull/19493](https://github.com/ggml-org/llama.cpp/pull/19493)
Some prompts see a speedup and others don't (those with low draft-acceptance streaks).
Good parameter values depend on the task type and its repetition patterns.
For coding, I saw roughly 0-50% speedups with these params:

```
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64
```
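The ngram approach drafts tokens by matching the current context suffix against earlier text in the sequence, rather than running a separate draft model. Below is a toy Python sketch of that idea under my own assumptions (function names, the linear backward scan, and the token-by-token acceptance check are illustrative; this is not the actual llama.cpp implementation):

```python
def propose_draft(tokens, n, draft_max):
    """If the last n tokens occurred earlier in the sequence, propose
    the tokens that followed that earlier occurrence as a draft."""
    if len(tokens) < n:
        return []
    suffix = tuple(tokens[-n:])
    # Scan earlier positions, most recent first, for a matching n-gram.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[i:i + n]) == suffix:
            return tokens[i + n:i + n + draft_max]
    return []  # no earlier occurrence -> nothing to draft

def accept_draft(draft, target_tokens):
    """Keep draft tokens only while they match what the target model
    actually emits; stop at the first mismatch."""
    accepted = []
    for drafted, actual in zip(draft, target_tokens):
        if drafted != actual:
            break
        accepted.append(drafted)
    return accepted

# Repetitive text (common in code) makes the suffix match earlier context,
# so whole runs of tokens can be verified in one batch instead of one by one.
draft = propose_draft(list("abcdabc"), n=3, draft_max=3)  # -> ['d', 'a', 'b']
```

This also illustrates why the speedup ranges from 0% to 50%: prose with little repetition rarely produces a suffix match, so few drafts are proposed or accepted, while boilerplate-heavy code yields long accepted streaks.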
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1sprdm8/llamacpp_speculative_checkpointing_was_merged/
Discussions
- reddit · 143 points · 37 comments