DSpark: Speculative decoding accelerates LLM inference [pdf]
DSpark proposes a speculative decoding method for accelerating LLM inference. The paper drew extensive technical discussion from practitioners.
HN · 793 points · 361 comments
Read at source: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf