DSpark: Speculative decoding accelerates LLM inference [pdf]

DSpark proposes a speculative decoding method for accelerating LLM inference. The submission drew extensive technical discussion from practitioners on Hacker News.

Categories: Research

Excerpt

HN · 793 points · 361 comments

Discussions