Speculative cascades — A hybrid approach for smarter, faster LLM inference

Google Research Blog

Google details speculative cascades, a hybrid inference method for improving LLM serving speed and efficiency.

Categories: Research

Excerpt

Generative AI