Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution
Orthrus is an open-source inference optimization for Qwen3 that reports up to 7.8× tokens per forward pass while keeping the output distribution identical, likely via speculative decoding or an early-exit cascade.
HN · 245 points · 44 comments
Read at source: https://github.com/chiennv2000/orthrus
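
The summary only guesses at the mechanism. If it is speculative decoding, the "identical output distribution" property would typically come from an accept/reject rule like the one sketched here. This is a generic illustration on toy distributions, not code from the Orthrus repository; `speculative_step`, `empirical`, `p` (target model probabilities), and `q` (draft model probabilities) are hypothetical names.

```python
import random


def speculative_step(p, q, rng):
    """Return one token index distributed according to p, using a draft from q.

    p and q are probability lists over the same vocabulary (assumed toy inputs).
    """
    # The cheap draft model proposes a token.
    x = rng.choices(range(len(q)), weights=q)[0]
    # Accept the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q);
    # random.choices normalizes the weights itself.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]


def empirical(p, q, n, seed=0):
    """Estimate the output distribution of speculative_step over n samples."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        counts[speculative_step(p, q, rng)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]  # toy target distribution
    draft = [0.2, 0.5, 0.3]   # toy draft distribution, deliberately mismatched
    print(empirical(target, draft, 20000))
```

Even with a badly mismatched draft, the sampled frequencies track the target distribution; a worse draft only lowers the acceptance rate, and with it the number of tokens gained per forward pass.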