Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution

HN · GitHub AI

Orthrus is an open-source inference optimization for Qwen3 that reports up to 7.8× more tokens per forward pass while keeping the output distribution identical. The mechanism is not stated here; it is likely speculative decoding or an early-exit cascade.
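If the speedup does come from speculative decoding, the "identical output distribution" claim would rest on the standard accept/reject rule from speculative sampling. The sketch below illustrates that general rule for a single token; it is not taken from the Orthrus code, and the names `speculative_step`, `empirical_distribution`, `p` (target distribution), and `q` (draft distribution) are hypothetical.

```python
import random


def speculative_step(p, q, rng):
    """Return one token index distributed exactly according to p.

    p: target-model probabilities (assumed to sum to 1)
    q: draft-model probabilities (assumed to sum to 1)
    """
    # The cheap draft model proposes a token from q.
    x = rng.choices(range(len(q)), weights=q)[0]
    # Accept the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]


def empirical_distribution(p, q, n, seed=0):
    """Estimate the output distribution of speculative_step over n samples."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        counts[speculative_step(p, q, rng)] += 1
    return [c / n for c in counts]
```

Calling `empirical_distribution([0.6, 0.3, 0.1], [0.2, 0.5, 0.3], 100_000)` should return frequencies close to the target `p`, even though every proposal came from `q`. In a multi-token setting, several draft tokens would be checked in one target forward pass, which is where a tokens/forward figure above 1 would come from.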

Categories: OSS & Tools

Excerpt

HN · 245 points · 44 comments

Discussions