[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost

r/LocalLLaMA

The paper presents Hummingbird+, a low-cost FPGA accelerator that runs Qwen3-30B-A3B at Q4 quantization with 18 tokens/s token generation and 24 GB of memory, at an expected mass-production cost of $150. The authors position it as a way to make edge deployment of large quantized models feasible.
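To put the headline numbers in context, here is a rough back-of-envelope sketch of the weight footprint and the memory bandwidth that 18 tokens/s would imply. It is not taken from the paper. It assumes the model name means roughly 30B total and 3B active parameters per token, idealizes Q4 as exactly 4 bits per weight (real Q4 formats carry extra scale data), and ignores KV-cache and activation traffic. The function names are hypothetical.

```python
# Back-of-envelope estimate. Every constant here is an assumption for
# illustration, not a figure reported by the paper.
TOTAL_PARAMS = 30e9      # "30B": assumed total parameter count
ACTIVE_PARAMS = 3e9      # "A3B": assumed parameters activated per token (MoE)
BYTES_PER_WEIGHT = 0.5   # Q4 idealized as 4 bits/weight, no scale overhead
TOKENS_PER_SECOND = 18   # token-generation rate quoted in the post title
MEMORY_GB = 24           # memory capacity quoted in the post title


def weight_footprint_gb(params, bytes_per_weight=BYTES_PER_WEIGHT):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight / 1e9


def decode_bandwidth_gb_s(active_params, tokens_per_second,
                          bytes_per_weight=BYTES_PER_WEIGHT):
    """Approximate weight-read bandwidth if each active weight is read once per token."""
    return active_params * bytes_per_weight * tokens_per_second / 1e9


footprint = weight_footprint_gb(TOTAL_PARAMS)
bandwidth = decode_bandwidth_gb_s(ACTIVE_PARAMS, TOKENS_PER_SECOND)

print(f"weights: ~{footprint:.1f} GB of {MEMORY_GB} GB")
print(f"weight-read bandwidth at {TOKENS_PER_SECOND} tok/s: ~{bandwidth:.1f} GB/s")
```

Under these assumptions the weights fit within the quoted 24 GB, and the implied bandwidth is driven by the active parameter count, not the total, which is why a mixture-of-experts model is a plausible fit for low-cost hardware.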

Categories: Research

Excerpt

r/LocalLLaMA · 103 points · 50 comments · dl.acm.org

Discussions