[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost

By ayake_ayake · r/LocalLLaMA · May 3, 2026

The paper presents Hummingbird+, a low-cost FPGA accelerator for LLM inference. It reports 18 tok/s token generation on Qwen3-30B-A3B at Q4 quantization with 24GB of memory, at an expected mass-production cost of $150, a price point aimed at making edge deployment of large quantized models practical.

Categories: Research

Excerpt

r/LocalLLaMA · 103 points · 50 comments · dl.acm.org

Read at source: https://dl.acm.org/doi/pdf/10.1145/3748173.3779189

Discussions

reddit · 103 points · 50 comments