[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost
The paper presents Hummingbird+, a low-cost FPGA accelerator that runs Qwen3-30B-A3B (Q4) at 18 tok/s token generation with 24GB of memory, at an expected mass-production cost of $150. That price point is aimed at making edge deployment of large quantized models feasible.
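As a rough sanity check on the headline numbers, the sketch below estimates the weight footprint and the memory bandwidth that 18 tok/s would imply. It is not taken from the paper. It assumes about 30B total and about 3B active parameters per token (read off the model name), exactly 4 bits per weight (real Q4 formats carry some extra scale overhead), and that token generation is bound by reading the active weights once per token. KV cache and activation traffic are ignored, so the bandwidth figure is only a lower bound.

```python
# Back-of-envelope estimate; every constant is an assumption unless noted.
TOTAL_PARAMS = 30e9       # assumed from the "30B" in the model name
ACTIVE_PARAMS = 3e9       # assumed from the "A3B" suffix (about 3B active per token)
BYTES_PER_WEIGHT = 0.5    # assumes exactly 4 bits per weight, no scale overhead
TOKENS_PER_SECOND = 18    # figure quoted in the post
MEMORY_GB = 24            # figure quoted in the post


def weights_gb(params, bytes_per_weight=BYTES_PER_WEIGHT):
    """Size of `params` quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight / 1e9


def min_bandwidth_gb_s(active_params, tokens_per_second,
                       bytes_per_weight=BYTES_PER_WEIGHT):
    """Lower bound on bandwidth if each token reads all active weights once."""
    return weights_gb(active_params, bytes_per_weight) * tokens_per_second


model_gb = weights_gb(TOTAL_PARAMS)
bandwidth_gb_s = min_bandwidth_gb_s(ACTIVE_PARAMS, TOKENS_PER_SECOND)

print(f"weights: ~{model_gb:.1f} GB of {MEMORY_GB} GB")
print(f"implied weight-read bandwidth: >= ~{bandwidth_gb_s:.1f} GB/s")
```

Under these assumptions the full weight set takes roughly 15 GB, which fits in the quoted 24GB. Because only the active experts are read per token, the implied bandwidth is modest compared with a dense 30B model.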
r/LocalLLaMA · 103 points · 50 comments · dl.acm.org
Read at source: https://dl.acm.org/doi/pdf/10.1145/3748173.3779189