[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost
The paper presents Hummingbird+, a low-cost FPGA accelerator that runs Qwen3-30B-A3B (Q4 quantization) at 18 tok/s token generation with 24 GB of memory, at an expected mass-production cost of $150. At that price, deploying large quantized models on edge devices becomes feasible.
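A rough back-of-envelope check helps explain why these numbers are plausible for a mixture-of-experts model. Token generation is usually memory-bandwidth bound, so each generated token streams roughly the *active* weights once. The sketch below rests on several assumptions that do not come from the paper: about 30.5B total and 3.3B active parameters for Qwen3-30B-A3B, and about 4.5 effective bits per weight for a Q4 format once scales are included. It also ignores the KV cache, activations, and any on-chip caching, so it only gives an order-of-magnitude estimate.

```python
# Back-of-envelope estimate. Every constant except TOKENS_PER_SEC is an
# assumption for illustration, not a figure taken from the Hummingbird+ paper.
TOTAL_PARAMS = 30.5e9    # assumed total parameters of Qwen3-30B-A3B
ACTIVE_PARAMS = 3.3e9    # assumed parameters activated per token (the "A3B")
BITS_PER_WEIGHT = 4.5    # assumed effective bits/weight for Q4 incl. scales
TOKENS_PER_SEC = 18      # token-generation rate reported for Hummingbird+


def gigabytes(params: float, bits: float) -> float:
    """Storage in GB (1e9 bytes) for `params` weights at `bits` bits each."""
    return params * bits / 8 / 1e9


def estimate(tokens_per_sec: float = TOKENS_PER_SEC):
    """Return (weight storage GB, GB read per token, GB/s needed).

    Ignores KV cache, activations and caching, so real needs are higher.
    """
    weights_gb = gigabytes(TOTAL_PARAMS, BITS_PER_WEIGHT)
    per_token_gb = gigabytes(ACTIVE_PARAMS, BITS_PER_WEIGHT)
    return weights_gb, per_token_gb, per_token_gb * tokens_per_sec


if __name__ == "__main__":
    w, t, bw = estimate()
    print(f"weights:   {w:.1f} GB (must fit in 24 GB)")
    print(f"per token: {t:.2f} GB read")
    print(f"bandwidth: {bw:.1f} GB/s needed at {TOKENS_PER_SEC} tok/s")
```

Under these assumptions, the Q4 weights come to roughly 17 GB, which fits in 24 GB. Sustaining 18 tok/s would need on the order of 33 GB/s of effective memory bandwidth. A dense 30B model at the same rate would need roughly nine times that, which is why the small active-parameter count matters for a low-cost board.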
Excerpt
r/LocalLLaMA · 103 points · 50 comments · dl.acm.org
Read at source: https://dl.acm.org/doi/pdf/10.1145/3748173.3779189
Discussions
- reddit · 103 points · 50 comments