Flash-MoE: Running a 397B Parameter Model on a Laptop


Flash-MoE makes it possible to run a 397B-parameter mixture-of-experts (MoE) model on a laptop by combining aggressive quantization with architecture-level optimizations.
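A quick back-of-the-envelope calculation shows how much quantization alone shrinks the weights. The sketch below counts only raw weight storage at a few illustrative bit widths. It does not describe Flash-MoE's actual scheme: the bit widths are assumptions, and `weight_memory_gb` is a hypothetical helper. It also ignores quantization scales, the KV cache, and activations.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in decimal GB (1e9 bytes).

    Hypothetical helper: ignores per-group scales/zero-points, KV cache,
    activations, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 397e9  # total parameter count from the title

# Illustrative bit widths only; not taken from the Flash-MoE project.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_memory_gb(TOTAL_PARAMS, bits):7.1f} GB")
```

Even at 2 bits per parameter, the weights alone come to roughly 99 GB, which is more RAM than many laptops have. This is consistent with the summary's point that quantization is paired with architecture-level optimizations rather than relied on by itself.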

Categories: OSS & Tools


HN · 398 points · 120 comments
