Cloudflare open-sources lossless LLM compression tool

· r/LocalLLaMA ·

Cloudflare open-sourced Unweight, a lossless compression system for LLMs that reduces model size by 15–22% and saves roughly 3 GB of VRAM on Llama-3.1-8B running on an H100, with the GPU kernels on GitHub and an accompanying technical paper.

Categories: OSS & Tools, Research

Excerpt

* Cloudflare released Unweight, a lossless compression system that reduces LLM size by 15–22% without sacrificing output accuracy.
* On Meta's Llama-3.1-8B, the tool saves roughly 3 GB of VRAM by compressing MLP weights on Nvidia H100 GPUs.
* Cloudflare open-sourced the GPU kernels on GitHub and published a technical paper, with plans to extend compression to attention weights.
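The summary doesn't describe how Unweight works internally, but the general idea behind lossless compression of floating-point weights can be illustrated generically: in half-precision weights, the sign/exponent bytes are highly clustered (low entropy) while the mantissa tail is near-random, so splitting the weight buffer into byte planes and entropy-coding each plane separately shrinks the data without losing a single bit. The sketch below uses synthetic Gaussian weights and stdlib `zlib` purely as an illustration; it is not Unweight's actual algorithm or compression ratio.

```python
import random
import struct
import zlib

# Synthetic "weights": small Gaussian values, serialized as little-endian
# IEEE 754 half-precision (float16), like a typical LLM weight tensor.
random.seed(0)
n = 65536
raw = b"".join(struct.pack("<e", random.gauss(0.0, 0.02)) for _ in range(n))

# Byte-plane split (little-endian float16):
#   byte 0 = low mantissa bits (near-random, barely compressible)
#   byte 1 = sign + exponent + top mantissa bits (clustered, compresses well)
lo = raw[0::2]
hi = raw[1::2]

hi_c = zlib.compress(hi, 9)
lo_c = zlib.compress(lo, 9)
ratio = 1 - (len(hi_c) + len(lo_c)) / len(raw)

# Lossless round trip: decompress the planes and re-interleave the bytes.
hi_d = zlib.decompress(hi_c)
lo_d = zlib.decompress(lo_c)
restored = bytes(b for pair in zip(lo_d, hi_d) for b in pair)
assert restored == raw  # bit-exact reconstruction, hence "lossless"
```

On this synthetic tensor the exponent plane carries essentially all of the savings, which is why such schemes can shrink weights meaningfully while guaranteeing identical model outputs; the reported 15–22% figure is in the same rough regime as what byte-plane entropy coding achieves on Gaussian-like weights.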

Discussions