Cloudflare open-sources lossless LLM compression tool
Cloudflare has open-sourced Unweight, a lossless compression system for LLMs that reduces model size by 15–22%. On Meta's Llama-3.1-8B running on an Nvidia H100, it saves roughly 3 GB of VRAM; the GPU kernels are on GitHub alongside a technical paper.
Excerpt
* Cloudflare released Unweight, a lossless compression system that reduces LLM size by 15–22% without sacrificing output accuracy.
* On Meta's Llama-3.1-8B, the tool saves roughly 3 GB of VRAM by compressing MLP weights on Nvidia H100 GPUs.
* Cloudflare open-sourced the GPU kernels on GitHub and published a technical paper, with plans to extend compression to attention weights.
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1sor438/cloudflare_opensources_lossless_llm_compression/
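The excerpt doesn't describe Unweight's actual kernels, but the claim of "lossless" compression on floating-point weights can be illustrated. The sketch below (my own illustration, not Unweight's algorithm) exploits the fact that exponent bits in trained weights are tightly clustered: it splits a simulated fp16 weight matrix into byte planes, entropy-codes each with zlib, and verifies a bit-exact round trip.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Simulated fp16 weight matrix; real LLM weights are bf16/fp16 and
# have similarly clustered exponents.
weights = rng.normal(0.0, 0.02, size=(512, 512)).astype(np.float16)

# Split the raw bytes into two planes. On little-endian machines the
# high byte holds sign + exponent bits, which vary little across a
# trained layer, so that plane entropy-codes well; the mantissa-heavy
# low byte is close to incompressible.
b = weights.view(np.uint8).reshape(-1, 2)
packed = (zlib.compress(b[:, 0].tobytes(), 9),
          zlib.compress(b[:, 1].tobytes(), 9))

# Lossless round trip: decompression restores every bit exactly.
out = np.empty_like(b)
out[:, 0] = np.frombuffer(zlib.decompress(packed[0]), dtype=np.uint8)
out[:, 1] = np.frombuffer(zlib.decompress(packed[1]), dtype=np.uint8)
restored = out.view(np.float16).reshape(weights.shape)
assert restored.tobytes() == weights.tobytes()

ratio = 1 - (len(packed[0]) + len(packed[1])) / weights.nbytes
print(f"size reduction: {ratio:.1%}")
```

Unlike quantization, a scheme like this changes no bits of the model output; the engineering challenge, and presumably why custom GPU kernels matter, is decompressing weights fast enough to keep inference throughput intact.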
Discussions
- reddit · 174 points · 18 comments