BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
BeeLlama v0.2.0 adds a major DFlash update, speeding Qwen and Gemma inference on single-GPU consumer hardware.
Excerpt
**BeeLlama v0.2.0 is here!**
>Not quite a pegasus, but close enough.
[**GitHub**](https://github.com/Anbeeld/beellama.cpp) **|** [**Qwen 3.6 27B Quick Start**](https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-qwen36-dflash.md) **|** [**Gemma 4 31B Quick Start**](https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-gemma-4-31b-dflash.md)
* Full Gemma 4 31B support with an efficient DFlash implementation and vision.
* Major Qwen 3.6 27B performance update from lower DFlash overhead, cleaner prefill handling, drafter K/V projection caching, and safer CUDA execution.
* DFlash GGUFs with upstream architecture are now supported.
* Fixes to adaptive profit behavior around baseline probing.
* The reduced verifier path is now stricter, with a safer fallback to full logits when grammar, sampler state, or reasoning requires it.
* Reasoning and tool-call boundaries were tightened.
* Stricter draft/target validation and better draft-model discovery.
* ...and many more improvements!
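
The fallback rule for the reduced verifier path can be pictured as a simple guard. The snippet below is only an illustrative sketch in Python, not BeeLlama's actual code: `VerifyContext`, its fields, and `choose_verifier_path` are hypothetical names made up for this example, and the real conditions in the project may be more fine-grained.

```python
from dataclasses import dataclass


@dataclass
class VerifyContext:
    """Hypothetical per-step state; field names are assumptions for illustration."""
    grammar_active: bool = False               # a grammar constrains the next token
    sampler_needs_full_logits: bool = False    # sampler state needs the whole distribution
    reasoning_needs_full_logits: bool = False  # reasoning handling needs the whole distribution


def choose_verifier_path(ctx: VerifyContext) -> str:
    """Return "full_logits" if any condition demands it, otherwise "reduced"."""
    if (
        ctx.grammar_active
        or ctx.sampler_needs_full_logits
        or ctx.reasoning_needs_full_logits
    ):
        return "full_logits"
    return "reduced"


print(choose_verifier_path(VerifyContext()))
print(choose_verifier_path(VerifyContext(grammar_active=True)))
```

The point of the guard is that the cheaper path is taken only when nothing downstream depends on the full logits; any doubt resolves to the slower but safe path.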
**Benchmarks**
* Setup: Windows 11, AMD Ryzen 7 5700X3D, 32 GB DDR4 RAM, RTX 3090 24 GB
* Config: same as in quick start docs, but with reasoning off for non-chat prompts
* Baseline and MTP server used for comparison: llama.cpp [b9275](https://github.com/ggml-org/llama.cpp/releases/tag/b9275) CUDA 13.1 Windows prebuilt
* The full text of the benchmark prompts is in [README.md on GitHub](https://github.com/Anbeeld/beellama.cpp/blob/main/README.md#dflash-speedup)
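
For a rough sense of scale, the headline figures (164 tps at 4.40x, 177.8 tps at 4.93x) imply a baseline rate for the prompts where those peaks were measured. The sketch below just divides one by the other; it assumes the speedup is the plain ratio of accelerated to baseline tokens per second on the same prompt, and the results are back-of-the-envelope estimates, not measured values. `implied_baseline_tps` is a helper name invented for this example.

```python
def implied_baseline_tps(accelerated_tps: float, speedup: float) -> float:
    """Baseline tokens/s implied by an accelerated rate and its speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_tps / speedup


# Headline "up to" figures from the announcement: (tokens/s, speedup factor).
headline = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

implied = {
    model: implied_baseline_tps(tps, speedup)
    for model, (tps, speedup) in headline.items()
}

for model, base in implied.items():
    print(f"{model}: ~{base:.1f} tps implied baseline")
```

Both values land in the mid-to-high 30s tps, so the peak speedups are relative to a similar baseline on this card. Since these are "up to" numbers, other prompts will show smaller gains.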
**Qwen 3.6 27B**
Target m
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1tkpz2y/beellama_v020_major_dflash_update_single_rtx_3090/
Discussions
- reddit · 100 points · 76 comments