Needle: We Distilled Gemini Tool Calling Into a 26M Model
Needle, a 26M-parameter function-calling model distilled from Gemini, runs at 6000 tok/s prefill on consumer devices using a Simple Attention Network architecture with no MLPs.
Excerpt
We open-sourced Needle, a 26M-parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
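To get a feel for what these throughput figures mean for a single tool call, here is a rough back-of-the-envelope sketch. It assumes end-to-end latency is simply prompt tokens at the prefill rate plus output tokens at the decode rate, and ignores model loading, tokenization and scheduling overhead. The request size (a 600-token prompt holding the query and tool schemas, and a 60-token JSON call) is a hypothetical example, not a measured workload.

```python
# Reported throughputs for Needle on consumer devices (device unspecified).
PREFILL_TOK_S = 6000
DECODE_TOK_S = 1200


def estimated_latency_s(prompt_tokens, output_tokens):
    """Rough latency model: prompt processed at the prefill rate, output generated
    at the decode rate. Ignores load time, tokenization and scheduling overhead."""
    return prompt_tokens / PREFILL_TOK_S + output_tokens / DECODE_TOK_S


# Hypothetical request: 600-token prompt (query + tool schemas), 60-token JSON call.
print(f"{estimated_latency_s(600, 60) * 1000:.0f} ms")  # 150 ms
```

Under these assumptions, the decode phase dominates for short prompts, so the length of the emitted JSON matters as much as the size of the tool list.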
We have long been frustrated by how little effort goes into building agentic models that run on budget phones, so we investigated the problem and arrived at an observation: agentic experiences are built on tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval and assembly (match the query to a tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.
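The retrieval-and-assembly framing can be made concrete with a deliberately crude, rule-based sketch. This is not how Needle works (Needle is a learned model); it only separates the three steps the paragraph names. The tool registry, the word-overlap retrieval score, the regex-based argument extraction and the `{"name", "arguments"}` output shape are all hypothetical choices for illustration.

```python
import json
import re

# Hypothetical tool registry: names, descriptions and argument patterns are illustrative only.
TOOLS = {
    "set_timer": {
        "description": "set a timer for a duration",
        "params": {"minutes": (r"(\d+)\s*min", int)},
    },
    "send_message": {
        "description": "send a text message to a contact",
        "params": {"contact": (r"\bto\s+([A-Z][a-z]+)", str)},
    },
}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_tool(query):
    # Retrieval: choose the tool whose name + description shares the most words with the query.
    q = words(query)

    def score(name):
        return len(q & (words(name.replace("_", " ")) | words(TOOLS[name]["description"])))

    return max(TOOLS, key=score)


def extract_args(query, tool):
    # Extraction: copy argument values out of the query text and cast them to the expected type.
    args = {}
    for param, (pattern, cast) in TOOLS[tool]["params"].items():
        match = re.search(pattern, query)
        if match:
            args[param] = cast(match.group(1))
    return args


def call(query):
    tool = pick_tool(query)
    # Assembly: emit the selected tool and its arguments as JSON.
    return json.dumps({"name": tool, "arguments": extract_args(query, tool)})


print(call("Set a timer for 10 minutes"))  # {"name": "set_timer", "arguments": {"minutes": 10}}
```

Every output value here is copied or matched from the input rather than derived, which is the property the post relies on when arguing that a small attention-only model is enough.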
Simple Attention Networks: the entire model is just attention and gating, with no MLPs anywhere. Needle is an experimental run targeting single-shot function calling on consumer devices (phones, watches, glasses, ...).
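As a rough illustration of what "attention and gating, no MLPs" can look like, here is a minimal pure-Python sketch of one such block. It is an assumption, not Needle's actual layer: a single attention head in which token states cross-attend to a context (for example, the encoded query and tool list), followed by a hypothetical elementwise sigmoid gate that controls how much of the attended signal is added back to the residual stream. Multi-head projections, normalization and any self-attention sublayer are omitted; the full writeup linked below describes the real architecture.

```python
import math
import random


def matvec(w, x):
    # w: list of rows (out_dim x in_dim); x: vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention(queries, keys, values):
    # Scaled dot-product attention: each query takes a softmax-weighted average of the values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def gated_attention_block(xs, context, params):
    """Hypothetical attention + gating block with no MLP/FFN sublayer.
    xs: token states; context: states they cross-attend to."""
    q = [matvec(params["wq"], x) for x in xs]
    k = [matvec(params["wk"], c) for c in context]
    v = [matvec(params["wv"], c) for c in context]
    attended = attention(q, k, v)
    out = []
    for x, a in zip(xs, attended):
        # Gate computed from the token's own state, one value in (0, 1) per channel.
        g = [sigmoid(z) for z in matvec(params["wg"], x)]
        out.append([xi + gi * ai for xi, gi, ai in zip(x, g, a)])
    return out


random.seed(0)
dim = 4
params = {
    name: [[random.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    for name in ("wq", "wk", "wv", "wg")
}
xs = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]   # 2 token states
ctx = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]  # 3 context states
ys = gated_attention_block(xs, ctx, params)
print(len(ys), len(ys[0]))  # 2 4
```

The only learned parameters in this sketch are the four projection matrices; there is no hidden-dimension expansion of the kind an FFN would add, which is where the post argues parameters are wasted at this scale.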
Training:
- Pretrained on 200B tokens across 16 TPU v6e chips (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini, covering 15 tool categories (timers, messaging, navigation, smart home, etc.)
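The training figures above imply the following average token throughputs. This sketch only divides the reported token counts by the reported wall-clock times; it assumes the whole duration was spent training and, because the post does not say which hardware was used for post-training, computes a per-chip figure only for pretraining.

```python
def tokens_per_second(tokens, hours):
    # Average throughput over the whole run, assuming all wall-clock time was training.
    return tokens / (hours * 3600)


pretrain_total = tokens_per_second(200e9, 27)      # aggregate across 16 TPU v6e chips
pretrain_per_chip = pretrain_total / 16
posttrain_total = tokens_per_second(2e9, 45 / 60)  # post-training hardware not stated

print(f"pretrain: {pretrain_total:,.0f} tok/s total, {pretrain_per_chip:,.0f} tok/s per chip")
print(f"post-train: {posttrain_total:,.0f} tok/s total")
```

That works out to roughly 2.06M tokens/s in aggregate (about 129K per chip) for pretraining and about 741K tokens/s for post-training.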
You can test it right now and fine-tune it on your Mac/PC: [https://github.com/cactus-compute/needle](https://github.com/cactus-compute/needle)
The full writeup on the architecture is here: [https://github.com/cactus-compute/needle/blob/main/docs/simple\_attention\_networks.md](https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md)
We found that the "no FFN" finding
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1tb9b0r/needle_we_distilled_gemini_tool_calling_into_a/
Discussions
- reddit · 131 points · 26 comments