NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing
NVIDIA released Star Elastic, a single checkpoint containing 30B, 23B, and 12B reasoning models with zero-shot slicing. It enables dynamic switching between sizes, much like scalable video coding, with a shared KV cache.
Excerpt
I saw this on another sub and didn't see it posted here. It looks awesome and can definitely be run locally. I guess it was released 11 days ago, but it never hit the top of my feed (which I look at way too often), so posting it again.
# This is my take on it:
Think of it like scalable video coding: you have a UHD stream, but strip some layers and you get an HD or SD stream. It's all a single stream, not multiple ones.
It's like nested models rather than three separate sets, and they share a KV cache, so the model can adjust speed like a sliding scale. You could get an idea going with the 30B model, then scale down and churn through all the thinking at 7,000 t/s on the 12B model, generating a book's worth of reasoning in seconds, then slide back up to 30B to evaluate what's good. The 30B could guide the smaller ones back and forth.
Maybe it's somewhat of a hybrid between dense and MoE: it's like MoE, but with three dense models nested like Russian dolls.
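To make the Russian-doll idea concrete, here's a toy NumPy sketch (my own illustration, not NVIDIA's code): the smaller submodels are just the leading slices of the parent's weight matrices, so all three "models" share one set of parameters. The dimensions and names here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

PARENT_DIM = 8                            # stands in for the 30B parent's hidden width
SLICES = {"30B": 8, "23B": 6, "12B": 4}   # illustrative widths only, not real ratios

# One shared weight matrix; every submodel is a nested view into it.
W_parent = rng.standard_normal((PARENT_DIM, PARENT_DIM))

def submodel_forward(x, size):
    """Run a 'sliced' submodel: use only the leading d x d block of the parent."""
    d = SLICES[size]
    return x[:d] @ W_parent[:d, :d]

x = rng.standard_normal(PARENT_DIM)
for size in SLICES:
    print(size, submodel_forward(x, size).shape)
```

The point of the sketch: no weights are duplicated. The "12B" model is literally a sub-block of the "30B" weights, which is why a single file can contain all three.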
# Original Post:
NVIDIA just released Star Elastic — and the inference strategy alone is worth understanding.
Here's what's actually interesting from the technical side:
1. One checkpoint. Three models.
Star Elastic applies a post-training method to Nemotron Nano v3 that nests 23B and 12B submodels inside the 30B parent, so they can be extracted zero-shot from the parent checkpoint. All three live in a single checkpoint, available in BF16, FP8, and NVFP4.
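As a rough sketch of what "zero-shot extraction" means in practice (this is an assumption about the mechanism, not the released API; names and the slicing rule are hypothetical), you'd carve a smaller state dict out of the parent checkpoint without any retraining:

```python
import numpy as np

# Hypothetical parent checkpoint: a dict of weight tensors (toy sizes).
parent_ckpt = {
    "layer0.weight": np.ones((8, 8)),
    "layer1.weight": np.ones((8, 8)),
}

def extract_submodel(ckpt, width):
    """Slice every weight down to the submodel's width -- no retraining needed."""
    return {name: w[:width, :width].copy() for name, w in ckpt.items()}

small_ckpt = extract_submodel(parent_ckpt, width=4)
print({k: v.shape for k, v in small_ckpt.items()})
```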
2. The router learns the architecture, not just the weights.
A learnable router trained
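The excerpt cuts off here, but the basic shape of a learnable router over candidate submodel sizes might look something like this (purely my own guess at the idea, not NVIDIA's method; the logits and sizes are made up):

```python
import numpy as np

WIDTHS = [12, 23, 30]                       # candidate submodel sizes (B params)
router_logits = np.array([0.2, 0.5, 1.1])   # 'learned' parameters (invented here)

# Softmax over candidates; at inference you'd pick the winner.
probs = np.exp(router_logits) / np.exp(router_logits).sum()
chosen = WIDTHS[int(np.argmax(probs))]
print(chosen)  # -> 30
```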
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1t8s83r/nvidia_ai_releases_star_elastic_one_checkpoint/