[NEW] Supra-50M Released!

· r/LocalLLaMA ·

SupraLabs released open 50M-parameter base and instruct language models trained from scratch on educational web text.

Categories: Model Releases, OSS & Tools

Excerpt

https://preview.redd.it/kx39ammxno2h1.jpg?width=1080&format=pjpg&auto=webp&s=d1a2d5b27920a5b61a50547a6e70a6378445cae4 # SupraLabs released a new model! - Supra-50M **Supra-50M** is a compact 50M-parameter causal language model (BASE and INSTRUCT versions) built from scratch by SupraLabs using a Llama-style architecture, trained on 20 billion tokens of high-quality educational web text. Despite being significantly smaller than comparable open models, it achieves competitive or superior results on several key benchmarks. This is our first **SupraLabs Scaling Up Plan** model. 🤗 [Supra-50M-Base](https://huggingface.co/SupraLabs/Supra-50M-Base) | [Supra-50M-Instruct](https://huggingface.co/SupraLabs/Supra-50M-Instruct) # What comes next? * **Supra-124M** — Base, Chat, Experimental Reasoning * **Supra-350M** — Base, Chat, Reasoning, Coding # 🏆 Benchmarks |Benchmark|Supra-50M *(ours)*|GPT-2 (124M)|SmolLM-135M|OpenELM-270M| |:-|:-|:-|:-|:-| |**Parameters**|**50M**|124M *(2.5×)*|135M *(2.7×)*|270M *(5.4×)*| |**BLiMP** (linguistics)|**76.3%**|63.0%|69.8%|N/A| |**SciQ** (science)|77.2%|53.2%|73.4%|**84.70%**| |**ARC-Easy** (knowledge)|52.2%|42.0%|49.2%|**45.08%**| |**PIQA** (logic)|62.2%|63.0%|67.3%|**69.75%**| |**HellaSwag** (context)|31.8%|29.5%|42.0%|**46.71%**| # 🧠 Architecture & Hyperparameters |Hyperparameter|Value| |:-|:-| |Architecture|Llama (decoder-only transformer)| |Parameters|\~50M| |Vocab size|32,000| |Hidden size|512| |Intermediate size|1,408| |Hi

Discussions