[NEW] Supra-50M Released!
SupraLabs released open 50M-parameter base and instruct language models trained from scratch on educational web text.
https://preview.redd.it/kx39ammxno2h1.jpg?width=1080&format=pjpg&auto=webp&s=d1a2d5b27920a5b61a50547a6e70a6378445cae4
# SupraLabs released a new model! - Supra-50M
**Supra-50M** is a compact 50M-parameter causal language model, available in Base and Instruct versions, built from scratch by SupraLabs on a Llama-style architecture and trained on 20 billion tokens of high-quality educational web text. Despite being 2.5× to 5.4× smaller than the open models in the benchmark table, it posts the best reported BLiMP and ARC-Easy scores in the comparison and beats GPT-2 (124M) and SmolLM-135M on SciQ. It trails all three on PIQA and trails SmolLM-135M and OpenELM-270M on HellaSwag. This is the first model in our **SupraLabs Scaling Up Plan**.
🤗 [Supra-50M-Base](https://huggingface.co/SupraLabs/Supra-50M-Base) | [Supra-50M-Instruct](https://huggingface.co/SupraLabs/Supra-50M-Instruct)
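A minimal sketch of loading the base checkpoint for plain text continuation. It assumes the repository linked above loads through the standard Hugging Face `transformers` auto classes, which is typical for Llama-style models but is not confirmed by the post. The `generate` helper and its defaults are illustrative, not part of the release.

```python
MODEL_ID = "SupraLabs/Supra-50M-Base"  # repo id taken from the link above


def generate(prompt: str, max_new_tokens: int = 40) -> str:
    """Greedily continue `prompt` with the base model.

    Assumption: the repo works with AutoTokenizer / AutoModelForCausalLM.
    Weights are downloaded from the Hub on the first call.
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for repeatable output
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Photosynthesis is the process by which")` would return the prompt followed by the model's continuation. The Instruct variant may expect a chat template, which the post does not describe, so it is left out here.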
# What comes next?
* **Supra-124M** — Base, Chat, Experimental Reasoning
* **Supra-350M** — Base, Chat, Reasoning, Coding
# 🏆 Benchmarks
|Benchmark|Supra-50M *(ours)*|GPT-2 (124M)|SmolLM-135M|OpenELM-270M|
|:-|:-|:-|:-|:-|
|**Parameters**|**50M**|124M *(2.5×)*|135M *(2.7×)*|270M *(5.4×)*|
|**BLiMP** (linguistics)|**76.3%**|63.0%|69.8%|N/A|
|**SciQ** (science)|77.2%|53.2%|73.4%|**84.70%**|
|**ARC-Easy** (knowledge)|**52.2%**|42.0%|49.2%|45.08%|
|**PIQA** (physical commonsense)|62.2%|63.0%|67.3%|**69.75%**|
|**HellaSwag** (commonsense completion)|31.8%|29.5%|42.0%|**46.71%**|
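To make the comparison easier to check, here is a small sketch that recomputes the size ratios and the top scorer per benchmark. The numbers are copied from the table; the dictionary layout and function names are just for illustration.

```python
# Scores in percent, copied from the benchmark table; None marks N/A.
SCORES = {
    "BLiMP": {"Supra-50M": 76.3, "GPT-2 (124M)": 63.0, "SmolLM-135M": 69.8, "OpenELM-270M": None},
    "SciQ": {"Supra-50M": 77.2, "GPT-2 (124M)": 53.2, "SmolLM-135M": 73.4, "OpenELM-270M": 84.70},
    "ARC-Easy": {"Supra-50M": 52.2, "GPT-2 (124M)": 42.0, "SmolLM-135M": 49.2, "OpenELM-270M": 45.08},
    "PIQA": {"Supra-50M": 62.2, "GPT-2 (124M)": 63.0, "SmolLM-135M": 67.3, "OpenELM-270M": 69.75},
    "HellaSwag": {"Supra-50M": 31.8, "GPT-2 (124M)": 29.5, "SmolLM-135M": 42.0, "OpenELM-270M": 46.71},
}

# Parameter counts in millions, from the "Parameters" row.
PARAMS_M = {"Supra-50M": 50, "GPT-2 (124M)": 124, "SmolLM-135M": 135, "OpenELM-270M": 270}


def best_model(benchmark: str) -> str:
    """Return the model with the highest reported score, skipping N/A entries."""
    reported = {m: s for m, s in SCORES[benchmark].items() if s is not None}
    return max(reported, key=reported.get)


def size_ratio(model: str) -> float:
    """How many times larger `model` is than Supra-50M."""
    return PARAMS_M[model] / PARAMS_M["Supra-50M"]


leaders = {benchmark: best_model(benchmark) for benchmark in SCORES}

for benchmark, model in leaders.items():
    print(f"{benchmark}: {model}")
for model in PARAMS_M:
    print(f"{model}: {size_ratio(model):.1f}x")
```

On these numbers Supra-50M leads on BLiMP and ARC-Easy, and OpenELM-270M leads on SciQ, PIQA and HellaSwag.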
# 🧠 Architecture & Hyperparameters
|Hyperparameter|Value|
|:-|:-|
|Architecture|Llama (decoder-only transformer)|
|Parameters|\~50M|
|Vocab size|32,000|
|Hidden size|512|
|Intermediate size|1,408|
|Hi
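As a rough sanity check on the sizes listed so far, the sketch below counts the weights implied by the vocab, hidden, and intermediate sizes alone. It assumes the usual Llama-style gated MLP (gate, up, and down projections, no biases). The layer count and attention configuration are not visible in this excerpt, so no total is computed.

```python
# Values taken from the hyperparameter table.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 512
INTERMEDIATE_SIZE = 1_408


def embedding_params() -> int:
    """Weights in one token-embedding matrix (vocab x hidden)."""
    return VOCAB_SIZE * HIDDEN_SIZE


def mlp_params_per_layer() -> int:
    """Weights in one gated MLP block.

    Assumption: three bias-free projections (gate, up, down), as in
    standard Llama-style models.
    """
    return 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE


print(f"embedding matrix: {embedding_params():,}")
print(f"MLP per layer:    {mlp_params_per_layer():,}")
```

The embedding matrix alone comes to about 16.4M weights, roughly a third of the \~50M total if input and output embeddings are shared, which the excerpt does not state.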
Read at source: https://www.reddit.com/r/LocalLLaMA/comments/1tkhngq/new_supra50m_released/