GLM-5.2 is probably the most powerful text-only open weights LLM

Simon Willison

Z.ai released the open weights for GLM-5.2, a 753B-parameter MoE text model with a 1 million token context window, under an MIT license.

Categories: Model Releases, OSS & Tools

Excerpt

<p>Chinese AI lab <a href="https://z.ai/">Z.ai</a> released GLM-5.2 <a href="https://x.com/Zai_org/status/2065704919299235870">to their coding plan subscribers</a> on June 13th, and then yesterday (June 16th) released the full open weights under an MIT license. Similar in size to their previous GLM-5 and GLM-5.1 releases, this is a 753B parameter, <a href="https://huggingface.co/zai-org/GLM-5.2">1.51TB</a> monster - with 40B active parameters (Mixture of Experts). GLM-5.2 is a text input only model - Z.ai have a separate vision family most recently represented by <a href="https://x.com/Zai_org/status/2039371126984360085">GLM-5V-Turbo</a>, but that one isn't open weights. GLM-5.2 has a 1 million token context window, up from GLM-5.1's 200,000.</p> <p>The buzz around this model is strong.</p> <p>Artificial Analysis, who run one of the most widely respected independent benchmarks: <a href="https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artificial-analysis-intelligence-index">GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index</a>.</p> <blockquote> <p><strong>GLM-5.2 is the leading open weights model on the Intelligence Index v4.1.</strong> At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)</p> </blockquote> <p>They did however find it to be quite token-hungry:</p> <blockquote> <p><strong>GLM-5.2 uses more output tokens per task than other leading open weights models:</strong> the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)</p> </blockquote> <p>The model is also now ranked 2nd on the <a href="https://arena.ai/leaderboard/code/webdev">Code Arena WebDev leaderboard</a>, behind only Claude Fable 5. That leaderboard measures "front-end web development tasks, including agentic coding workflows". I'm impressed to see it rank so highly given the lack of image i