
Z.ai ships GLM-5.2: 744B MoE, 1M-token context, drops into Claude Code via Anthropic endpoint — with no benchmarks
Chris Harper
3 min read
Jun 16, 2026 · 12:14 UTC
Z.ai (Zhipu AI) shipped GLM-5.2 on June 13 directly to GLM Coding Plan subscribers — a 744-billion-parameter Mixture-of-Experts model with 40 billion active parameters per token and a genuine 1-million-token context window (roughly 5× the 200K cap on GLM-5.1). MIT-licensed open weights and a public API are both coming "next week."
Context window: actually usable. A 1M advertised context and a usable 1M context are often different things — most models degrade severely past 200K. Z.ai's documentation distinguishes between the standard context path and a glm-5.2[1m] model suffix that routes to the full 1M window, suggesting the larger context is a deliberate, tested configuration rather than a marketing ceiling. For teams working on tasks that benefit from loading an entire large codebase or a long document corpus in one pass, that distinction matters.
Two thinking-effort levels. GLM-5.2 ships with High and Max thinking-effort modes — similar in concept to Anthropic's extended thinking and OpenAI's reasoning_effort parameter. No quantitative data on what each costs or how they perform yet.
Drop-in for Claude Code and Cline. Z.ai exposes both an OpenAI Chat Completions endpoint and an Anthropic Messages API-compatible endpoint, meaning Claude Code, Cline, and any tool that calls the Anthropic SDK can swap in GLM-5.2 with a base-URL change and model name swap. For teams evaluating model swaps on existing Claude-based agent workflows, this is the lowest-friction path to comparison testing.
Pricing. GLM Coding Plan tiers: Lite at $10/month, Pro at $30/month, Max at $80/month. Quota is plan-bucketed (Lite: ~80 prompts/5 hr; Pro: ~400; Max: ~1600), not token-billed at the API-call level. Token-based API pricing for the open API has not yet been announced.
The notable absence: no benchmarks. Z.ai published zero benchmark numbers at launch — no SWE-Bench, Terminal-Bench, or Code Arena score. MarkTechPost notes this explicitly. The GLM-5.2 vs Kimi K2.7-Code comparison notes early real-task results look strong, but no independent head-to-head numbers exist yet. This matters: the model has the right specs and the right interface, but whether it delivers on those specs for real coding tasks is still unknown. Benchmark before deploying.
Practical move this week. If your team is evaluating open-weight or low-cost alternatives to the Anthropic API (especially with Fable 5 offline and Opus 4.8 being the ceiling right now), GLM-5.2's Anthropic-compatible endpoint makes it a one-hour comparison test. Use your own real tasks, not published benchmarks you can't verify.
Sources: MarkTechPost: Z.ai launches GLM-5.2, Nerova.ai: GLM-5.2 endpoint setup guide, AIMadeTools: GLM-5.2 complete guide, AIMadeTools: GLM-5.2 vs Kimi K2.7, DigitalApplied: GLM-5.2 on Coding Plan