GLM-4.7 Available on Cerebras Inference Cloud at 1,000-1,700 Tokens/Second

Z.ai's GLM-4.7, the top open-weight coding model and one that surpasses DeepSeek-V3.2 on key benchmarks, has launched on Cerebras Inference Cloud, running at up to 1,700 tokens/second. Cerebras states that this is 20x faster than closed-source competitors on GPUs.

Evidence Strength

96% Authoritative

Backed by official company doc

Single publisher source

Includes official or primary source

Insights

First tracked

November 18, 2025

Last updated

January 8, 2026

Sources

3 sources

Related Developments

Oklahoma City AI Datacenter Ribbon-Cutting with 44+ Exaflops

Cerebras Delivers 3,000 Tokens/Second Inference for OpenAI's gpt-oss-120B Open-Weight Model

CS-3 vs. NVIDIA DGX B200 Blackwell Benchmarks Published

Jais 2 Arabic-Centric LLMs Trained and Deployed on Cerebras Wafer-Scale Clusters

OpenAI Signs $10B+ Multiyear Compute Deal with Cerebras

Sources (3)

Source Timeline

This new model is smarter than Sonnet 4.5…and 20X faster?
Cerebras · Jan 8, 2026
GLM-4.7: Frontier intelligence at record speed — now available on Cerebras
Cerebras · Jan 8, 2026
The world’s fastest GLM-4.6 – now available on Cerebras
Cerebras · Nov 18, 2025
