Fact · Confirmed · Product · January 8, 2026

GLM-4.7 Available on Cerebras Inference Cloud at 1,000-1,700 Tokens/Second

Z.ai's GLM-4.7, the top open-weight coding model and ahead of DeepSeek-V3.2 on key benchmarks, has launched on Cerebras Inference Cloud, where it runs at 1,000-1,700 tokens/second. Cerebras states this is 20x faster than closed-source competitors running on GPUs.
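To put the throughput figures in context, a minimal sketch of what they imply for response latency, assuming sustained decode speed and ignoring time-to-first-token and network overhead (the response length below is hypothetical, not from the announcement):

```python
# Rough decode-time estimate at the quoted throughput range
# (assumption: sustained tokens/second, no prefill or network latency).
def decode_time_seconds(tokens: int, tokens_per_second: float) -> float:
    return tokens / tokens_per_second

response_tokens = 2_000  # hypothetical long coding response

for tps in (1_000, 1_700):
    print(f"{tps} tok/s -> {decode_time_seconds(response_tokens, tps):.2f} s")
# 1000 tok/s -> 2.00 s
# 1700 tok/s -> 1.18 s
```

By the same arithmetic, a GPU deployment at 1/20th of the claimed speed would take 20x as long to produce the same response.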

Evidence Strength

Evidence
96% · Authoritative
Backed by official company doc
Single publisher source
Includes official or primary source

Insights

First tracked

November 18, 2025

Last updated

January 8, 2026

Sources

3 sources