Qwen3-32B Reasoning Model Live on Cerebras at 2,400 Tokens/Sec

Cerebras has deployed Alibaba's Qwen3-32B reasoning model at an output speed of 2,400 tokens/sec, which the company says is 40x faster than the best GPU result, with a 1.2-second time to first token. Cerebras describes this as the first reasoning model running in real time on any hardware.
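
The headline figures can be turned into rough end-to-end latency estimates. The sketch below is illustrative only. It assumes tokens stream at a constant rate once the first token arrives. The 2,400-token response length is a hypothetical example, and the 60 tokens/sec GPU figure is only what the 40x ratio implies, not a number given in the source.

```python
def response_time_s(output_tokens: int, tokens_per_sec: float, ttft_s: float) -> float:
    """Estimate wall-clock seconds to receive a full response.

    Simplifying assumption: after the time to first token (TTFT),
    tokens arrive at a constant rate.
    """
    if output_tokens < 0 or tokens_per_sec <= 0 or ttft_s < 0:
        raise ValueError("token count and TTFT must be >= 0, rate must be > 0")
    return ttft_s + output_tokens / tokens_per_sec


# Figures quoted in the summary above.
CEREBRAS_TOKENS_PER_SEC = 2400.0
CEREBRAS_TTFT_S = 1.2
SPEEDUP_VS_GPU = 40.0

# Derived, not stated in the source: the GPU rate implied by the 40x claim.
implied_gpu_tokens_per_sec = CEREBRAS_TOKENS_PER_SEC / SPEEDUP_VS_GPU

# Hypothetical response length, chosen only to make the arithmetic easy.
example_tokens = 2400

cerebras_total_s = response_time_s(example_tokens, CEREBRAS_TOKENS_PER_SEC, CEREBRAS_TTFT_S)
# Generation time only; the source gives no GPU time to first token.
gpu_generation_s = example_tokens / implied_gpu_tokens_per_sec

print(f"Implied GPU rate: {implied_gpu_tokens_per_sec:.0f} tokens/sec")
print(f"Cerebras, {example_tokens} tokens: about {cerebras_total_s:.1f} s including TTFT")
print(f"Implied GPU, {example_tokens} tokens: about {gpu_generation_s:.0f} s of generation")
```

Under these assumptions, a response of that length would take roughly 2 seconds on Cerebras versus roughly 40 seconds of generation at the implied GPU rate. Real latencies depend on prompt length, load, and how much of the output is hidden reasoning.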

Evidence Strength

96% Authoritative

Backed by official company doc

Single publisher source

Includes official or primary source

Key Development

High-significance development (rated 8/10)

Confirmed — verified event

Insights

First tracked

May 15, 2025

Last updated

June 2, 2025

Related Developments

Oklahoma City AI Datacenter Ribbon-Cutting with 44+ Exaflops
REAP: One-Shot MoE Pruning Method Open-Sourced
Meta Llama Prompt-Ops and Synthetic-Data-Kit Integration with Cerebras Inference
Cerebras Inference Pay-Per-Token Tier Launched
CS-3 vs. NVIDIA DGX B200 Blackwell Benchmarks Published

Sources (2)

Source Timeline

Cerebras May 2025 Newsletter
Cerebras · Jun 2, 2025

Realtime Reasoning is Here - Qwen3-32B is Live on Cerebras
Cerebras · May 15, 2025