Evidence: 96% (Authoritative)
Fact · Confirmed · Product · June 2, 2025
Qwen3-32B Reasoning Model Live on Cerebras at 2,400 Tokens/Sec
Cerebras deployed Alibaba's Qwen3-32B reasoning model at an output speed of 2,400 tokens/sec (40x faster than the best GPU result) with a 1.2-second time to first token. Cerebras describes this as the first reasoning model running in real time on any hardware.
Evidence Strength
Evidence: 96% (Authoritative)
Backed by official company doc
Single publisher source
Includes official or primary source
Key Development
High-significance development (rated 8/10)
Confirmed — verified event
Insights
First tracked
May 15, 2025
Last updated
June 2, 2025
Sources
2 sources
Sources (2)
Source Timeline
Cerebras May 2025 Newsletter · Cerebras · Jun 2, 2025
Realtime Reasoning is Here - Qwen3-32B is Live on Cerebras · Cerebras · May 15, 2025