Cerebras Delivers 3,000 Tokens/Second Inference for OpenAI's gpt-oss-120B Open-Weight Model

Cerebras announced support for OpenAI's first open-weight reasoning model (gpt-oss-120B) on its inference service, running at ~3,000 tokens per second. The service is claimed to be ~55x faster and ~60x cheaper than Anthropic's Claude 4 Opus.

Evidence Strength

100% Authoritative

Backed by official company doc

Reported by 2 independent publishers

Includes official or primary source

Key Development

High-significance development (rated 8/10)

Covered by 4 sources

Confirmed — verified event

Insights

First tracked

August 5, 2025

Last updated

November 6, 2025

Related Developments

OpenAI Signs $10B+ Multiyear Compute Deal with Cerebras
OpenAI GPT-5.3-Codex-Spark Powered by Cerebras Launches in Research Preview
GPT-5.3-Codex-Spark Ships as First Model Running on Cerebras WSE-3 for OpenAI
Oklahoma City AI Datacenter Ribbon-Cutting with 44+ Exaflops
CS-3 vs. NVIDIA DGX B200 Blackwell Benchmarks Published

Sources (4)

Source Timeline

OpenAI GPT-OSS 120B Benchmarked – NVIDIA Blackwell vs. Cerebras
Cerebras·Nov 6, 2025
OpenAI GPT OSS 120B Runs Fastest on Cerebras
Cerebras·Aug 6, 2025
Cerebras delivers blazing speed for OpenAI’s new open-model with 3,000 tokens per second
SiliconANGLE·Aug 5, 2025
Cerebras Launches OpenAI’s gpt-oss-120B at a Blistering 3,000 tokens/sec
Cerebras·Aug 5, 2025
