Fact · Confirmed · Product · June 2, 2025
Cerebras Sets Llama 4 Maverick Inference Speed Record at 2,500+ Tokens/Sec
Cerebras achieved a world record of 2,500+ tokens/sec per user on Meta's 400B-parameter Llama 4 Maverick model, as measured by Artificial Analysis, more than doubling Nvidia's DGX B200 benchmark of 1,000 tokens/sec.
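The practical impact of the per-user rates cited above comes down to simple arithmetic: at a fixed response length, generation time scales inversely with tokens/sec. A minimal sketch (the 10,000-token response length is an arbitrary illustrative choice, not from the source):

```python
def generation_time(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream num_tokens to a single user at a given per-user rate."""
    return num_tokens / tokens_per_sec

# Time to generate a hypothetical 10,000-token response at each reported rate.
cerebras_s = generation_time(10_000, 2_500)   # Cerebras record: 4 s
dgx_b200_s = generation_time(10_000, 1_000)   # Nvidia DGX B200 benchmark: 10 s

print(f"Cerebras: {cerebras_s:.0f} s, DGX B200: {dgx_b200_s:.0f} s")
```

At these rates the same response finishes in 4 seconds on Cerebras versus 10 seconds on the DGX B200, consistent with the "more than doubling" claim.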
Evidence Strength
Evidence: 96% (Authoritative)
Backed by official company doc
Single publisher source
Includes official or primary source
Key Development
High-significance development (rated 8/10)
Confirmed — verified event
Insights
First tracked
June 2, 2025
Last updated
June 2, 2025
Sources
1 source
Related Developments
CS-3 vs. NVIDIA DGX B200 Blackwell Benchmarks Published
Cerebras Delivers 3,000 Tokens/Second Inference for OpenAI's gpt-oss-120B Open-Weight Model
Jais 2 Arabic-Centric LLMs Trained and Deployed on Cerebras Wafer-Scale Clusters
GLM-4.7 Available on Cerebras Inference Cloud at 1,000-1,700 Tokens/Second
OpenAI Signs $10B+ Multiyear Compute Deal with Cerebras
Sources (1)
Source Timeline
Cerebras May 2025 Newsletter · Cerebras · Jun 2, 2025