Fact · Confirmed · Product · June 2, 2025
Cerebras Sets Llama 4 Maverick Inference Speed Record at 2,500+ Tokens/Sec
Cerebras achieved a world record of 2,500+ tokens/sec per user on Meta's 400B-parameter Llama 4 Maverick model, as measured by Artificial Analysis, more than doubling Nvidia's DGX B200 benchmark of 1,000 tokens/sec.
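The practical impact of the per-user rates cited above comes down to simple arithmetic: at a fixed response length, generation time scales inversely with tokens/sec. A minimal sketch (the 10,000-token response length is an arbitrary illustrative choice, not from the source):

```python
def generation_time(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream num_tokens to a single user at a given per-user rate."""
    return num_tokens / tokens_per_sec

# Time to generate a hypothetical 10,000-token response at each reported rate.
cerebras_s = generation_time(10_000, 2_500)   # Cerebras record: 4 s
dgx_b200_s = generation_time(10_000, 1_000)   # Nvidia DGX B200 benchmark: 10 s

print(f"Cerebras: {cerebras_s:.0f} s, DGX B200: {dgx_b200_s:.0f} s")
```

At these rates the same response finishes in 4 seconds on Cerebras versus 10 seconds on the DGX B200, consistent with the "more than doubling" claim.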
Evidence Strength
Evidence: 96% (Authoritative)
Backed by official company doc
Single publisher source
Includes official or primary source
Key Development
High-significance development (rated 8/10)
Confirmed — verified event
Insights
First tracked
June 2, 2025
Last updated
June 2, 2025
Sources
1 source
Related Developments
CS-3 vs. NVIDIA DGX B200 Blackwell Benchmarks Published
Cerebras Delivers 3,000 Tokens/Second Inference for OpenAI's gpt-oss-120B Open-Weight Model
Jais 2 Arabic-Centric LLMs Trained and Deployed on Cerebras Wafer-Scale Clusters
GLM-4.7 Available on Cerebras Inference Cloud at 1,000-1,700 Tokens/Second
OpenAI Signs $10B+ Multiyear Compute Deal with Cerebras
Sources (1)
Source Timeline
Cerebras May 2025 Newsletter · Cerebras · Jun 2, 2025