BROMANDER STUDIOS

> BROMANDER_LABS

WHO'S ACTUALLY FAST?

Time-to-first-token and sustained throughput across 27 provider+model combinations — from Cerebras' wafer-scale silicon to OpenAI's reasoning-mode marathons.

Verified 2026-05-25 · Sourced from Artificial Analysis + benchmarks
Fastest TTFT: SambaNova · Llama 3.3 70B · 250 ms
Highest Throughput: Cerebras · Llama 4 Scout · 2,600 t/s
| Provider | Model | Category | Region | TTFT (lower = snappier) | Throughput (higher = faster) | Notes |
|---|---|---|---|---|---|---|
| Cerebras | Llama 4 Scout | Specialized silicon | US | 500 ms | 2,600 t/s | Wafer-scale silicon. The current throughput champion. |
| Cerebras | Llama 3.3 70B | Specialized silicon | US | 400 ms | 2,500 t/s | |
| Cerebras | Llama 3.1 8B | Specialized silicon | US | 350 ms | 1,800 t/s | |
| Groq | Llama 3.1 8B | Specialized silicon | Global | 400 ms | 750 t/s | |
| SambaNova | Llama 3.3 70B | Specialized silicon | US | 250 ms | 580 t/s | Best-in-class TTFT among major providers. |
| Groq | Llama 4 Scout | Specialized silicon | Global | 350 ms | 510 t/s | |
| Groq | Llama 3.3 70B | Specialized silicon | Global | 910 ms | 340 t/s | LPU compiler pre-computes execution graph; near-zero variance. |
| Google | Gemini 3.5 Flash | Frontier proprietary | Global | 500 ms | 284 t/s | Fastest Gemini Flash to date. |
| Google | Gemini 2.5 Flash | Frontier proprietary | Global | 400 ms | 250 t/s | |
| xAI | Grok 4.20 (reasoning) [reasoning] | Frontier proprietary | US | 10.3 s (incl. thinking) | 233 t/s | TTFT inflated by reasoning pre-tokens. |
| xAI | Grok 4.3 | Frontier proprietary | US | 1.30 s | 209 t/s | |
| SambaNova | DeepSeek R1 | Specialized silicon | US | 280 ms | 198 t/s | |
| Google Vertex | Llama 3.3 70B | Open-model host | Global | 650 ms | 159 t/s | |
| Fireworks | Llama 3.3 70B | Open-model host | US | 450 ms | 150 t/s | |
| Amazon Bedrock | Llama 3.3 70B | Open-model host | Global | 700 ms | 135 t/s | |
| OpenAI | GPT-4o | Frontier proprietary | Global | 550 ms | 131 t/s | |
| Anthropic | Claude Haiku 4.5 | Frontier proprietary | Global | 800 ms | 110 t/s | |
| OpenAI | GPT-5 (chat) | Frontier proprietary | Global | 1.50 s | 110 t/s | |
| Google | Gemini 3.1 Pro | Frontier proprietary | Global | 900 ms | 109 t/s | |
| OpenAI | GPT-5 (high reasoning) [reasoning] | Frontier proprietary | Global | 65.9 s (incl. thinking) | 105 t/s | TTFT includes ~60 s of hidden chain-of-thought before the first user-visible token. |
| Anthropic | Claude Sonnet 4.6 | Frontier proprietary | Global | 1.00 s | 90 t/s | |
| DeepInfra | Llama 3.3 70B | Open-model host | US | 550 ms | 85 t/s | |
| OpenAI | GPT-4.1 | Frontier proprietary | Global | 550 ms | 85 t/s | |
| Together | Llama 3.3 70B | Open-model host | US | 800 ms | 80 t/s | |
| Anthropic | Claude Opus 4.7 | Frontier proprietary | Global | 1.20 s | 60 t/s | |
| DeepSeek | DeepSeek V4 Flash | Frontier proprietary | CN/Global | 800 ms | 50 t/s | High variance: P95 TTFT can spike past 2.5 s. |
| Mistral | Mistral Large 3 | Frontier proprietary | EU | 1.04 s | 48 t/s | |
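
TTFT and throughput pull in different directions depending on how long the answer is, so it can help to combine them into one rough end-to-end estimate. The sketch below is an illustration, not part of the leaderboard's methodology: it assumes total time is simply TTFT plus output tokens divided by a constant throughput, and the function names `estimated_latency_s` and `crossover_tokens` are hypothetical. The two example rows use the TTFT and throughput figures listed above for SambaNova and Cerebras.

```python
from typing import Optional


def estimated_latency_s(ttft_s: float, throughput_tps: float, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then stream the rest.

    Assumes throughput stays constant for the whole response (a simplification).
    """
    return ttft_s + output_tokens / throughput_tps


def crossover_tokens(ttft_a_s: float, tps_a: float,
                     ttft_b_s: float, tps_b: float) -> Optional[float]:
    """Output length at which provider A and provider B take the same total time.

    Solves ttft_a + n / tps_a == ttft_b + n / tps_b for n.
    Returns None when there is no positive crossover.
    """
    rate_gap = 1.0 / tps_a - 1.0 / tps_b
    if rate_gap == 0:
        return None
    n = (ttft_b_s - ttft_a_s) / rate_gap
    return n if n > 0 else None


# (TTFT in seconds, throughput in tokens/second), taken from the leaderboard rows.
sambanova_llama_33_70b = (0.250, 580.0)
cerebras_llama_4_scout = (0.500, 2600.0)

for n in (50, 500):
    a = estimated_latency_s(*sambanova_llama_33_70b, n)
    b = estimated_latency_s(*cerebras_llama_4_scout, n)
    print(f"{n} tokens: SambaNova {a:.3f} s, Cerebras {b:.3f} s")

print(crossover_tokens(*sambanova_llama_33_70b, *cerebras_llama_4_scout))
```

Under these assumptions the lower-TTFT row wins for short replies and the higher-throughput row wins once the output passes a little under 190 tokens. These are different models, so the comparison is about speed only.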
A note on methodology. Numbers are medians from Artificial Analysis and provider-published benchmarks as of the verification date above. Your real-world latency varies with region, prompt size, output length, and load. Reasoning models (tagged "reasoning") show inflated TTFT because the timer starts when the request is sent, before any hidden chain-of-thought, and stops only at the first user-visible token.
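
To make the timing definition concrete, here is a minimal sketch of how TTFT and throughput could be measured from a token stream. It is not the harness used by Artificial Analysis or any provider. `measure_stream` is a hypothetical helper, the stream is assumed to yield only user-visible tokens (so hidden reasoning time lands in TTFT, as described above), and throughput is assumed to be counted over the interval after the first token arrives.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ttft_seconds, tokens_per_second) for one streamed response.

    The timer starts when this function is called, standing in for the moment
    the request is sent. TTFT stops at the first token the stream yields.
    Throughput is (tokens after the first) / (time between first and last token).
    """
    start = clock()
    first_at: Optional[float] = None
    last_at = start
    count = 0

    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now  # first user-visible token
        last_at = now
        count += 1

    if first_at is None:
        return None, None  # the stream produced nothing

    ttft = first_at - start
    generation_time = last_at - first_at
    if count > 1 and generation_time > 0:
        throughput = (count - 1) / generation_time
    else:
        throughput = None  # not enough tokens to estimate a rate
    return ttft, throughput
```

In practice `tokens` would be the streaming iterator returned by a provider SDK. Passing a custom `clock` makes the function easy to check with fixed timestamps.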

© 2026 Bromander Studios. All rights reserved.