> BROMANDER_LABS
WHO'S ACTUALLY FAST?
Time-to-first-token and sustained throughput across 27 provider+model combinations — from Cerebras' wafer-scale silicon to OpenAI's reasoning-mode marathons.
Verified 2026-05-25 · Sourced from Artificial Analysis + benchmarks
Fastest TTFT
SambaNova
Llama 3.3 70B
250 ms
Highest Throughput
Cerebras
Llama 4 Scout
2,600 t/s
Provider · Model | TTFT (lower = snappier) | Throughput (higher = faster)
Cerebras·Llama 4 Scout
Wafer-scale silicon. The current throughput champion.
500 ms
2,600 t/s
250 ms
580 t/s
Groq·Llama 3.3 70B
LPU compiler pre-computes execution graph — near-zero variance.
910 ms
340 t/s
xAI·Grok 4.20 (reasoning) [reasoning]
TTFT inflated by reasoning pre-tokens.
10.3 s (incl. thinking)
233 t/s
OpenAI·GPT-5 (high reasoning) [reasoning]
TTFT includes ~60s of hidden chain-of-thought before the first user-visible token.
65.9 s (incl. thinking)
105 t/s
DeepSeek·DeepSeek V4 Flash
High variance — P95 TTFT can spike past 2.5s.
800 ms
50 t/s
A note on methodology. Numbers are medians from Artificial Analysis and provider-published benchmarks at the date above. Your real-world latency varies with region, prompt size, output length, and load. Reasoning models (flagged with a brain icon) show inflated TTFT because the timer starts before hidden chain-of-thought and only stops at the first user-visible token.
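To make the timing rule above concrete, here is a minimal Python sketch of how TTFT and throughput could be derived from a timestamped token stream. Everything in it is an assumption for illustration: the `StreamEvent` type, its `visible` flag, and the throughput convention (visible tokens per second after the first visible token) are hypothetical, and are not necessarily how Artificial Analysis or any provider computes the figures on this page. The last helper is a rough back-of-envelope model of how output length adds to total response time.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class StreamEvent:
    """One chunk from a streamed response (hypothetical shape, for illustration)."""
    t: float        # seconds elapsed since the request was sent
    visible: bool   # False for hidden chain-of-thought tokens (assumed flag)
    tokens: int = 1


def ttft_seconds(events):
    """Time from request send to the first user-visible token.

    The timer starts at t=0, before any hidden reasoning, so hidden
    tokens push this number up without ever being shown to the user.
    """
    for event in events:
        if event.visible:
            return event.t
    raise ValueError("stream contained no user-visible token")


def throughput_tps(events):
    """Visible tokens per second, counted after the first visible token."""
    visible = [e for e in events if e.visible]
    if len(visible) < 2:
        raise ValueError("need at least two visible chunks")
    elapsed = visible[-1].t - visible[0].t
    return sum(e.tokens for e in visible[1:]) / elapsed


def percentile(values, p):
    """Nearest-rank percentile, e.g. p=95 for P95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def estimated_response_seconds(ttft, tps, output_tokens):
    """Rough total time: wait for the first token, then stream the rest."""
    return ttft + output_tokens / tps


# A toy reasoning-style stream: two hidden chunks, then three visible ones.
events = [
    StreamEvent(0.4, visible=False),
    StreamEvent(2.0, visible=False),
    StreamEvent(3.0, visible=True),
    StreamEvent(3.5, visible=True),
    StreamEvent(4.0, visible=True),
]
print("TTFT (s):", ttft_seconds(events))
print("Throughput (t/s):", throughput_tps(events))

# Summarizing repeated runs: the median hides spikes that P95 exposes.
ttft_runs = [0.25, 0.8, 2.6]
print("Median TTFT (s):", median(ttft_runs))
print("P95 TTFT (s):", percentile(ttft_runs, 95))
```

Under this simple model, a short answer is dominated by TTFT and a long one by throughput, which is why the two columns can rank providers differently.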