> BROMANDER_LABS

WHO'S ACTUALLY FAST?

Time-to-first-token and sustained throughput across 27 provider+model combinations — from Cerebras' wafer-scale silicon to OpenAI's reasoning-mode marathons.

Verified 2026-05-25 · Sourced from Artificial Analysis + benchmarks

Fastest TTFT: SambaNova · Llama 3.3 70B · 250 ms

Highest Throughput: Cerebras · Llama 4 Scout · 2,600 t/s

| Provider · Model | Type · Region | TTFT (lower = snappier) | Throughput (higher = faster) | Notes |
|---|---|---|---|---|
| Cerebras · Llama 4 Scout | Specialized silicon · US | 500 ms | 2,600 t/s | Wafer-scale silicon. The current throughput champion. |
| Cerebras · Llama 3.3 70B | Specialized silicon · US | 400 ms | 2,500 t/s | |
| Cerebras · Llama 3.1 8B | Specialized silicon · US | 350 ms | 1,800 t/s | |
| Groq · Llama 3.1 8B | Specialized silicon · Global | 400 ms | 750 t/s | |
| SambaNova · Llama 3.3 70B | Specialized silicon · US | 250 ms | 580 t/s | Best-in-class TTFT among major providers. |
| Groq · Llama 4 Scout | Specialized silicon · Global | 350 ms | 510 t/s | |
| Groq · Llama 3.3 70B | Specialized silicon · Global | 910 ms | 340 t/s | LPU compiler pre-computes execution graph; near-zero variance. |
| Google · Gemini 3.5 Flash | Frontier proprietary · Global | 500 ms | 284 t/s | Fastest Gemini Flash to date. |
| Google · Gemini 2.5 Flash | Frontier proprietary · Global | 400 ms | 250 t/s | |
| xAI · Grok 4.20 (reasoning) | Frontier proprietary · US | 10.3 s (incl. thinking) | 233 t/s | TTFT inflated by reasoning pre-tokens. |
| xAI · Grok 4.3 | Frontier proprietary · US | 1.30 s | 209 t/s | |
| SambaNova · DeepSeek R1 | Specialized silicon · US | 280 ms | 198 t/s | |
| Google Vertex · Llama 3.3 70B | Open-model host · Global | 650 ms | 159 t/s | |
| Fireworks · Llama 3.3 70B | Open-model host · US | 450 ms | 150 t/s | |
| Amazon Bedrock · Llama 3.3 70B | Open-model host · Global | 700 ms | 135 t/s | |
| OpenAI · GPT-4o | Frontier proprietary · Global | 550 ms | 131 t/s | |
| Anthropic · Claude Haiku 4.5 | Frontier proprietary · Global | 800 ms | 110 t/s | |
| OpenAI · GPT-5 (chat) | Frontier proprietary · Global | 1.50 s | 110 t/s | |
| Google · Gemini 3.1 Pro | Frontier proprietary · Global | 900 ms | 109 t/s | |
| OpenAI · GPT-5 (high reasoning) | Frontier proprietary · Global | 65.9 s (incl. thinking) | 105 t/s | TTFT includes ~60 s of hidden chain-of-thought before the first user-visible token. |
| Anthropic · Claude Sonnet 4.6 | Frontier proprietary · Global | 1.00 s | 90 t/s | |
| DeepInfra · Llama 3.3 70B | Open-model host · US | 550 ms | 85 t/s | |
| OpenAI · GPT-4.1 | Frontier proprietary · Global | 550 ms | 85 t/s | |
| Together · Llama 3.3 70B | Open-model host · US | 800 ms | 80 t/s | |
| Anthropic · Claude Opus 4.7 | Frontier proprietary · Global | 1.20 s | 60 t/s | |
| DeepSeek · DeepSeek V4 Flash | Frontier proprietary · CN/Global | 800 ms | 50 t/s | High variance: P95 TTFT can spike past 2.5 s. |
| Mistral · Mistral Large 3 | Frontier proprietary · EU | 1.04 s | 48 t/s | |
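Neither column alone tells you how long a full response takes. As a rough sketch, total time can be approximated as TTFT plus output length divided by throughput. The function name `estimated_response_seconds` and the constant-throughput assumption are illustrative, not part of the leaderboard's methodology; the figures are copied from the rows above.

```python
def estimated_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock time for a full response: wait for the first token, then stream the rest.

    Assumption (not from the leaderboard): throughput stays constant for the whole response.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


# (TTFT in seconds, throughput in tokens/second), copied from the leaderboard rows.
ROWS = {
    "Cerebras / Llama 4 Scout": (0.500, 2600.0),
    "SambaNova / Llama 3.3 70B": (0.250, 580.0),
    "OpenAI / GPT-5 (high reasoning)": (65.9, 105.0),
}

if __name__ == "__main__":
    for output_tokens in (20, 500):
        print(f"--- {output_tokens} output tokens ---")
        for name, (ttft_s, tps) in ROWS.items():
            total = estimated_response_seconds(ttft_s, tps, output_tokens)
            print(f"{name:35s} {total:7.2f} s")
```

Under this simplified model, SambaNova's lower TTFT wins for very short replies, while Cerebras pulls ahead once the response runs past roughly 190 output tokens. Which column matters more depends on how long your typical output is.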

A note on methodology. Numbers are medians from Artificial Analysis and provider-published benchmarks as of the date above. Your real-world latency varies with region, prompt size, output length, and load. Reasoning models (flagged "reasoning", with TTFT marked "incl. thinking") show inflated TTFT because the timer starts when the request is sent, runs through the hidden chain-of-thought, and stops only at the first user-visible token.
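To check these numbers against your own region and prompts, here is a minimal sketch of measuring both metrics from a token stream. It assumes you already have some iterable that yields user-visible tokens as they arrive; `measure_stream` and its `clock` parameter are hypothetical names, not any provider's API. Throughput here counts only tokens after the first one, which is one common convention and may differ from how the sources above compute it.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float], int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for one streamed response.

    The timer starts before the first token is requested, so any hidden
    reasoning that delays the first visible token is counted in TTFT.
    """
    start = clock()
    first_at: Optional[float] = None
    last_at = start
    count = 0
    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now  # first user-visible token arrived
        last_at = now
        count += 1

    if first_at is None:
        return None, None, 0  # the stream produced nothing

    ttft = first_at - start
    generation_time = last_at - first_at
    # Throughput is measured over the tokens that arrived after the first one.
    if count > 1 and generation_time > 0:
        throughput = (count - 1) / generation_time
    else:
        throughput = None
    return ttft, throughput, count
```

Pass it whatever streaming iterator your client library exposes, and take the median over many runs rather than trusting a single request, since the table reports medians.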

