LLM Inference

Generative text inference, measured in tokens generated per second per GPU · llama2-70b · MLPerf Inference v4.1/v5.0, Offline and Server scenarios · on-demand pricing

Throughput per GPU

GPU                      Scenario  tok/s per GPU
NVIDIA B200-SXM-180GB    Offline          12,357
NVIDIA B200-SXM-180GB    Server           12,305
NVIDIA H200-SXM-141GB    Offline           4,432
NVIDIA H200-SXM-141GB    Server            4,134
NVIDIA H100-SXM-80GB     Offline           3,913
NVIDIA H200-NVL-141GB    Offline           3,894
NVIDIA H100-SXM-80GB     Server            3,888
NVIDIA H200-NVL-141GB    Server            3,606

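Efficiency: tokens per dollar (tok/$), ranked

tok/$ = tok/s per GPU × 3600 ÷ on-demand price per GPU-hour, applying the per-GPU throughput above to each provider's price.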

Provider           GPU                    Scenario  Price/GPU·hr  Price updated      tok/$
RunPod             NVIDIA B200-SXM-180GB  Offline   $5.49         1d ago         8,103,115
RunPod             NVIDIA B200-SXM-180GB  Server    $5.49         1d ago         8,069,123
Lambda Labs        NVIDIA B200-SXM-180GB  Offline   $6.69         1d ago         6,649,641
Lambda Labs        NVIDIA B200-SXM-180GB  Server    $6.69         1d ago         6,621,747
GCP (us-central1)  NVIDIA B200-SXM-180GB  Offline   $8.05         today          5,522,793
GCP (us-central1)  NVIDIA B200-SXM-180GB  Server    $8.05         today          5,499,626
CoreWeave          NVIDIA B200-SXM-180GB  Offline   $8.60         1d ago         5,172,802
CoreWeave          NVIDIA B200-SXM-180GB  Server    $8.60         1d ago         5,151,103
RunPod             NVIDIA H100-SXM-80GB   Offline   $2.99         1d ago         4,711,726
RunPod             NVIDIA H100-SXM-80GB   Server    $2.99         1d ago         4,681,547
Crusoe             NVIDIA H200-SXM-141GB  Offline   $4.29         1d ago         3,718,846
RunPod             NVIDIA H200-SXM-141GB  Offline   $4.31         1d ago         3,701,589
Crusoe             NVIDIA H100-SXM-80GB   Offline   $3.90         1d ago         3,612,323
Crusoe             NVIDIA H100-SXM-80GB   Server    $3.90         1d ago         3,589,186
Lambda Labs        NVIDIA H100-SXM-80GB   Offline   $3.99         1d ago         3,530,842
Lambda Labs        NVIDIA H100-SXM-80GB   Server    $3.99         1d ago         3,508,227
Crusoe             NVIDIA H200-SXM-141GB  Server    $4.29         1d ago         3,469,034
RunPod             NVIDIA H200-SXM-141GB  Server    $4.31         1d ago         3,452,937
OCI                NVIDIA B200-SXM-180GB  Offline   $14.00        today          3,177,579
OCI                NVIDIA B200-SXM-180GB  Server    $14.00        today          3,164,249
CoreWeave          NVIDIA H200-SXM-141GB  Offline   $6.31         1d ago         2,528,344
CoreWeave          NVIDIA H200-SXM-141GB  Server    $6.31         1d ago         2,358,503
CoreWeave          NVIDIA H100-SXM-80GB   Offline   $6.16         1d ago         2,287,023
CoreWeave          NVIDIA H100-SXM-80GB   Server    $6.16         1d ago         2,272,374
OCI                NVIDIA H200-SXM-141GB  Offline   $10.00        today          1,595,385
Azure (eastus2)    NVIDIA H200-SXM-141GB  Offline   $10.60        today          1,505,080
GCP (us-central1)  NVIDIA H200-SXM-141GB  Offline   $10.60        today          1,504,958
OCI                NVIDIA H200-SXM-141GB  Server    $10.00        today          1,488,216
Azure (eastus2)    NVIDIA H200-SXM-141GB  Server    $10.60        today          1,403,977
GCP (us-central1)  NVIDIA H200-SXM-141GB  Server    $10.60        today          1,403,863
AWS (us-east-1)    NVIDIA H200-NVL-141GB  Offline   $10.60        today          1,322,401
OCI                NVIDIA H100-SXM-80GB   Offline   $10.75        today          1,310,517
OCI                NVIDIA H100-SXM-80GB   Server    $10.75        today          1,302,123
GCP (us-central1)  NVIDIA H100-SXM-80GB   Offline   $11.06        today          1,273,641
GCP (us-central1)  NVIDIA H100-SXM-80GB   Server    $11.06        today          1,265,483
AWS (us-east-1)    NVIDIA H200-NVL-141GB  Server    $10.60        today          1,224,696
AWS (us-east-1)    NVIDIA H100-SXM-80GB   Offline   $12.29        today          1,146,303
Azure (eastus2)    NVIDIA H100-SXM-80GB   Offline   $12.29        today          1,146,303
Azure (eastus)     NVIDIA H100-SXM-80GB   Offline   $12.29        today          1,146,303
AWS (us-east-1)    NVIDIA H100-SXM-80GB   Server    $12.29        today          1,138,961
Azure (eastus2)    NVIDIA H100-SXM-80GB   Server    $12.29        today          1,138,961
Azure (eastus)     NVIDIA H100-SXM-80GB   Server    $12.29        today          1,138,961

Generation Efficiency Ratios

Each ratio is the tok/$ of the newer GPU divided by the tok/$ of the H100-SXM-80GB at the same provider and in the same scenario. Values below 1.00× mean the newer GPU yields fewer tokens per dollar. At AWS (us-east-1), the H200 listed is the H200-NVL-141GB.

Provider           Scenario  Comparison        tok/$ ratio
AWS (us-east-1)    Offline   H200-NVL vs H100  1.15×
AWS (us-east-1)    Server    H200-NVL vs H100  1.08×
Azure (eastus2)    Offline   H200 vs H100      1.31×
Azure (eastus2)    Server    H200 vs H100      1.23×
CoreWeave          Offline   B200 vs H100      2.26×
CoreWeave          Offline   H200 vs H100      1.11×
CoreWeave          Server    B200 vs H100      2.27×
CoreWeave          Server    H200 vs H100      1.04×
Crusoe             Offline   H200 vs H100      1.03×
Crusoe             Server    H200 vs H100      0.97×
GCP (us-central1)  Offline   B200 vs H100      4.34×
GCP (us-central1)  Offline   H200 vs H100      1.18×
GCP (us-central1)  Server    B200 vs H100      4.35×
GCP (us-central1)  Server    H200 vs H100      1.11×
Lambda Labs        Offline   B200 vs H100      1.88×
Lambda Labs        Server    B200 vs H100      1.89×
OCI                Offline   B200 vs H100      2.42×
OCI                Offline   H200 vs H100      1.22×
OCI                Server    B200 vs H100      2.43×
OCI                Server    H200 vs H100      1.14×
RunPod             Offline   B200 vs H100      1.72×
RunPod             Offline   H200 vs H100      0.79×
RunPod             Server    B200 vs H100      1.72×
RunPod             Server    H200 vs H100      0.74×
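Because tok/$ is throughput times 3600 divided by price, each ratio factors into a throughput ratio divided by a price ratio. That explains why the H200 falls below 1.00× at RunPod: its Offline throughput is about 13% higher than the H100's, but its hourly price is about 44% higher. A minimal Python sketch using figures from the tables above; `tok_per_dollar` and `efficiency_ratio` are hypothetical helpers, not part of any library.

```python
SECONDS_PER_HOUR = 3600


def tok_per_dollar(tok_per_s: float, usd_per_gpu_hr: float) -> float:
    """Tokens generated per dollar of on-demand GPU time."""
    return tok_per_s * SECONDS_PER_HOUR / usd_per_gpu_hr


def efficiency_ratio(new_tok_s: float, new_price: float,
                     base_tok_s: float, base_price: float) -> float:
    """tok/$ of the newer GPU divided by tok/$ of the baseline GPU.

    The 3600 s/hr factor cancels, so this equals
    (throughput ratio) / (price ratio).
    """
    return tok_per_dollar(new_tok_s, new_price) / tok_per_dollar(base_tok_s, base_price)


# RunPod, Offline: H200-SXM vs H100-SXM
throughput_ratio = 4_432 / 3_913  # ~1.13: the H200 generates more tokens per second...
price_ratio = 4.31 / 2.99         # ~1.44: ...but costs proportionally more per hour
ratio = efficiency_ratio(4_432, 4.31, 3_913, 2.99)
assert abs(ratio - throughput_ratio / price_ratio) < 1e-12  # the decomposition holds
print(f"RunPod Offline H200 vs H100: {ratio:.2f}x")

# GCP (us-central1), Offline: B200-SXM vs H100-SXM
print(f"GCP Offline B200 vs H100: {efficiency_ratio(12_357, 8.05, 3_913, 11.06):.2f}x")
```

Put differently, a faster GPU improves tok/$ only when its speedup exceeds its price premium at that provider.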