GPU

Nvidia H100

Hopper-based flagship data center GPU for large-scale LLM training, high-throughput inference, and HPC acceleration.

Release

2022

GPU Class

Data Center / AI Accelerator

Architecture

Hopper

PRICE SNAPSHOT

On-premise Module

~$25k–$40k

Turnkey System

~$375k–$450k

Cloud Pricing (per GPU/hr)

~$1.40–$11+/hr
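
To put the price snapshot in perspective, the sketch below is a rough illustration, not a total-cost-of-ownership model. It divides the quoted module price range by the quoted cloud rate range to estimate break-even GPU-hours. It takes the page's ranges at face value and ignores power, cooling, hosting, networking, staff time, financing, and resale value. The function names are hypothetical.

```python
# Rough rent-vs-buy break-even for ONE H100, using only the price ranges in
# the snapshot above. Deliberately ignores power, cooling, hosting, networking,
# staff, financing, and resale value (all of which shift the real answer).

MODULE_PRICE_USD = (25_000, 40_000)    # on-premise module: low, high
CLOUD_RATE_USD_PER_HR = (1.40, 11.00)  # per GPU-hour: low, high
HOURS_PER_YEAR = 24 * 365


def break_even_hours(module_price: float, cloud_rate: float) -> float:
    """GPU-hours of cloud rental that cost the same as buying one module."""
    if cloud_rate <= 0:
        raise ValueError("cloud_rate must be positive")
    return module_price / cloud_rate


def hours_to_years(hours: float, utilization: float = 1.0) -> float:
    """Convert rented GPU-hours to calendar years at a given utilization (0-1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hours / (HOURS_PER_YEAR * utilization)


if __name__ == "__main__":
    for price in MODULE_PRICE_USD:
        for rate in CLOUD_RATE_USD_PER_HR:
            h = break_even_hours(price, rate)
            print(f"${price:,} module vs ${rate:.2f}/hr: {h:,.0f} GPU-hours "
                  f"(~{hours_to_years(h):.1f} years at 100% utilization)")
```

With these inputs, break-even ranges from roughly 2,300 GPU-hours (expensive cloud, cheap module) to about 28,600 GPU-hours (cheap cloud, expensive module). That spread is one reason expected sustained utilization tends to drive rent-versus-buy decisions.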

chip identity

On-premise Module

H100

GPU Class

Data Center / AI Accelerator

Release

2022

Architecture

Hopper

Target Workload

Large language model training and fine-tuning
High-throughput inference (batch and real-time serving)
HPC and mixed AI/HPC workloads

Compatible Platforms

DGX H100 systems
HGX H100 baseboards
OEM enterprise servers (PCIe H100)
Custom AI clusters via OEM
Interconnect: NVLink 4 (multi-GPU scaling)
Software Stack: CUDA, cuDNN, NVIDIA AI Enterprise

Ideal Buyer Profile

Enterprises and organizations requiring:

High-performance training and inference with mature CUDA ecosystem support
Scalable multi-GPU deployments using NVLink-connected platforms
Strong mixed-precision performance for transformer workloads (FP8/BF16/FP16)
A proven, widely deployed accelerator with broad OEM and cloud availability

Typical adopters include cloud service providers, enterprises building private AI clusters, and research institutions standardizing on Hopper-generation infrastructure.

Availability Notes

The H100 is widely available through OEM systems, DGX platforms, and cloud providers, and is commonly deployed in 4- or 8-GPU configurations within HGX- or DGX-based systems. PCIe variants enable broader enterprise server integration, while SXM variants are used for maximum density and scale-up performance. Pricing and allocation vary by form factor, vendor, and region, reflecting ongoing demand patterns for data center accelerators.
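
For the 4- and 8-GPU SXM configurations mentioned above, per-node totals follow from simple multiplication. The sketch below uses the page's figures of 80 GB HBM3 and up to ~700 W per SXM GPU. It counts GPU power only, so CPUs, NICs, fans, and power-supply losses come on top. The `GpuSpec` type and `node_totals` helper are hypothetical names for illustration.

```python
# Per-node HBM and GPU power totals for SXM-based (HGX/DGX-style) H100 nodes.
# Uses the page's per-GPU figures; excludes CPU, NIC, fan, and PSU overhead.
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    memory_gb: float    # HBM capacity per GPU
    max_power_w: float  # maximum board power per GPU


H100_SXM = GpuSpec("H100 SXM", memory_gb=80, max_power_w=700)


def node_totals(gpu: GpuSpec, gpus_per_node: int) -> tuple[float, float]:
    """Return (aggregate HBM in GB, aggregate GPU power in kW) for one node."""
    return gpu.memory_gb * gpus_per_node, gpu.max_power_w * gpus_per_node / 1000


for n in (4, 8):
    mem_gb, kw = node_totals(H100_SXM, n)
    print(f"{n}x {H100_SXM.name}: {mem_gb:.0f} GB HBM, ~{kw:.1f} kW GPU power")
```

An 8-GPU node therefore pools 640 GB of HBM and up to about 5.6 kW of GPU power before any host overhead, which is a useful first number for rack power planning.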

Recent Developments

Mar 2022 — NVIDIA announces Hopper H100 as the next-generation data center GPU platform.
Q1 2023 — DGX H100 and HGX H100-based partner systems expand general availability across OEM and hyperscale deployments.
Apr 2025 — Ongoing software optimizations in inference stacks improve H100 transformer serving throughput on widely used LLM benchmarks.
Aug 2025 — Expanded deployment of H100 NVL configurations highlights demand for higher-memory PCIe options in enterprise inference environments.
2025–2026 — Continued policy and regional availability dynamics keep H100 procurement and allocation an important planning factor for global deployments.

overview

The H100 is a Hopper-generation data center GPU designed to accelerate modern AI training and inference through FP8-focused tensor compute and high-bandwidth HBM memory (HBM3 on SXM and H100 NVL, HBM2e on the standard PCIe card). It introduced Hopper's Transformer Engine to improve efficiency and throughput for transformer workloads while maintaining strong performance for mixed-precision and HPC use cases.

H100 is deployed as a GPU module in SXM-based multi-GPU platforms and as a PCIe accelerator for broader enterprise server integration. It commonly underpins 4- or 8-GPU baseboards and turnkey systems used in AI factories, hyperscale training clusters, and enterprise inference deployments.

Key specifications

Specification	H100 GPU
Architecture	NVIDIA Hopper
Memory	80 GB HBM3 (SXM); 80 GB HBM2e (PCIe); 94 GB HBM3 (H100 NVL)
Memory Bandwidth	3.35 TB/s (SXM); ~2.0 TB/s (PCIe); 3.9 TB/s (H100 NVL)
Interconnect	NVLink 4 (multi-GPU); PCIe Gen5 host interface
Form Factor	SXM, PCIe (including H100 NVL variant)
Max TDP	Up to 700 W (SXM); 300–350 W (PCIe); 350–400 W (H100 NVL)
Precision Support	FP64, TF32, FP16, BF16, FP8, INT8
Typical AI Compute	~4 PFLOPS FP8 with 2:4 structured sparsity (~2 PFLOPS dense, SXM)
Process Node	TSMC 4N
Transistor Count	~80 billion
MIG Support	Up to 7 instances per GPU
NVLink (peer)	~900 GB/s bidirectional aggregate (SXM)
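
One way to read the compute and bandwidth rows together is a roofline estimate. The sketch below computes the "ridge point": the arithmetic intensity (FLOPs per byte read from HBM) above which a kernel can become compute-bound instead of memory-bound. It assumes roughly 2 PFLOPS of dense FP8 on SXM, since the ~4 PFLOPS FP8 figure assumes 2:4 structured sparsity, together with 3.35 TB/s of HBM3 bandwidth. Real kernels reach only a fraction of both peaks, so treat the output as an order-of-magnitude guide.

```python
# Roofline ridge point for H100 SXM (approximate peaks, FP8 dense).
# attainable = min(peak compute, bandwidth * arithmetic intensity)

PEAK_FP8_DENSE_FLOPS = 2.0e15     # ~2 PFLOPS dense FP8 (approximate, SXM)
HBM_BANDWIDTH_BYTES_S = 3.35e12   # 3.35 TB/s HBM3 (SXM)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte at which peak compute and peak bandwidth balance."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Classic roofline bound for a kernel with the given FLOPs/byte."""
    return min(peak_flops, bandwidth * intensity)


ridge = ridge_point(PEAK_FP8_DENSE_FLOPS, HBM_BANDWIDTH_BYTES_S)
print(f"ridge point ~ {ridge:.0f} FLOP/byte")
for ai in (2, 50, 600, 2000):
    tflops = attainable_flops(ai, PEAK_FP8_DENSE_FLOPS, HBM_BANDWIDTH_BYTES_S) / 1e12
    print(f"intensity {ai:>5} FLOP/byte -> ~{tflops:,.0f} TFLOPS attainable")
```

A ridge point near 600 FLOP/byte means low-intensity work such as small-batch decoding stays bandwidth-limited. Large matrix multiplications can approach the compute ceiling.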

Performance Summary

AI/ML Throughput: Strong FP8 and FP16 tensor throughput optimized for large-scale training and inference; performance uplift versus A100 is substantial in transformer workloads, with realized gains varying by model, sequence length, and software stack.
Tensor Compute: Hopper Tensor Cores and Transformer Engine improve efficiency for FP8, BF16, and FP16 workflows, supporting mixed-precision training and high-throughput inference pipelines.
Memory Bandwidth: High HBM bandwidth (3.35 TB/s on SXM; 3.9 TB/s on H100 NVL) reduces memory bottlenecks, supporting larger batch sizes and better utilization on memory-intensive transformer workloads.
Multi-GPU Scaling: NVLink 4 enables high-bandwidth GPU-to-GPU communication inside multi-GPU nodes, supporting efficient scale-up for large-model training and tightly coupled parallelism.
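
The memory-bandwidth point can be made concrete with a common back-of-the-envelope bound for LLM decoding. At batch size 1, every generated token streams all model weights from HBM at least once, so bandwidth divided by weight bytes caps tokens per second. The sketch below applies this bound to the bandwidth figures in the spec table. The 70B-parameter FP8 model is a hypothetical example. The bound ignores KV-cache reads and kernel overhead, and it also ignores batching, which amortizes weight reads across sequences.

```python
# Upper bound on single-stream decode speed when HBM bandwidth is the limit:
# tokens/s <= bandwidth / bytes of weights read per token.


def decode_tokens_per_s_upper_bound(params_billion: float,
                                    bytes_per_param: float,
                                    bandwidth_tb_s: float) -> float:
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


# Bandwidth per form factor, from the spec table (TB/s).
FORM_FACTOR_BANDWIDTH_TB_S = {"SXM": 3.35, "PCIe": 2.0, "H100 NVL": 3.9}

# Hypothetical model: 70B parameters stored as FP8 (1 byte per parameter).
for name, bw in FORM_FACTOR_BANDWIDTH_TB_S.items():
    tps = decode_tokens_per_s_upper_bound(70, 1.0, bw)
    print(f"{name}: <= ~{tps:.0f} tokens/s per sequence at batch size 1")
```

Measured single-stream speeds land below these ceilings. Serving stacks raise aggregate throughput mainly by batching many sequences per weight read.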

Compared to A100, the H100 delivers materially higher effective throughput on modern transformer workloads through FP8-optimized tensor compute and higher memory bandwidth, while retaining strong support for HPC and mixed AI workloads.

primary use case

Large-scale LLM training and fine-tuning using FP8/FP16 mixed-precision workflows
High-throughput inference for generative AI services, including batched serving and real-time endpoints
Enterprise generative AI where CUDA ecosystem maturity and operational tooling are priorities
HPC and scientific computing workloads benefiting from FP64 and high memory bandwidth
Multi-GPU training where NVLink-connected scale-up improves efficiency for model parallelism
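
To see why NVLink bandwidth matters for multi-GPU training, the sketch below estimates the time to all-reduce a gradient buffer inside one node using the textbook ring algorithm. In that algorithm each GPU sends, and receives, 2(N-1)/N times the buffer size. It assumes about 450 GB/s per direction, which is half of the ~900 GB/s bidirectional SXM aggregate, and perfect link utilization. Libraries such as NCCL may choose other algorithms and will not hit the ideal figure, so this is a lower bound under stated assumptions.

```python
# Idealized ring all-reduce time within one NVLink-connected node.
# Per-GPU traffic for a ring all-reduce: 2 * (N - 1) / N * buffer_size.


def ring_allreduce_seconds(buffer_gb: float, num_gpus: int,
                           per_direction_gb_s: float = 450.0) -> float:
    """Lower-bound seconds to all-reduce buffer_gb across num_gpus GPUs."""
    if num_gpus < 2:
        return 0.0  # nothing to exchange
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * buffer_gb
    return traffic_gb / per_direction_gb_s


# Hypothetical example: 14 GB of BF16 gradients (~7B params x 2 bytes), 8 GPUs.
t = ring_allreduce_seconds(14.0, 8)
print(f"~{t * 1000:.0f} ms per all-reduce at ideal NVLink bandwidth")
```

Training frameworks typically overlap this communication with backward-pass compute, so the estimate is most useful for checking whether communication could become the bottleneck at a given step time.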

Alternatives & Upgrade Path

Comparable NVIDIA GPUs:

H200: Hopper-generation refresh with significantly higher memory capacity and bandwidth (141 GB HBM3e at 4.8 TB/s); often preferred for memory-bound LLM inference and longer-context workloads.
B200: Blackwell-generation flagship with a higher performance ceiling for next-generation AI training and inference, including support for lower-precision formats such as FP4.

Competitor GPUs:

AMD Instinct MI300X: High-memory data center GPU (192 GB HBM3) positioned for large-model inference and memory-bound AI workloads.

For organizations operating existing Hopper deployments, H100 remains a strong baseline for training and inference, while H200 is the typical upgrade when memory bandwidth and capacity become limiting. Buyers planning frontier-scale training or longer infrastructure lifecycles may evaluate B200 depending on availability, power constraints, and platform readiness.
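
A first check on when memory capacity becomes limiting is whether a model's weights fit on one GPU. The sketch below compares the 80 GB H100, the 94 GB H100 NVL, and the 141 GB H200. The 20% headroom for KV cache, activations, and runtime overhead is a hypothetical rule of thumb, not a vendor figure. Real requirements depend heavily on batch size and context length, and the helper names are illustrative.

```python
# Weights-only single-GPU fit check with a reserved headroom fraction.
# 1e9 params * N bytes/param = N GB (decimal), so GB = params_billion * bytes.

GPU_MEMORY_GB = {"H100": 80, "H100 NVL": 94, "H200": 141}
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8": 1.0}


def weights_gb(params_billion: float, precision: str) -> float:
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, gpu: str,
         headroom_fraction: float = 0.2) -> bool:
    """True if weights fit in the GPU's memory after reserving headroom."""
    usable_gb = GPU_MEMORY_GB[gpu] * (1 - headroom_fraction)
    return weights_gb(params_billion, precision) <= usable_gb


# Hypothetical 70B-parameter model.
for precision in BYTES_PER_PARAM:
    for gpu in GPU_MEMORY_GB:
        verdict = "fits" if fits(70, precision, gpu) else "does not fit"
        print(f"70B @ {precision} on {gpu}: "
              f"{weights_gb(70, precision):.0f} GB weights, {verdict}")
```

Under these assumptions a 70B model in FP8 misses on an 80 GB H100 but fits on H100 NVL and H200, while the same model in BF16 needs multiple GPUs on any of the three. This is the kind of boundary where tensor parallelism or a higher-memory part enters the plan.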

Related Chips & Providers

Competitor GPUs:

AMD Instinct MI300X

SUMMARY

The H100 is NVIDIA’s Hopper-generation flagship data center GPU that established FP8-optimized tensor compute and high-bandwidth HBM3 as the baseline for modern AI infrastructure. It delivers strong training and inference performance for transformer workloads, supports multi-GPU scale-up through NVLink 4, and remains broadly available across DGX, HGX, OEM, and cloud platforms. H100 is a common foundation for large-scale AI clusters, with H200 serving as the primary memory-focused upgrade path and B200 representing the next-generation step for organizations planning frontier-scale deployments.
