GPU

NVIDIA H200

A Hopper-based data center GPU designed to accelerate memory-bound LLM training and inference workloads.

Release

2024

GPU Class

Data Center / AI Accelerator

Architecture

Hopper

Price Snapshot

On-premise Module

~$30k–$40k

Turnkey System

~$300k–$400k

Cloud Pricing (per GPU/hr)

~$2.30–$10+/hr

Chip Identity

NVIDIA-H200-GPU

Model

H200

GPU Class

Data Center / AI Accelerator

Release

2024

Architecture

Hopper

Target Workload

  • Large language model training and fine-tuning
  • High-throughput inference with long context windows
  • Memory-intensive generative AI and RAG pipelines

Compatible Platforms

  • DGX H200 systems
  • HGX H200 baseboards
  • OEM enterprise servers (H200 NVL)
  • Custom AI clusters via OEM
  • Interconnect: NVLink 4 (multi-GPU scaling)
  • Software Stack: CUDA, cuDNN, NVIDIA AI Enterprise
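
For teams validating a deployment, the short sketch below shows one way to confirm that the GPUs and their memory are visible from the software stack. It assumes PyTorch running on top of CUDA (PyTorch is not named above, so treat it as an illustrative choice) and uses only standard torch.cuda queries.

```python
# Hedged sketch: enumerate visible CUDA devices via PyTorch and report
# name, memory, and SM count. An H200 SXM module should show ~141 GB.
import torch

def report_gpus() -> None:
    if not torch.cuda.is_available():
        print("No CUDA device visible to this process.")
        return
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        total_gb = props.total_memory / 1e9
        print(f"GPU {idx}: {props.name}, {total_gb:.0f} GB HBM, "
              f"{props.multi_processor_count} SMs")

if __name__ == "__main__":
    report_gpus()
```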

Ideal Buyer Profile

Enterprises and organizations requiring:

  • Increased GPU memory capacity to support large-model training and inference
  • Strong performance for mixed training and inference workloads
  • Proven Hopper-generation software compatibility and ecosystem support
  • Scalable multi-GPU deployments using NVLink-connected platforms

Typical adopters include cloud service providers, enterprises deploying private AI infrastructure, and research institutions scaling large language models without transitioning immediately to newer GPU architectures.

Availability Notes

The H200 is available in production through NVIDIA’s data center ecosystem and OEM partners. It is commonly deployed in multi-GPU configurations within HGX and DGX platforms, as well as in PCIe-based enterprise servers using the H200 NVL variant. Availability and pricing vary by configuration, system vendor, and regional supply conditions, reflecting typical data center GPU allocation patterns.

Recent Developments

  • The H200 has been positioned as a memory-optimized upgrade over H100, with emphasis on HBM3e capacity and bandwidth for large language model workloads.
  • Major OEMs and cloud providers have expanded H200-based offerings to support higher-throughput training and inference deployments.
  • Software optimizations across CUDA, cuDNN, and NVIDIA AI Enterprise continue to improve utilization of Hopper Tensor Cores and expanded memory configurations.
  • Ongoing comparisons between H200 and newer Blackwell-based GPUs frame H200 as a near-term scaling solution ahead of broader Blackwell availability.

Overview

The NVIDIA H200 is a Hopper-generation data center GPU designed to improve large-model training and high-throughput inference by expanding memory capacity and bandwidth versus H100. It pairs Hopper Tensor Cores with HBM3e to reduce memory bottlenecks that commonly limit transformer workloads, especially at long context lengths and higher batch sizes.

H200 is deployed as a GPU module in SXM-based multi-GPU platforms and as a PCIe accelerator (H200 NVL) for air-cooled enterprise server designs. It commonly underpins 4- or 8-GPU baseboards and turnkey systems used in AI factories, hyperscale training clusters, and enterprise inference deployments.
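
To see why capacity matters at long context lengths, consider the KV cache, which grows linearly with sequence length. The sketch below estimates it for a hypothetical 70B-class model (80 layers, 8 KV heads, head dimension 128, 16-bit cache); the model shape is an assumption for illustration, not a figure from this page.

```python
# Back-of-envelope KV-cache sizing for an assumed 70B-class model shape.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128   # hypothetical configuration
BYTES_PER_VALUE = 2                        # 16-bit K/V cache

def kv_cache_gb(tokens: int) -> float:
    # Factor of 2: both keys and values are cached per layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens / 1e9

print(f"128k-token sequence: ~{kv_cache_gb(128_000):.0f} GB of KV cache")
# ~42 GB for this shape. With 141 GB of HBM3e, 8-bit weights (~70 GB for a
# 70B model) plus a long-context KV cache fit on one GPU; an 80 GB-class
# part would have to shard the model or cut batch size / context length.
```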

Key Specifications

Specification         | H200 GPU
Architecture          | NVIDIA Hopper
Memory                | 141 GB HBM3e
Memory Bandwidth      | ~4.8 TB/s
Interconnect          | NVLink 4 (multi-GPU)
Form Factor           | SXM, PCIe (H200 NVL)
Max TDP               | ~700 W (SXM), ~600 W (PCIe NVL)
Precision Support     | FP64, TF32, FP16, BF16, FP8, INT8
Peak AI Compute       | ~4 PFLOPS (FP8, with sparsity)
Process Node          | TSMC 4N
Transistor Count      | ~80 billion
MIG Support           | Supported (up to 7 instances)
NVLink (peer-to-peer) | ~900 GB/s aggregate bidirectional
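
As a rough sense of scale for the 141 GB figure, the sketch below converts it into approximate parameter counts at common weight precisions; it counts weights only and ignores optimizer state, activations, and KV cache, so these are upper bounds.

```python
# Weights-only capacity estimate for 141 GB of HBM3e (upper bounds).
HBM_GB = 141
BYTES_PER_PARAM = {"FP16/BF16": 2, "FP8/INT8": 1}

for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: up to ~{HBM_GB / nbytes:.0f}B parameters per GPU")
# FP16/BF16: ~70B parameters; FP8/INT8: ~141B. A 70B-class model fits in
# 16-bit weights on a single H200, which it does not on an 80 GB-class GPU.
```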

Performance Summary

  • AI/ML Throughput: Delivers strong FP8 and FP16 tensor throughput optimized for large-scale training and inference; performance gains versus H100 are workload-dependent and most pronounced in memory-bound LLM scenarios.
  • Tensor Compute: Hopper-generation Tensor Cores provide high efficiency for FP8, BF16, and INT8 workloads, supporting mixed-precision training and high-throughput inference pipelines.
  • Memory Bandwidth: Expanded HBM3e capacity and bandwidth (~4.8 TB/s) significantly reduce memory bottlenecks, enabling larger batch sizes and longer context windows compared with H100.
  • Multi-GPU Scaling: NVLink 4 interconnect enables high-bandwidth GPU-to-GPU communication, supporting efficient scale-out across multi-GPU nodes for large-model training.

Compared to H100, the H200 delivers higher effective throughput on modern transformer workloads primarily due to increased memory capacity and bandwidth, while maintaining Hopper-generation compute efficiency and software compatibility.
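
That bandwidth advantage can be made concrete with a roofline-style back-of-envelope estimate: when single-stream decoding is limited by reading the weights from HBM for every generated token, throughput scales with memory bandwidth. The sketch below assumes a 70B-class model held in 8-bit weights and uses ~3.35 TB/s (H100 SXM) and ~4.8 TB/s (H200) as the published bandwidth figures; it is an illustration, not a benchmark.

```python
# Roofline-style upper bound on single-stream decode throughput, assuming
# each generated token streams the full weight set from HBM once.
WEIGHTS_GB = 70  # assumed 70B-class model in 8-bit weights

def decode_tokens_per_sec(bandwidth_tb_s: float) -> float:
    return bandwidth_tb_s * 1e12 / (WEIGHTS_GB * 1e9)

for name, bw in [("H100 (~3.35 TB/s)", 3.35), ("H200 (~4.8 TB/s)", 4.8)]:
    print(f"{name}: ~{decode_tokens_per_sec(bw):.0f} tokens/s upper bound")
# ~48 vs ~69 tokens/s: in this memory-bound regime the ~1.4x bandwidth
# advantage translates almost directly into decode throughput.
```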

Primary Use Cases

  • Large-scale LLM training and fine-tuning where expanded GPU memory reduces model and data parallelism overhead
  • High-throughput inference for generative AI services with long context windows and large batch sizes
  • Enterprise generative AI workloads requiring stable Hopper software compatibility and predictable deployment
  • Retrieval-augmented generation (RAG) and memory-intensive inference pipelines
  • HPC and scientific computing workloads that benefit from high-bandwidth HBM and FP64 support

Alternatives & Upgrade Path

Comparable NVIDIA GPUs:

  • H100: Earlier Hopper-generation GPU with lower memory capacity and bandwidth; still widely deployed for training and inference.
  • B200: Blackwell-generation flagship GPU offering significantly higher memory capacity and tensor throughput for next-generation AI workloads.

Competitor GPUs:

  • AMD Instinct MI300X: High-memory data center GPU positioned for large-model inference and memory-bound AI workloads.

For organizations already standardized on Hopper systems, H200 provides a straightforward upgrade path from H100. Buyers planning for longer-term scaling or frontier model training may evaluate B200 or alternative high-memory accelerators depending on availability and software requirements.

Related Chips & Providers

Related NVIDIA GPUs:

  • H100 (Hopper)
  • B200 (Blackwell flagship)
  • GH200 Grace Hopper (CPU–GPU platform variant)

Competitor GPUs:

  • AMD Instinct MI300X

Summary

The NVIDIA H200 is a Hopper-generation data center GPU designed to improve large-scale AI training and inference by expanding memory capacity and bandwidth compared with H100. It combines Hopper Tensor Core performance with HBM3e memory to better support modern transformer workloads, particularly those constrained by memory throughput and context length. The H200 is deployed primarily in multi-GPU data center platforms, including HGX and DGX systems, and serves as a practical upgrade path for organizations scaling AI infrastructure ahead of a transition to newer-generation architectures.
