AMD MI325X

A CDNA 3–based high-memory data center GPU that scales large LLM inference and training through expanded HBM3e capacity.

Release

2024

GPU Class

Data Center / AI Accelerator

Architecture

AMD CDNA 3

PRICE SNAPSHOT

On-premise Module

~$25k–$35k*

Turnkey System

~$275k–$350k+†

Cloud Pricing
(per GPU/hr)

~$2.50–$8+/hr‡

Chip Identity

AMD Instinct MI325X

Model

MI325X

Target Workload

  • Large language model inference at very large context lengths
  • Memory-bound generative AI workloads
  • RAG pipelines with expanded KV-cache requirements
  • LLM training and fine-tuning where memory capacity is a primary constraint

Compatible Platforms

  • OEM 8-GPU server platforms based on AMD Instinct MI325X
  • Custom AI clusters via OEM and system integrators
  • Cloud bare-metal GPU instances (select providers)
  • Interconnect: Infinity Fabric (xGMI intra-node; PCIe Gen5 for host and scale-out)
  • Software Stack: ROCm, HIP, PyTorch (ROCm), TensorFlow (ROCm), OpenAI Triton
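
Because deployments standardize on ROCm, a quick way to confirm that a node's software stack sees its accelerators is through PyTorch's ROCm build, which exposes HIP devices via the familiar torch.cuda API. A minimal sanity-check sketch (device names and counts depend on the platform):

    import torch

    # ROCm builds of PyTorch expose HIP devices through the torch.cuda API.
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
    else:
        print("No ROCm-visible GPUs found")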

Ideal Buyer Profile

Enterprises and organizations that:

  • Need maximum GPU memory capacity per device
  • Want cost-efficient scaling for memory-bound AI inference
  • Value the flexibility to deploy on OEM and cloud platforms
  • Are willing to operate within the ROCm software ecosystem

Typical adopters include cloud providers, AI service operators, and enterprises deploying very large language models with expanding context and cache requirements.

Availability Notes

MI325X is available through OEM server vendors, system integrators, and select cloud providers rather than as a standalone retail component. Deployments are commonly tied to complete server platforms, with availability and pricing varying by vendor, region, and volume.

Recent Developments

  • Jun 2024 — AMD announces MI325X as a high-memory expansion of the Instinct MI300 family.
  • Late 2024 — OEM partners begin validating MI325X-based server platforms for enterprise and cloud use.
  • 2025 — ROCm software updates focus on improving large-context inference performance and memory efficiency.

Overview

The AMD Instinct MI325X is a CDNA 3–generation data center GPU introduced as a higher-memory successor to the MI300X. It is designed to address the growing memory demands of large language model inference and training by significantly expanding on-package HBM capacity while maintaining a similar compute architecture.

MI325X integrates stacked HBM3e memory with a CDNA 3 GPU compute complex, enabling very large models, long context windows, and substantial KV caches to remain resident on a single device. Like the MI300X, it ships as an OAM module on OEM 8-GPU baseboard platforms rather than as a vendor-branded turnkey system, and is typically used in multi-GPU servers optimized for memory-intensive AI workloads.
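
To make the capacity argument concrete, a back-of-envelope KV-cache estimate helps. The sketch below assumes a hypothetical 70B-parameter-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, FP16 cache); all figures are illustrative, not the specs of any particular model:

    # Rough KV-cache sizing for a hypothetical 70B-class model with GQA.
    LAYERS, KV_HEADS, HEAD_DIM, DTYPE_BYTES = 80, 8, 128, 2  # illustrative

    kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * DTYPE_BYTES  # K and V planes
    print(f"{kv_per_token / 2**10:.0f} KiB per cached token")      # ~320 KiB

    ctx_tokens = 131_072  # one 128k-token sequence
    print(f"{kv_per_token * ctx_tokens / 2**30:.0f} GiB of KV cache")  # ~40 GiB

At roughly 40 GiB of cache for a single 128k-token sequence, on top of the weights themselves, it is easy to see why 256 GB of on-package HBM3e changes what can stay resident on one device.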

Key Specifications

  Architecture         AMD CDNA 3
  Memory               256 GB HBM3e
  Memory Bandwidth     ~6.0 TB/s (peak theoretical)
  Interconnect         PCIe Gen5 (host); Infinity Fabric (xGMI) intra-node
  Form Factor          OAM module (8-GPU universal baseboard platforms)
  Max TBP              ~1000 W
  Precision Support    FP64, FP32, FP16, BF16, FP8, INT8
  Peak AI Compute      ~1.3 PFLOPS FP16/BF16, ~2.6 PFLOPS FP8 (theoretical, workload-dependent)
  Process Node         TSMC 5 nm (compute) and 6 nm (I/O) chiplets
  Transistor Count     ~153 billion (multi-die package)
  Partitioning         SR-IOV / MxGPU-based
  NVLink               Not applicable (peer links use Infinity Fabric)

Performance Summary

  • AI/ML Throughput: Delivers similar compute characteristics to MI300X, with effective performance gains driven primarily by increased memory capacity rather than higher raw tensor throughput.
  • Tensor Compute: CDNA 3 Matrix Core engines support FP16, BF16, FP8, and INT8 operations for mixed-precision AI workflows.
  • Memory Bandwidth: Expanded HBM3e capacity and higher peak bandwidth (~6.0 TB/s) reduce memory pressure for very large models, enabling longer context windows and larger batch sizes.
  • Multi-GPU Scaling: Intra-node scaling uses Infinity Fabric (xGMI) links between GPUs on the baseboard; scale-out across nodes relies on PCIe Gen5 and high-speed networking rather than a proprietary multi-node fabric.

Compared to MI300X, MI325X prioritizes memory scale over architectural change, making it particularly effective for next-generation inference workloads where model size and context length dominate performance.
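
A roofline-style estimate illustrates why bandwidth, not tensor throughput, typically bounds single-stream decode: generating one token requires streaming the full weight set through the memory system at least once. A sketch under assumed, illustrative numbers (hypothetical 70B model, FP8 weights, peak theoretical bandwidth):

    # Upper bound on single-stream decode rate if weight reads dominate.
    params = 70e9           # hypothetical model size
    bytes_per_param = 1.0   # FP8 weights
    bandwidth = 6.0e12      # ~6.0 TB/s peak theoretical

    seconds_per_token = params * bytes_per_param / bandwidth
    print(f"<= {1 / seconds_per_token:.0f} tokens/s per stream")  # ~86

Real decode rates land well below this ceiling, and batching raises aggregate throughput by amortizing each weight read across many sequences; the point is that the bound scales with memory bandwidth, not FLOPS.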

Primary Use Cases

  • Very large-context LLM inference where expanded HBM capacity enables full model and KV-cache residency
  • Memory-bound generative AI workloads with large activations and embeddings
  • Retrieval-augmented generation (RAG) pipelines with growing vector and cache footprints
  • Batch-oriented inference serving for chat, search, and recommendation systems (a capacity sketch follows this list)
  • Selective LLM training and fine-tuning where memory capacity outweighs interconnect limitations
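
For the batch-serving case, a first-order capacity check is whether weights plus per-sequence KV caches fit in HBM. A sketch reusing the illustrative model from the overview (all numbers assumed, including the 10% runtime-overhead allowance):

    # How many concurrent 8k-token sequences fit on one 256 GB device?
    hbm = 256e9                 # total HBM3e
    weights = 70e9 * 2          # FP16 weights, hypothetical 70B model
    overhead = 0.10 * hbm       # activations/fragmentation allowance (assumed)
    kv_per_token = 327_680      # bytes per token, from the overview sketch
    seq_len = 8_192

    free = hbm - weights - overhead
    print(f"~{int(free / (kv_per_token * seq_len))} concurrent sequences")  # ~33

Serving stacks automate this accounting, but the arithmetic shows why capacity, rather than compute, often sets the batch ceiling on memory-bound workloads.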

Alternatives & Upgrade Path

  • AMD Instinct MI300X: Predecessor with 192 GB of HBM3, suitable for most current inference workloads.
  • NVIDIA H200: Hopper-generation GPU with 141 GB of HBM3e but stronger scale-up via NVLink.
  • NVIDIA B200: Blackwell-generation flagship offering higher compute ceilings for next-generation training.

For organizations already standardized on MI300X, MI325X represents a natural upgrade when memory capacity becomes the primary bottleneck. Buyers prioritizing tight multi-GPU coupling or FP4-centric workflows may instead evaluate newer NVIDIA architectures.

SUMMARY

The AMD Instinct MI325X is a CDNA 3–based data center GPU that extends AMD’s high-memory AI strategy by significantly increasing on-package HBM capacity and bandwidth. It is optimized for memory-bound inference and large-context generative AI workloads rather than tightly coupled scale-up training. Deployed primarily through OEM and cloud platforms, MI325X serves as a forward-looking upgrade for organizations pushing the limits of model size, context length, and inference efficiency within the ROCm ecosystem.
