NVIDIA L40S

Ada Lovelace–based data center GPU optimized for cost-efficient AI inference, fine-tuning, and graphics-accelerated workloads.

Release

2023

GPU Class

Data Center / AI Inference & Graphics Accelerator

Architecture

NVIDIA Ada Lovelace

PRICE SNAPSHOT


On-premise Module

~$7k–$10k

Turnkey System

~$60k–$120k+

Cloud Pricing
(per GPU/hr)

~$0.70–$3.50+/hr
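One rough way to read the snapshot above is a rent-versus-buy break-even calculation. The figures below are illustrative midpoints of the quoted ranges, not vendor quotes, and the sketch ignores power, hosting, and depreciation:

```python
# Rough break-even estimate between renting an L40S in the cloud and
# buying a module outright, using the page's indicative price ranges.
# All figures are illustrative assumptions, not quotes.

def breakeven_hours(module_cost_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud use at which rental spend equals the module price
    (ignores power, hosting, and depreciation)."""
    return module_cost_usd / cloud_rate_usd_per_hr

# Midpoints of the ranges above: ~$8.5k module, ~$2.00/hr cloud.
hours = breakeven_hours(8_500, 2.00)
print(round(hours))            # hours of continuous rental to match the module price
print(round(hours / 24 / 30))  # the same figure expressed in months
```

At these assumed midpoints the module pays for itself after roughly half a year of continuous cloud use; at the low end of the cloud range the break-even point stretches to well over a year.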

Chip Identity

NVIDIA L40S

GPU Class

Data Center / AI Inference & Graphics Accelerator

Release

2023

Architecture

NVIDIA Ada Lovelace

Target Workload

  • AI inference and fine-tuning
  • Multimodal and vision-based AI workloads
  • Graphics, visualization, and rendering
  • Video processing and AI-assisted media pipelines

Compatible Platforms

  • OEM enterprise servers (PCIe, air-cooled)
  • Virtualized GPU platforms (vGPU-enabled systems)
  • Cloud AI and graphics instances
  • Interconnect: PCIe Gen4 (no NVLink)
  • Software Stack: CUDA, cuDNN, TensorRT, NVIDIA AI Enterprise, NVIDIA vGPU

Ideal Buyer Profile

Enterprises and organizations requiring:

  • Cost-efficient AI inference at scale
  • Broad compatibility with PCIe servers and virtualization platforms
  • Strong support for graphics, visualization, and video workloads
  • Lower power consumption compared to flagship training GPUs

Typical adopters include cloud providers, enterprises deploying AI inference pipelines, media and visualization teams, and organizations running mixed AI and graphics workloads.

Availability Notes

The L40S is widely available through OEM server vendors, NVIDIA-certified systems, and cloud providers. It is commonly deployed in air-cooled PCIe servers and supports virtualization, making it accessible for enterprise rollouts without specialized infrastructure.

Recent Developments

  • 2023 — NVIDIA introduces L40S as a data center successor to A40, targeting AI inference and graphics workloads.
  • 2024 — Broad adoption across cloud providers for inference-focused AI and visualization instances.
  • 2025 — Continued expansion of vGPU and inference-optimized software support within NVIDIA AI Enterprise.

Overview

The NVIDIA L40S is an Ada Lovelace–generation data center GPU designed for inference-heavy AI workloads, graphics acceleration, and mixed AI/visualization use cases. Unlike flagship training GPUs, the L40S prioritizes performance per watt, cost efficiency, and broad deployability across standard PCIe servers.

The L40S combines Ada Tensor Cores with GDDR6 memory and is widely used for AI inference, fine-tuning, digital twins, visualization, and video workloads. It is commonly deployed in enterprise data centers and cloud environments where scalability, virtualization, and operational efficiency matter more than peak training throughput.

Key specifications

Specification        L40S GPU
Architecture         NVIDIA Ada Lovelace
Memory               48 GB GDDR6
Memory Bandwidth     ~864 GB/s
Interconnect         PCIe Gen4
Form Factor          PCIe
Max TGP              ~350 W
Precision Support    FP32, TF32, FP16, BF16, FP8, INT8
Typical AI Compute   ~1.45 PFLOPS (FP8, with sparsity)
Process Node         TSMC 4N
Transistor Count     ~76 billion
MIG Support          Not supported
NVLink (peer)        Not supported
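The 48 GB memory figure above sets a hard ceiling on which models fit on a single card without sharding. A back-of-envelope sketch, counting weights only (KV cache, activations, and runtime overhead would reduce the usable headroom):

```python
# Back-of-envelope check of which model sizes fit in the L40S's 48 GB
# of GDDR6 at different precisions. Weights only; KV cache, activations,
# and framework overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "int8": 1}

def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Size of the weight tensor alone, in GB, at the given precision."""
    return params_billions * BYTES_PER_PARAM[precision]

for size in (7, 13, 34, 70):
    fp16 = weight_footprint_gb(size, "fp16")
    fp8 = weight_footprint_gb(size, "fp8")
    fits = "fits" if fp8 < 48 else "needs sharding"
    print(f"{size}B params: {fp16:.0f} GB (FP16), {fp8:.0f} GB (FP8) -> FP8 {fits}")
```

By this estimate, models up to roughly 13B parameters fit comfortably at FP16, and quantizing to FP8 or INT8 extends single-card coverage to the ~30B class, which is consistent with the card's inference-first positioning.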

Performance Summary

  • AI/ML Throughput: Strong FP8, FP16, and INT8 performance optimized for inference and fine-tuning rather than large-scale training.
  • Tensor Compute: Ada-generation Tensor Cores enable efficient transformer inference, vision models, and multimodal pipelines.
  • Memory Bandwidth: GDDR6 bandwidth (~864 GB/s) supports moderate-size models and batching, though not designed for extreme memory-bound workloads.
  • Multi-GPU Scaling: Scaling relies on PCIe and network-level parallelism rather than GPU-to-GPU fabrics, making L40S best suited to embarrassingly parallel inference workloads.

Compared to H100 and H200, L40S trades peak training performance and NVLink scale-up for lower power draw, lower cost, and broader deployment flexibility.
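The bandwidth point above can be quantified with a rough roofline-style bound: batch-1 autoregressive decode must stream approximately the full weight set per generated token, so memory bandwidth caps tokens per second. A minimal sketch under that weights-only assumption:

```python
# Rough upper bound on memory-bandwidth-limited decode throughput.
# Batch-1 autoregressive generation reads roughly the full weight set
# per token, so tokens/s <= bandwidth / weight_bytes. Illustrative
# only: real throughput also depends on KV cache traffic, kernel
# efficiency, and batching.

BANDWIDTH_GBPS = 864  # L40S GDDR6 bandwidth from the spec table

def max_tokens_per_s(params_billions: float, bytes_per_param: int) -> float:
    weight_gb = params_billions * bytes_per_param
    return BANDWIDTH_GBPS / weight_gb

print(f"~{max_tokens_per_s(7, 2):.0f} tok/s ceiling for a 7B model at FP16")
print(f"~{max_tokens_per_s(13, 1):.0f} tok/s ceiling for a 13B model at FP8")
```

Batching amortizes the weight reads across requests, which is why throughput-oriented inference servers push batch sizes up until compute, rather than bandwidth, becomes the limit.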

Primary Use Cases

  • Production AI inference for language, vision, and multimodal models
  • Model fine-tuning and experimentation at lower cost than flagship GPUs
  • Graphics and visualization workloads, including digital twins and 3D rendering
  • Video processing and streaming with AI-assisted encoding and analytics
  • Virtualized GPU workloads in enterprise and cloud environments

Alternatives & Upgrade Path

Comparable NVIDIA GPUs:

  • L4: Lower-power inference GPU optimized for cost and density.
  • H100: Hopper-generation flagship preferred for large-scale training and tightly coupled multi-GPU workloads.

Competitor GPUs:

  • AMD Instinct MI210 / MI250 (inference configurations): Used in some HPC-adjacent inference environments, though with less mature graphics and virtualization support.

For organizations focused on inference and visualization, L40S is often preferred over H-series GPUs due to lower cost and operational overhead. Training-heavy workflows typically favor H100 or newer Blackwell-generation accelerators.

Related Chips & Providers

Related NVIDIA GPUs:

  • NVIDIA L4
  • NVIDIA H100

Competitor GPUs:

  • AMD Instinct MI210
  • AMD Instinct MI250

SUMMARY

The NVIDIA L40S is an Ada Lovelace–based data center GPU optimized for inference, fine-tuning, and graphics-accelerated workloads rather than frontier-scale training. It delivers strong tensor performance, broad PCIe compatibility, and excellent performance per watt, making it a popular choice for production AI inference and visualization deployments. L40S fills a critical role in enterprise AI infrastructure as a scalable, cost-efficient alternative to flagship training GPUs.
