NVIDIA L40S

Ada Lovelace–based data center GPU optimized for cost-efficient AI inference, fine-tuning, and graphics-accelerated workloads.

Release

2023

GPU Class

Data Center / AI Inference & Graphics Accelerator

Architecture

NVIDIA Ada Lovelace

PRICE SNAPSHOT


On-premise Module

~$7k–$10k

Turnkey System

~$60k–$120k+

Cloud Pricing
(per GPU/hr)

~$0.70–$3.50+/hr
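One rough way to read the snapshot above is a rent-versus-buy break-even calculation. The figures below are illustrative midpoints of the quoted ranges, not vendor quotes, and the sketch ignores power, hosting, and depreciation:

```python
# Rough break-even estimate between renting an L40S in the cloud and
# buying a module outright, using the page's indicative price ranges.
# All figures are illustrative assumptions, not quotes.

def breakeven_hours(module_cost_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud use at which rental spend equals the module price
    (ignores power, hosting, and depreciation)."""
    return module_cost_usd / cloud_rate_usd_per_hr

# Midpoints of the ranges above: ~$8.5k module, ~$2.00/hr cloud.
hours = breakeven_hours(8_500, 2.00)
print(round(hours))            # hours of continuous rental to match the module price
print(round(hours / 24 / 30))  # the same figure expressed in months
```

At these assumed midpoints the module pays for itself after roughly half a year of continuous cloud use; at the low end of the cloud range the break-even point stretches to well over a year.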

Chip Identity

NVIDIA L40S

GPU Class

Data Center / AI Inference & Graphics Accelerator

Release

2023

Architecture

NVIDIA Ada Lovelace

Target Workload

  • AI inference and fine-tuning
  • Multimodal and vision-based AI workloads
  • Graphics, visualization, and rendering
  • Video processing and AI-assisted media pipelines

Compatible Platforms

  • OEM enterprise servers (PCIe, air-cooled)
  • Virtualized GPU platforms (vGPU-enabled systems)
  • Cloud AI and graphics instances
  • Interconnect: PCIe Gen4 (no NVLink)
  • Software Stack: CUDA, cuDNN, TensorRT, NVIDIA AI Enterprise, NVIDIA vGPU

Ideal Buyer Profile

Enterprises and organizations requiring:

  • Cost-efficient AI inference at scale
  • Broad compatibility with PCIe servers and virtualization platforms
  • Strong support for graphics, visualization, and video workloads
  • Lower power consumption compared to flagship training GPUs

Typical adopters include cloud providers, enterprises deploying AI inference pipelines, media and visualization teams, and organizations running mixed AI and graphics workloads.

Availability Notes

The L40S is widely available through OEM server vendors, NVIDIA-certified systems, and cloud providers. It is commonly deployed in air-cooled PCIe servers and supports virtualization, making it accessible for enterprise rollouts without specialized infrastructure.

Recent Developments

  • 2023 — NVIDIA introduces L40S as a data center successor to A40, targeting AI inference and graphics workloads.
  • 2024 — Broad adoption across cloud providers for inference-focused AI and visualization instances.
  • 2025 — Continued expansion of vGPU and inference-optimized software support within NVIDIA AI Enterprise.

Overview

The NVIDIA L40S is an Ada Lovelace–generation data center GPU designed for inference-heavy AI workloads, graphics acceleration, and mixed AI/visualization use cases. Unlike flagship training GPUs, the L40S prioritizes performance per watt, cost efficiency, and broad deployability across standard PCIe servers.

The L40S combines Ada Tensor Cores with GDDR6 memory and is widely used for AI inference, fine-tuning, digital twins, visualization, and video workloads. It is commonly deployed in enterprise data centers and cloud environments where scalability, virtualization, and operational efficiency matter more than peak training throughput.

Key specifications

Specification        L40S GPU
Architecture         NVIDIA Ada Lovelace
Memory               48 GB GDDR6
Memory Bandwidth     ~864 GB/s
Interconnect         PCIe Gen4
Form Factor          PCIe
Max TGP              ~350 W
Precision Support    FP32, TF32, FP16, BF16, FP8, INT8
Typical AI Compute   ~1.45 PFLOPS (FP8, with sparsity)
Process Node         TSMC 4N
Transistor Count     ~76 billion
MIG Support          Not supported
NVLink (peer)        Not supported
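The 48 GB memory figure above sets a hard ceiling on which models fit on a single card without sharding. A back-of-envelope sketch, counting weights only (KV cache, activations, and runtime overhead would reduce the usable headroom):

```python
# Back-of-envelope check of which model sizes fit in the L40S's 48 GB
# of GDDR6 at different precisions. Weights only; KV cache, activations,
# and framework overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "int8": 1}

def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Size of the weight tensor alone, in GB, at the given precision."""
    return params_billions * BYTES_PER_PARAM[precision]

for size in (7, 13, 34, 70):
    fp16 = weight_footprint_gb(size, "fp16")
    fp8 = weight_footprint_gb(size, "fp8")
    fits = "fits" if fp8 < 48 else "needs sharding"
    print(f"{size}B params: {fp16:.0f} GB (FP16), {fp8:.0f} GB (FP8) -> FP8 {fits}")
```

By this estimate, models up to roughly 13B parameters fit comfortably at FP16, and quantizing to FP8 or INT8 extends single-card coverage to the ~30B class, which is consistent with the card's inference-first positioning.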

Performance Summary

  • AI/ML Throughput: Strong FP8, FP16, and INT8 performance optimized for inference and fine-tuning rather than large-scale training.
  • Tensor Compute: Ada-generation Tensor Cores enable efficient transformer inference, vision models, and multimodal pipelines.
  • Memory Bandwidth: GDDR6 bandwidth (~864 GB/s) supports moderate-size models and batching, though not designed for extreme memory-bound workloads.
  • Multi-GPU Scaling: Scaling relies on PCIe and network-level parallelism rather than GPU-to-GPU fabrics, making L40S best suited to embarrassingly parallel inference workloads.

Compared to H100 and H200, L40S trades peak training performance and NVLink scale-up for lower power draw, lower cost, and broader deployment flexibility.
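The bandwidth point above can be quantified with a rough roofline-style bound: batch-1 autoregressive decode must stream approximately the full weight set per generated token, so memory bandwidth caps tokens per second. A minimal sketch under that weights-only assumption:

```python
# Rough upper bound on memory-bandwidth-limited decode throughput.
# Batch-1 autoregressive generation reads roughly the full weight set
# per token, so tokens/s <= bandwidth / weight_bytes. Illustrative
# only: real throughput also depends on KV cache traffic, kernel
# efficiency, and batching.

BANDWIDTH_GBPS = 864  # L40S GDDR6 bandwidth from the spec table

def max_tokens_per_s(params_billions: float, bytes_per_param: int) -> float:
    weight_gb = params_billions * bytes_per_param
    return BANDWIDTH_GBPS / weight_gb

print(f"~{max_tokens_per_s(7, 2):.0f} tok/s ceiling for a 7B model at FP16")
print(f"~{max_tokens_per_s(13, 1):.0f} tok/s ceiling for a 13B model at FP8")
```

Batching amortizes the weight reads across requests, which is why throughput-oriented inference servers push batch sizes up until compute, rather than bandwidth, becomes the limit.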

Primary Use Cases

  • Production AI inference for language, vision, and multimodal models
  • Model fine-tuning and experimentation at lower cost than flagship GPUs
  • Graphics and visualization workloads, including digital twins and 3D rendering
  • Video processing and streaming with AI-assisted encoding and analytics
  • Virtualized GPU workloads in enterprise and cloud environments

Alternatives & Upgrade Path

Comparable NVIDIA GPUs:

  • L4: Lower-power inference GPU optimized for cost and density.
  • H100: Hopper-generation flagship preferred for large-scale training and tightly coupled multi-GPU workloads.

Competitor GPUs:

  • AMD Instinct MI210 / MI250 (inference configurations): Used in some HPC-adjacent inference environments, though with less mature graphics and virtualization support.

For organizations focused on inference and visualization, L40S is often preferred over H-series GPUs due to lower cost and operational overhead. Training-heavy workflows typically favor H100 or newer Blackwell-generation accelerators.

Related Chips & Providers

Related NVIDIA GPUs:

  • NVIDIA L4
  • NVIDIA H100

Competitor GPUs:

  • AMD Instinct MI210
  • AMD Instinct MI250

SUMMARY

The NVIDIA L40S is an Ada Lovelace–based data center GPU optimized for inference, fine-tuning, and graphics-accelerated workloads rather than frontier-scale training. It delivers strong tensor performance, broad PCIe compatibility, and excellent performance per watt, making it a popular choice for production AI inference and visualization deployments. L40S fills a critical role in enterprise AI infrastructure as a scalable, cost-efficient alternative to flagship training GPUs.
