
Google Cloud AI Chip - Workload Economics

DataStorage Editorial Team

In the News · 6 min read · May 2026

Table of Contents

Why Now, and Why Does It Matter
The Two Chips, and the Two Different Problems They Solve
What This Actually Costs You - The TCO Story
The Ironwood Foundation - What Came Before and What It Proved
The Supply Chain Underneath the Chips
What It Means for Enterprises Evaluating Cloud AI
Where This Leaves Nvidia
The Bigger Picture

Google has looked at two fundamentally different problems, admitted that one chip cannot solve both well, and designed two separate pieces of silicon accordingly. What that decision means for the economics of running AI at scale is the real story.

For over a decade, Google's Tensor Processing Units have been one of the industry's best-kept competitive advantages. While the rest of the world scrambled for Nvidia H100s and debated GPU clusters, Google had been quietly running its own chip programme, building custom silicon that powered Gemini, Google Search, YouTube's recommendations, and almost everything else at scale. It worked well enough. But something shifted this year, and the change is more fundamental than a speed bump.

At Google Cloud Next 2026 in Las Vegas last month, the company announced that its eighth-generation TPU would not be one chip. It would be two. The TPU 8t for training AI models. The TPU 8i for running them. That distinction — training versus inference — has always existed on paper. Google just decided to stop pretending the same hardware handles both gracefully.

Why Now, and Why Does It Matter

The timing is not accidental. Inference workloads now account for more than 70 percent of AI accelerator cycles, and the economics of each query have become a business problem rather than just a technical one. When Anthropic reports that Claude handles more than 14 billion requests a day, the cost of answering each one starts to look a lot like a utility bill. At that scale, an 80 percent improvement in performance per dollar is not a talking point; it is the difference between a sustainable margin and a burning one.

The hyperscalers are building their own chips not because they think they can beat Nvidia on every metric but because they have concluded that purpose-built inference silicon, optimised for their specific workloads and deployed at their specific scale, produces better economics than buying Nvidia GPUs at Nvidia's margins.

42.5 exaflops: Ironwood TPU pod, Gen 7 (Google Cloud, 2025)
121 exaflops: TPU 8t superpod, Gen 8 (Google Cloud Next 2026)
80%: better perf-per-dollar on TPU 8i vs Ironwood (Google, April 2026)
2.8×: better training price-performance on TPU 8t (Google, April 2026)

Amin Vahdat, Google's SVP and chief technologist for AI infrastructure, made a pointed remark that captures the internal logic of this decision. Google designs every layer of its AI stack end-to-end, and that vertical integration is starting to show up in cost-per-token economics that Google says its rivals cannot match. The chip announcement is the most visible part of that stack, but the story underneath involves networking, cooling, data centre design, and software, all of it built and owned by one company.

"We realized two years ago that one chip a year wouldn't be enough. This is our first shot at actually going with two super high-powered specialized chips." - Amin Vahdat, SVP & Chief Technologist for AI & Infrastructure, Google

The Two Chips, and the Two Different Problems They Solve

Designing one chip that is optimal for both training and inference has always been a compromise. Google has decided to stop compromising. The split acknowledges a reality the industry has been approaching for years: the workloads are fundamentally different, and treating them the same is expensive.

TPU 8t - Training
12.6 petaFLOPS (FP4) peak compute per chip
216 GB HBM per chip at 6.5 TB/s
128 MB on-chip SRAM
9,600 chips per pod, 121 exaflops per pod
2.8× better training price-performance vs Ironwood

TPU 8i - Inference
10.1 petaFLOPS (FP4) peak compute per chip
288 GB HBM per chip at 8.6 TB/s
384 MB on-chip SRAM (3× the TPU 8t's 128 MB)
1,152 chips per pod
19.2 Tb/s ICI bandwidth
80% better perf-per-dollar vs Ironwood

TPU 8t vs TPU 8i - Architecture comparison, Google Cloud Next 2026

Training: TPU 8t

Training workloads demand maximum compute density and memory bandwidth to process trillions of parameters across weeks of continuous operation. The TPU 8t is built around that reality. A TPU 8t superpod scales to 9,600 liquid-cooled chips knit together by 2 petabytes of shared high-bandwidth memory, with double the interchip bandwidth of Ironwood. Each chip carries 216 GB of high-bandwidth memory at 6.5 TB per second of bandwidth, and up to 12.6 petaFLOPS of 4-bit floating point compute. The headline number is 121 exaflops per pod, nearly three times Ironwood's 42.5 exaflops, although Ironwood's figure is quoted at FP8 rather than FP4, so part of that jump comes from the lower-precision number format.
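The pod-level figures follow directly from the per-chip numbers. A quick sanity check, using only the values quoted in this article and assuming decimal units (1 exaflop = 1,000 petaFLOPS, 1 PB = 1,000,000 GB):

```python
# Sanity-check TPU 8t pod totals from the per-chip figures quoted in the article.
CHIPS_PER_POD = 9_600
PFLOPS_PER_CHIP_FP4 = 12.6      # peak FP4 petaFLOPS per chip
HBM_GB_PER_CHIP = 216           # GB of HBM per chip
IRONWOOD_POD_EXAFLOPS = 42.5    # Gen 7 pod figure as quoted

def pod_exaflops(chips: int, pflops_per_chip: float) -> float:
    """Aggregate peak compute in exaflops (1 EF = 1,000 PF)."""
    return chips * pflops_per_chip / 1_000

def pod_hbm_petabytes(chips: int, gb_per_chip: float) -> float:
    """Aggregate HBM in petabytes (decimal: 1 PB = 1,000,000 GB)."""
    return chips * gb_per_chip / 1_000_000

ef = pod_exaflops(CHIPS_PER_POD, PFLOPS_PER_CHIP_FP4)
pb = pod_hbm_petabytes(CHIPS_PER_POD, HBM_GB_PER_CHIP)
print(f"TPU 8t pod: {ef:.1f} EF (FP4), {pb:.2f} PB HBM")
print(f"Ratio vs Ironwood pod: {ef / IRONWOOD_POD_EXAFLOPS:.2f}x")
```

This reproduces roughly 121 exaflops and just over 2 PB of pooled HBM. The ratio is a comparison of peak figures as published, not of delivered training throughput.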

But raw numbers are only part of what matters in training. The other part is goodput - how much of that compute is actually being used productively. Every hardware failure, network stall, or checkpoint restart is time the cluster is not training, and at frontier training scale, every percentage point can translate into days of active training time. Google's Virgo Network fabric and fourth-generation liquid cooling were designed with exactly this in mind.
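To make the point about percentage points concrete, treat goodput as the fraction of wall-clock time spent on productive training steps. The sketch below asks how many calendar days a run needs to accumulate a fixed amount of useful compute; the 100-day target and the goodput values are hypothetical, chosen only for illustration.

```python
def wall_clock_days(productive_days: float, goodput: float) -> float:
    """Calendar days needed to accumulate `productive_days` of useful training.

    Goodput is the fraction (0, 1] of wall-clock time spent on productive steps;
    failures, network stalls and checkpoint restarts all push it below 1.
    """
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must be in (0, 1]")
    return productive_days / goodput

# Hypothetical run that needs 100 days of uninterrupted compute.
TARGET_DAYS = 100.0
for g in (0.90, 0.95, 0.98):
    print(f"goodput {g:.0%}: {wall_clock_days(TARGET_DAYS, g):.1f} calendar days")
```

On this toy model, moving from 90 to 98 percent goodput shortens the run from about 111.1 to about 102.0 calendar days, roughly nine days saved on the same cluster.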

Inference: TPU 8i

Inference is a different animal. For each token generated, the entire model's active weights need to be streamed through memory. While compute is still important, the main bottleneck tends to be memory bandwidth. The TPU 8i trades some raw floating point horsepower for a much larger on-chip SRAM cache and a faster, higher-capacity memory pool.
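A common back-of-the-envelope model shows why decode tends to be memory-bound: if every generated token must read all active weights from HBM once, a single sequence can produce at most bandwidth ÷ weight bytes tokens per second, however many FLOPs the chip has. The sketch applies that roofline-style bound to the HBM bandwidths quoted for the two chips. The 70-billion-active-parameter model with 4-bit weights is hypothetical, and the bound deliberately ignores KV-cache reads, batching and on-chip caching.

```python
def max_decode_tokens_per_s(hbm_tb_per_s: float, active_params_billions: float,
                            bytes_per_param: float) -> float:
    """Upper bound on single-sequence decode speed when weight streaming dominates.

    Assumes each token reads every active weight from HBM exactly once
    (batch size 1) and ignores KV-cache traffic and on-chip reuse.
    """
    weight_bytes = active_params_billions * 1e9 * bytes_per_param
    return hbm_tb_per_s * 1e12 / weight_bytes

# Hypothetical model: 70B active parameters stored as 4-bit weights (0.5 byte each).
for chip, bw in (("TPU 8t", 6.5), ("TPU 8i", 8.6)):
    ceiling = max_decode_tokens_per_s(bw, 70, 0.5)
    print(f"{chip}: <= {ceiling:.0f} tokens/s per sequence")
```

For this hypothetical model the ceilings come out around 186 and 246 tokens per second. Batching amortises each weight read across many sequences, which is why serving systems batch aggressively, and why keeping data in SRAM instead of HBM matters.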

The TPU 8i features 10.1 petaFLOPS of FP4 compute fed by 384 MB of on-chip SRAM and 288 GB of HBM good for 8.6 TB per second of bandwidth. That tripling of on-chip SRAM is not an accident: it is designed to hold agent working sets without trips to off-chip memory, which is where production workloads pay most of their latency cost. Google says collective communication latencies are reduced five-fold, which translates into better economics by allowing more users to be packed onto the same hardware.
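Whether a working set stays on-chip is ultimately a capacity question. As one illustration, the sketch sizes a transformer key-value (KV) cache, a typical component of an agent's working set, against the 128 MB and 384 MB SRAM figures quoted for the two chips. The model shape is hypothetical, 1 MB is taken as 10^6 bytes, and real deployments shard caches across chips and keep other data in SRAM too, so treat the result as an order-of-magnitude comparison only.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Key + value cache bytes added per token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

def tokens_that_fit(sram_mb: float, per_token_bytes: int) -> int:
    """Whole tokens of KV cache that fit in SRAM (1 MB taken as 10**6 bytes)."""
    return int(sram_mb * 1_000_000 // per_token_bytes)

# Hypothetical model shape: 32 layers, 8 KV heads of dim 128, FP8 cache entries.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_elem=1)
for chip, sram in (("TPU 8t", 128), ("TPU 8i", 384)):
    print(f"{chip}: {tokens_that_fit(sram, per_token):,} tokens of KV cache on-chip")
```

Under these assumptions each token costs 64 KiB of cache, so the on-chip budget grows from about 1,953 tokens to about 5,859: tripling SRAM triples the context that can be served without touching HBM.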

What This Actually Costs You - The TCO Story

Vendor benchmarks are self-reported and should always be read with appropriate scepticism. But the total cost of ownership (TCO) case for Google's TPUs has been building credibility for a while. Independent benchmarks published by SemiAnalysis put Ironwood's TCO at roughly 44 percent lower than a comparable GB200 server configuration, even accounting for a small shortfall in peak FLOPs. The same analysis put Ironwood's cost at $0.18 per million tokens for Gemini inference, versus $0.31 per million tokens on comparable B200 configurations.

Nvidia GB200: 1.00× (baseline)
Ironwood (Gen 7): ~0.58× TCO
TPU 8i (Gen 8, projected): ~0.32× TCO

Relative TCO per inference workload (lower is better). TPU 8i figure extrapolated from SemiAnalysis Ironwood data plus Google's claimed 80% improvement. Independent benchmarks pending GA.
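The chart's bars can be reproduced from figures elsewhere in the article, and doing so highlights a subtlety: an 80 percent gain in performance per dollar divides cost by 1.8 (roughly a 44 percent cut) rather than cutting cost by 80 percent. The sketch assumes the Ironwood bar is the ratio of the two per-million-token prices quoted by SemiAnalysis ($0.18 vs $0.31), which appears to be how the ~0.58× figure was derived.

```python
def cost_multiplier(perf_per_dollar_gain: float) -> float:
    """Convert a perf-per-dollar improvement (0.8 == +80%) into a cost multiplier.

    Doing the same work with (1 + gain) times the performance per dollar
    costs 1 / (1 + gain) as much.
    """
    return 1.0 / (1.0 + perf_per_dollar_gain)

# Assumption: Ironwood's relative TCO taken as the ratio of quoted token prices.
ironwood = 0.18 / 0.31                  # $ per million tokens, Ironwood vs B200 config
tpu8i = ironwood * cost_multiplier(0.80)

print(f"Ironwood: {ironwood:.2f}x  TPU 8i (projected): {tpu8i:.2f}x")
print(f"+80% perf per dollar -> cost x{cost_multiplier(0.80):.3f}")
```

The outputs, about 0.58× and 0.32×, match the chart, and the second line shows that the TPU 8i claim on its own implies a cost multiplier of about 0.556 relative to Ironwood.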

44%: lower all-in TCO per Ironwood chip versus a comparable GB200 server, according to SemiAnalysis benchmarks published February 2026, even with a ~10% shortfall on peak FLOPs.

If your company spends $100,000 a month on AI inference today, the arithmetic is worth doing. At 44 percent lower TCO, the same workload at the same performance level could cost roughly $56,000 a month on Ironwood alone, before the additional 80 percent improvement claimed for TPU 8i is even factored in.
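Here is that arithmetic as a sketch. It applies the 44 percent TCO reduction SemiAnalysis reported for Ironwood, then Google's claimed 80 percent perf-per-dollar gain for TPU 8i, to a $100,000 monthly bill. It assumes the savings pass one-for-one into your bill at a constant workload, which ignores migration effort, egress fees and commitment terms.

```python
def projected_monthly_cost(current: float, tco_reduction: float,
                           perf_per_dollar_gain: float = 0.0) -> float:
    """Monthly cost after a TCO cut and an optional perf-per-dollar improvement.

    tco_reduction: fractional saving (0.44 == 44% lower TCO).
    perf_per_dollar_gain: fractional improvement (0.80 == +80% perf per dollar),
    which divides cost by (1 + gain) for the same amount of work.
    """
    return current * (1 - tco_reduction) / (1 + perf_per_dollar_gain)

BILL = 100_000
print(f"Ironwood: ${projected_monthly_cost(BILL, 0.44):,.0f}")
print(f"TPU 8i (vendor claim): ${projected_monthly_cost(BILL, 0.44, 0.80):,.0f}")
```

The Ironwood line reproduces the $56,000 figure; stacking the TPU 8i claim on top gives about $31,111, and that second number rests entirely on a vendor claim that has not yet been independently benchmarked.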

There is a caveat worth naming here. Google's performance claims are credible for a specific reason: the company designed the chip, the network connecting the chips, the servers hosting them, the cooling systems sustaining them, and the data centers housing all of it. No third-party chip vendor can make that statement. But that tight vertical integration is also the catch: those performance advantages do not travel. They are a function of a system you do not own, and cannot negotiate around if you want to move providers in three years.

The Ironwood Foundation - What Came Before and What It Proved

It is easy to read the TPU 8 announcement as a break from the past, but it is better understood as validation of what Ironwood already proved. Ironwood powers every major Google service in production today: Gemini 3.5, Search AI Overviews, YouTube's recommendation stack, Gmail's smart features, and Google Photos' on-device models. TPU utilization exceeded 91 percent network-wide in March 2026, a number that would be commercially implausible if the chip did not deliver on its performance-per-dollar claims.

2015: TPU v1 deployed internally at Google
2017: TPU v2 adds training; first external cloud access
2021–23: TPU v4 and v5; Google uses TPUs as a cloud differentiator
Apr 2025: Ironwood (TPU v7), 42.5 exaflops, built for the inference era
Late 2025: Ironwood GA; Anthropic commits to TPU for Claude
Apr 2026: TPU 8t and 8i unveiled, the first training/inference split in programme history

Google TPU generation timeline, 2015 to 2026

The Supply Chain Underneath the Chips

Broadcom handles the high-performance training silicon under a relationship that has been described as a $46 billion AI contract. MediaTek handles cost-optimised inference, having already proved its ability to deliver I/O modules for Ironwood at 20 to 30 percent lower cost. This is not just an engineering decision; it is a deliberate multi-supplier strategy that reduces Google's dependence on any single partner while keeping cost pressure on each of them.

Google projects 4.3 million TPU shipments in 2026, rising to 10 million in 2027 and more than 35 million in 2028. The capital expenditure to support this is enormous. Google has committed $175 billion to $185 billion in infrastructure spending for 2026, nearly doubling the $91.4 billion it spent in 2025.

TPU 8 Projected Shipment Scale
2026: 4.3M units
2027: 10M units
2028: 35M+ units

Google projected TPU shipment volumes. Source: The Next Web / Google projections

What to Watch

Intel, Marvell, and TSMC are all part of the supply chain supporting the TPU 8 programme.
The chips reportedly target TSMC's 2nm process node for the full generation rollout in late 2027.
Independent benchmarks from early cloud customers will be the real test of whether Google's claimed economics hold outside of vendor-controlled conditions. General availability is expected later in 2026.

What It Means for Enterprises Evaluating Cloud AI

For enterprises evaluating AI infrastructure, this changes the math on which cloud platform to standardize on. If your workload is predominantly inference at scale (think customer-facing agents, recommendation systems, or summarisation tools), the TPU 8i represents a genuinely different cost structure from what general-purpose GPUs can offer.

But the switching cost conversation is real and deserves honesty. Enterprises that have standardized on Nvidia hardware for inference face real costs in moving onto TPUs, not only because software has to be ported, but because the performance advantages of Google's stack are a function of the full infrastructure underneath it. The TPU 8i is not a drop-in replacement; it is the centrepiece of a system that Google has designed from network to cooling to orchestration.

Enterprise Decision Framework

Are you primarily running inference workloads at scale (agents, search, recommendations)? Yes → Evaluate TPU 8i on Vertex AI.
Are you training large proprietary models? Yes → Evaluate TPU 8t availability and goodput SLAs.
Are you locked into AWS or Azure through enterprise agreements? Yes → Migration cost may outweigh the savings; model it first.
Do you consume Gemini through Gemini Enterprise? Yes → You inherit the TPU 8i lift automatically.
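For the migration question in the framework, a simple payback calculation is a reasonable first model: divide the one-off cost of moving (re-engineering, testing, any early-exit terms under an existing agreement) by the monthly saving. Every figure below is a hypothetical placeholder, not a quote.

```python
import math

def payback_months(migration_cost: float, current_monthly: float,
                   new_monthly: float) -> float:
    """Months until cumulative savings cover a one-off migration cost.

    Returns math.inf when the new platform is not actually cheaper.
    """
    monthly_saving = current_monthly - new_monthly
    if monthly_saving <= 0:
        return math.inf
    return migration_cost / monthly_saving

# Hypothetical: $600k to migrate, $100k/month today, $56k/month afterwards.
months = payback_months(600_000, 100_000, 56_000)
print(f"Payback in {months:.1f} months")
```

With these placeholders the move pays for itself in about 13.6 months. If the payback period is longer than the time you expect to stay on the new platform, or than the remaining term of your current agreement, the savings probably do not justify the switch.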

Where This Leaves Nvidia

Nvidia's position is more nuanced than the chip war framing often suggests. Google still sells Nvidia GB300 systems inside Google Cloud; the competition is layered, not head-to-head. Jensen Huang has argued consistently that general-purpose GPUs optimise for the next workload, while ASICs optimise for today's. That is a credible point for organizations that expect their AI workloads to evolve quickly and unpredictably.

But Nvidia's data-centre gross margin, currently above 75 percent, faces meaningful compression as custom silicon captures a larger share of hyperscaler accelerator spend. Google is not the only company building this way. Amazon has Trainium and Inferentia, Microsoft has Maia chips, and Meta has its MTIA accelerators. The Nvidia moat is deep, particularly in software, but Nvidia is no longer the only viable option for organizations that have the scale and engineering resources to make custom silicon work.

Custom Silicon Share of Hyperscaler Accelerator Spend (Projected)
Nvidia GPUs (now): ~85%
Custom silicon (now): ~15%
Custom silicon (2028): 25–30%

Source: Forward Future AI / industry analyst projections. Figures illustrative.

The Bigger Picture

What Google has done with the TPU 8 split is not just an engineering decision. It is a statement about where AI infrastructure economics are heading. McKinsey's analysis predicts that inference spend will outpace training spend in enterprise budgets through 2027. If that forecast is accurate, then the economics of serving AI, not building it, become the defining competitive battleground. Google has positioned itself to compete on exactly that ground, with hardware designed specifically for the purpose.

Vahdat made a prediction worth noting: as general-purpose CPUs plateau, workloads that matter will demand purpose-built silicon. "Two chips might become more," he said — without specifying whether that means future TPU variants or other classes of specialized accelerators. The frontier compute race has changed character. It used to be about who could secure the most H100s. It is now about who controls the full stack from chip design to data centre fabric to the software that makes all of it run efficiently.

Silicon is the new contract. Google is not selling you a chip; it is selling you a relationship with an entire infrastructure stack, priced to make alternatives look expensive.

