Let me set a scene for you.
It's end of month. Your finance team pings you with the cloud bill. You open it expecting something roughly in line with last month, maybe a bit higher because you onboarded a new client. Instead, the number staring back at you is 40% bigger than you budgeted. Nobody deployed anything wild. No traffic spike. Nothing obviously different.
You spend the next three hours trying to figure out what happened.
If that sounds familiar, you're not alone, not by a long stretch. According to Flexera's 2025 State of the Cloud Report, 84% of organizations say managing cloud spend is their single biggest cloud challenge. Cloud budgets are already overshooting by 17% on average. And the Flexera 2026 report? Wasted cloud spend ticked back up to 29% after five years of slow improvement. AI workloads and new service complexity reversed all that progress in one year.
Here's what I find genuinely maddening about this: cloud providers have gotten extraordinarily good at making pricing look simple. A per-hour rate here. A per-GB cost there. You run the numbers on a napkin and think you understand what you're signing up for. You don't. Not fully. Because the pricing pages they show you are, at best, incomplete.
This article is my attempt to lay it all out honestly: every major category of hidden cost, with real dollar figures and real examples of what happens when teams don't catch these things in time.
Before blaming your engineering team or your finance department, it helps to understand the structural problem here.
Cloud providers make money when you use more. That's not a conspiracy, it's just how the business model works. So the pricing pages are naturally built to make getting started feel cheap. The charges that pile up only emerge once your workloads are running in production, at scale, talking to each other across regions, generating logs, sitting idle over weekends.
"Cloud pricing is designed to be incomprehensible, not maliciously, but the sheer number of dimensions across hundreds of services creates complexity that even experienced cloud architects struggle to navigate."
That tracks with my experience. I've talked to engineers who have spent years on AWS and still get surprised by a bill. Not because they aren't smart, but because the system is genuinely layered in a way that makes the true cost of a single architectural decision almost impossible to predict upfront.
This one still gets me every time I explain it to someone outside the industry. They look confused. "Wait, I'm paying to take out data I already paid to put in?"
Yes. Exactly that.
Egress fees are what cloud providers charge when data leaves their network or crosses a boundary inside it, whether it's going to the internet, another region, a different provider, or just hopping between availability zones inside the same region. Getting data into the cloud is cheap or free. Getting it back out is where the meter really runs.
| Provider | Internet egress | Cross-region | Cross-AZ |
|---|---|---|---|
| AWS | $0.09 / GB | from $0.02 / GB | $0.01 / GB each way |
| Azure | $0.087 / GB | varies by region | free intra-region |
| GCP | $0.08 / GB | $0.05 / GB (NA-EU) | $0.01–0.02 / GB |
The scale of this problem is significant. According to Gartner, egress fees can account for 10–15% of total cloud costs, and sometimes more depending on workload patterns. For a Kafka streaming cluster running 30 MB/s of throughput on AWS, the cross-AZ fees alone can add up to $88,000 per year, a bill that is almost entirely networking.
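To see how a figure of that size can arise, here is a rough back-of-the-envelope sketch. The source does not break the $88,000 down, so the multipliers below (three AZs, replication factor 3 with each replica in a different AZ, three consumer groups, clients spread evenly across zones) are my assumptions, not the original calculation. The function name and parameters are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def kafka_cross_az_cost_per_year(
    throughput_mb_s: float,
    rate_per_gb_each_way: float = 0.01,  # AWS cross-AZ rate from the table above
    azs: int = 3,                        # assumption: cluster spread over 3 AZs
    replication_factor: int = 3,         # assumption: one replica per AZ
    consumer_groups: int = 3,            # assumption: each group reads everything
) -> float:
    """Estimate yearly cross-AZ transfer cost for a Kafka cluster (rough model)."""
    gb_per_year = throughput_mb_s / 1000 * SECONDS_PER_YEAR  # decimal GB
    # With clients spread evenly, (azs - 1) / azs of traffic lands in another AZ.
    cross_az_fraction = (azs - 1) / azs
    produce_copies = cross_az_fraction
    replicate_copies = replication_factor - 1  # followers live in other AZs
    consume_copies = consumer_groups * cross_az_fraction
    copies = produce_copies + replicate_copies + consume_copies
    # Each cross-AZ GB is billed once on the way out and once on the way in.
    return gb_per_year * copies * 2 * rate_per_gb_each_way

if __name__ == "__main__":
    print(round(kafka_cross_az_cost_per_year(30)))
```

Under these assumptions, 30 MB/s comes out at roughly $88,300 per year, which is in the same range as the figure quoted above. Change the replication factor or the number of consumer groups and the total moves a lot, which is really the point.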
The other thing worth saying clearly: this pricing structure is not an accident. High egress fees are how cloud providers create lock-in. If switching providers means triggering a five-figure data transfer bill just to leave, most companies just... stay.
This one isn't really the cloud provider's fault. It's ours, or more specifically, it's what happens when teams move fast, ship things, and never circle back to clean up.
Here's the pattern: a developer spins up a test environment on a Tuesday afternoon. Takes two minutes. Project wraps up, they move to the next thing. The test environment keeps running. Three months later it's still there, quietly billing, with nobody touching it. That's a "zombie workload." And they're everywhere.
FinOps research consistently shows that 30–40% of cloud resources in production environments are either idle or significantly overprovisioned. A 2025 VMware survey found nearly half of IT leaders believed more than 25% of their cloud spending was simply wasted. Not misallocated. Wasted.
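A zombie hunt can start very simply. The sketch below assumes you have already pulled utilization and deploy data into plain records. The `Resource` fields, the 5% CPU threshold, and the 30-day staleness cutoff are all hypothetical choices for illustration, not an industry standard.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    avg_cpu_percent: float       # average CPU over your lookback window
    days_since_last_deploy: int  # how long since anyone touched it
    monthly_cost: float          # USD

def find_zombie_candidates(resources, cpu_threshold=5.0, stale_days=30):
    """Return resources that look idle AND abandoned, most expensive first."""
    flagged = [
        r for r in resources
        if r.avg_cpu_percent < cpu_threshold
        and r.days_since_last_deploy >= stale_days
    ]
    return sorted(flagged, key=lambda r: r.monthly_cost, reverse=True)

if __name__ == "__main__":
    inventory = [
        Resource("test-env-march", 1.2, 95, 310.0),
        Resource("prod-api", 46.0, 2, 2400.0),
        Resource("old-demo-db", 0.4, 180, 120.0),
    ]
    for r in find_zombie_candidates(inventory):
        print(f"{r.name}: ${r.monthly_cost:.2f}/month")
```

Treat the output as a list of questions to ask owners, not a kill list. Low CPU alone is a weak signal, which is why the sketch also requires that nobody has deployed to the resource recently.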
Most people know that sending data to the internet costs money. What catches teams off guard is that data moving within the cloud, between your own services, inside your own account, also costs money. Sometimes a lot.
On AWS, traffic between availability zones in the same region is billed at $0.01 per GB in each direction. That sounds tiny. But for a properly redundant application running across three AZs, which is AWS's own recommended architecture, it means every database call, every microservice hop, every log flush that crosses an AZ boundary contributes to the bill. Traffic routed through a NAT Gateway adds another $0.045 per GB on top.
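Here is a minimal sketch of how those per-GB rates turn into a monthly number for request-driven traffic. The rates are the ones quoted above; the function name, the request volume, payload size, hop count, and NAT share are hypothetical inputs you would replace with your own measurements.

```python
def monthly_internal_transfer_cost(
    requests_per_month: int,
    payload_kb: float,
    cross_az_hops: int,          # hops per request that cross an AZ boundary
    nat_fraction: float = 0.0,   # share of request volume that also goes through NAT
    cross_az_rate_each_way: float = 0.01,  # USD per GB, per direction
    nat_rate: float = 0.045,               # USD per GB processed
) -> float:
    """Rough monthly cost of internal data movement for one request path."""
    gb_per_hop = requests_per_month * payload_kb / 1_000_000  # KB -> decimal GB
    # Cross-AZ traffic is charged on both the sending and receiving side.
    cross_az = gb_per_hop * cross_az_hops * 2 * cross_az_rate_each_way
    nat = gb_per_hop * nat_fraction * nat_rate
    return cross_az + nat

if __name__ == "__main__":
    # Hypothetical: 1 billion requests, 50 KB each, 4 cross-AZ hops, 20% via NAT.
    print(round(monthly_internal_transfer_cost(1_000_000_000, 50, 4, nat_fraction=0.2), 2))
```

With those made-up inputs the result is about $4,450 a month for traffic that never leaves your own account. The model ignores the NAT Gateway's hourly charge and any compression, so treat it as a floor rather than a forecast.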
Cold storage is genuinely useful and genuinely cheaper than hot storage, sometimes by 90% or more. The problem is the fine print around retrieval and minimum storage durations, which most teams don't read carefully until they've already been burned.
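The fine print is easier to reason about with numbers in front of you. The sketch below models two common traps: a minimum storage duration (delete early and you are still billed for the full minimum) and a per-GB retrieval fee. Every price in the example is a placeholder I made up for illustration. Check your provider's current pricing page for real values.

```python
def hot_tier_cost(gb: float, days_stored: int, price_per_gb_month: float) -> float:
    """Storage cost in a tier with no minimum duration and no retrieval fee."""
    return gb * price_per_gb_month * days_stored / 30

def cold_tier_cost(
    gb: float,
    days_stored: int,
    price_per_gb_month: float,
    min_days: int,            # minimum billable storage duration
    retrieval_per_gb: float,  # fee charged when data is read back
    gb_retrieved: float,
) -> float:
    """Storage plus retrieval cost in a tier with a minimum duration."""
    billed_days = max(days_stored, min_days)  # early deletion still bills the minimum
    storage = gb * price_per_gb_month * billed_days / 30
    retrieval = gb_retrieved * retrieval_per_gb
    return storage + retrieval

if __name__ == "__main__":
    # Hypothetical prices: hot $0.023/GB-month; cold $0.004/GB-month,
    # 90-day minimum, $0.03/GB retrieval. 10 TB kept 30 days, read back once.
    hot = hot_tier_cost(10_000, 30, 0.023)
    cold = cold_tier_cost(10_000, 30, 0.004, 90, 0.03, 10_000)
    print(f"hot: ${hot:.2f}  cold: ${cold:.2f}")
```

In this made-up scenario the "cheap" tier costs about $420 against $230 for hot storage, because the data was short-lived and read back in full. Cold tiers pay off when data really is cold.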
This is the one causing the most pain right now, in 2025. And it's only going to get messier.
As organizations have piled into generative AI services like AWS Bedrock, Azure AI Foundry, and GCP Vertex AI, they've discovered that AI billing has its own entirely separate layer of complexity. It's not just about paying for GPU time. 72% of organizations are now using generative AI services either extensively or sparingly, up from just 47% the year before. Nearly a third are spending over $12 million annually on public cloud, with AI driving much of that growth.
Here's one that doesn't get enough attention. Support plan pricing is tied to your usage, which means as your infrastructure grows, your support bill grows automatically, whether or not you're actually opening more tickets.
AWS Business Support has been priced at $100/month or a percentage of your monthly usage, whichever is higher: 10% on the first slice of spend, stepping down in tiers above that. For a company at $500,000/month in AWS spend, that is still a five-figure support bill. Every month. Third-party software from cloud marketplaces often carries licensing surcharges of 15–30% on top of underlying infrastructure costs.
Most companies land on a premium support tier during migration and never revisit whether it still fits two years later, when the team has built real internal expertise.
Cloud providers are generous with free tiers during the trial phase. The issue is that the line between "free forever" and "free for 12 months" is not always clearly communicated, and you often find out which is which when you see the charges start appearing.
GCP's free tier only applies in specific US regions. If you're deploying for European or Asian users and didn't specifically notice that detail, you're outside the free zone from day one. CloudWatch Logs on AWS charges separately for ingestion, storage, and archival. A team that enables verbose logging and never sets a retention policy can accumulate thousands of dollars per month.
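Finding log groups with no retention policy is a quick win. The sketch below works on plain dictionaries shaped like the entries the CloudWatch Logs `describe_log_groups` call returns in boto3, where a group that never expires simply has no `retentionInDays` key. The helper name and the sample data are hypothetical, and fetching the real list from your account is left out so the example runs without credentials.

```python
def groups_without_retention(log_groups):
    """Return log groups that keep data forever, largest stored volume first."""
    missing = [g for g in log_groups if "retentionInDays" not in g]
    return sorted(missing, key=lambda g: g.get("storedBytes", 0), reverse=True)

if __name__ == "__main__":
    # Hypothetical sample in the shape of describe_log_groups()["logGroups"].
    sample = [
        {"logGroupName": "/app/api", "storedBytes": 5_000_000_000},
        {"logGroupName": "/app/web", "retentionInDays": 30, "storedBytes": 9_000_000_000},
        {"logGroupName": "/app/batch-debug", "storedBytes": 80_000_000_000},
    ]
    for g in groups_without_retention(sample):
        print(g["logGroupName"], g["storedBytes"])
```

Once you have the list, boto3's `put_retention_policy(logGroupName=..., retentionInDays=...)` on the logs client is the call that sets a policy. Which retention period is right depends on your compliance needs, so that choice is deliberately not made here.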
Every major cloud provider raised prices in 2025. None of them led with a press release.
Stepping back from the individual categories for a moment, the overall picture is sobering.
Global cloud infrastructure spending crossed $675 billion in 2025. Of that, roughly $182 billion was wasted. Not misallocated. Wasted. Idle resources, zombie workloads, over-provisioned capacity, forgotten snapshots.
The 2026 Flexera report found that the share of cloud spend going to waste ticked back up to 29%, reversing five years of slow improvement in one year, driven almost entirely by AI complexity.
Mid-market companies (the 500-to-5,000 employee range) tend to have the roughest time. Large enough that the dollar amounts are significant. Not yet large enough to have dedicated FinOps teams or enterprise discount agreements that would soften the blow.
The most repeated finding in FinOps research is simple: you cannot optimize what you cannot see. 44% of organizations still report limited visibility into their cloud expenditure, despite having tools available. Only 43% track costs at the unit level, meaning most teams genuinely cannot tell you the cloud cost per customer, per feature, or per product line.
Start unglamorous. Enable billing exports. Set up tags. Turn on anomaly detection. None of this is exciting work, but it's the only foundation that makes everything else possible.
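Once billing exports and tags exist, the first useful numbers are spend per tag value and the share of spend with no tag at all. This is a minimal sketch over simplified rows. Real export schemas differ by provider, so the `cost` and `tags` fields and the `customer` tag key are hypothetical stand-ins.

```python
from collections import defaultdict

def cost_by_tag(rows, tag_key):
    """Sum cost per value of tag_key; rows missing the tag go to 'UNTAGGED'."""
    totals = defaultdict(float)
    for row in rows:
        value = row.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[value] += row["cost"]
    return dict(totals)

def untagged_share(rows, tag_key):
    """Fraction of total spend that cannot be attributed via tag_key."""
    totals = cost_by_tag(rows, tag_key)
    total = sum(totals.values())
    return totals.get("UNTAGGED", 0.0) / total if total else 0.0

if __name__ == "__main__":
    rows = [
        {"cost": 600.0, "tags": {"customer": "acme"}},
        {"cost": 300.0, "tags": {"customer": "globex"}},
        {"cost": 100.0, "tags": {}},
    ]
    print(cost_by_tag(rows, "customer"))
    print(f"untagged: {untagged_share(rows, 'customer'):.0%}")
```

The untagged share is worth tracking on its own. Until it is small, any cost-per-customer figure you compute is an underestimate by an unknown amount.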
A common expensive mistake: teams buy Reserved Instances or Committed Use Discounts to save money, but they do it on instances that are already over-provisioned. You lock in a discount on capacity you're not actually using. Right-size first. Then commit. AWS Compute Optimizer, Azure Advisor, and GCP Recommender use actual utilization data and are a much better starting point than your initial capacity estimates probably were.
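The order matters more than it might seem, and a few lines of arithmetic show why. The hourly rates and the 30% commitment discount below are hypothetical round numbers, not quotes from any provider.

```python
HOURS_PER_YEAR = 24 * 365

def annual_cost(hourly_rate: float, discount: float = 0.0) -> float:
    """Yearly cost of one always-on instance at a given discount (0.0 to 1.0)."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR

if __name__ == "__main__":
    # Hypothetical: the oversized instance costs $0.40/h, the right-sized one $0.20/h.
    oversized_committed = annual_cost(0.40, discount=0.30)
    rightsized_on_demand = annual_cost(0.20)
    rightsized_committed = annual_cost(0.20, discount=0.30)
    print(f"commit on oversized:    ${oversized_committed:,.2f}")
    print(f"right-size, no commit:  ${rightsized_on_demand:,.2f}")
    print(f"right-size, then commit: ${rightsized_committed:,.2f}")
```

In this example, committing on the oversized instance costs more per year than running the right-sized one at full on-demand price, and you are locked into it for the term. Right-sizing first and committing second is the only ordering where both savings stack.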
Most cross-AZ and cross-region charges exist because of architectural decisions made when nobody was thinking about billing. Mapping your actual data flows with cost as the lens often reveals paths that don't need to exist. For AI workloads, co-locating data processing services with the data storage can eliminate an entire category of transfer charges.
Here's the uncomfortable truth: cloud cost optimization works while someone cares, and drifts when that person moves on. Organizations that implement real FinOps practices, not just tooling, but actual process and accountability, consistently achieve 30% or greater savings, often within six weeks of starting. On a $1 million cloud bill, that's $300,000 back.
I want to be fair here, because "cloud providers are screwing you" is a slightly too simple take.
The complexity of cloud billing is partly just... the complexity of cloud infrastructure. Building globally redundant systems that serve millions of customers with wildly different workloads requires an enormous number of pricing dimensions. Some of the nuance is genuinely unavoidable.
But it's also true that the current structure benefits providers when their customers don't optimize. Every idle instance, every forgotten snapshot, every data transfer that didn't need to happen: that all shows up as revenue. The incentive to make the bill easy to understand is not as strong as we might wish it were.
Whatever else changes, none of it works on autopilot. Somebody on your team has to care about this, has to own it, has to check the bill with the same rigor they'd check a latency dashboard or an error rate. The teams doing that are the ones finding 30–40% savings that the others leave sitting on the table.
Your bill has a story in it. Most teams just haven't made the time to read it.