Reducing Kubernetes Costs Without Sacrificing Performance

DataStorage Editorial Team

Management & Optimization 6 min read  ·  June 2025

On average, 69% of CPU cores in Kubernetes clusters sit completely unused — and for an organization running 150 nodes, that translates to nearly a million dollars a year in overprovisioned compute alone.

Running Kubernetes in production is no longer an experiment. It is the default. But there is a difference between running it and running it well. Somewhere between the initial cluster setup and the point where bills start arriving, something quietly goes wrong for most teams.

A 2025 industry report by Spectro Cloud found that cost had overtaken skills and security as the top Kubernetes challenge, with 88% of organizations reporting a year-over-year rise in Kubernetes total cost of ownership. Sysdig's Cloud-Native Security and Usage Report put numbers to the waste: on average, 69% of CPU cores in Kubernetes clusters sit unused. For an organization running roughly 150 nodes, that translates to nearly a million dollars a year in overprovisioned compute alone.

The uncomfortable truth is that most of this waste is not caused by bad intentions. It comes from a perfectly rational habit: pad the resource requests, leave a buffer, and move on to the next sprint. Multiply that habit across dozens of teams and hundreds of deployments and the cloud bill becomes very difficult to explain.

The good news is that it is entirely fixable, and fixing it does not mean performance has to suffer.

  • 88% of orgs report rising Kubernetes TCO year-over-year (Spectro Cloud 2025)
  • 69% of CPU cores in K8s clusters sit unused on average (Sysdig Report 2025)
  • 30–50% cost reduction achievable without performance trade-offs (Industry Results)
Why Kubernetes Costs Get Out of Control

Before reaching for optimization tools, it helps to understand where the money actually goes. There are three main buckets.

Compute waste is the biggest one. Pods request far more CPU and memory than they ever use. Nodes run at low utilization because the scheduler cannot pack workloads tightly enough. Development and staging clusters run twenty-four hours a day because nobody turned them off.

Storage and networking costs are often invisible until they show up on the bill. Persistent volumes attached to deleted pods still accrue charges. Load balancers spin up for every internal service that could just as easily use a ClusterIP. Cross-availability-zone data transfer adds up quietly in the background. For a deeper look at what providers don't surface proactively, see Hidden Costs in Cloud Billing: What Your Provider Isn't Telling You.

Operational overhead includes control plane costs for managed services like EKS, GKE, or AKS, and the engineering hours spent firefighting resource issues that proper automation would have prevented.

The starting point for any optimization effort is visibility. You genuinely cannot fix what you cannot see.

Start With Cost Visibility, Not Cuts

Jumping straight to cuts is the most common mistake. Trimming resources without understanding usage patterns causes performance regressions, and the team loses trust in the optimization effort.

The right first move is to get a clear picture of where money is going at the pod and namespace level, not just the cloud provider invoice.

Kubecost and OpenCost

Two tools dominate this space. OpenCost is a CNCF-hosted open-source project that allocates real cloud cost to Kubernetes namespaces, workloads, and labels: it reads pod resource usage from the cluster and multiplies it by live provider pricing to attribute costs to the right owners. Kubecost is built on top of OpenCost and adds a user interface, a recommendations engine, and enterprise features like multi-cluster support.

For smaller teams or tight budgets, OpenCost or the free tier of Kubecost is genuinely useful. At larger scale, a paid tier tends to pay for itself quickly through the savings from acting on the data.

What matters most here is not which tool you choose but what you do with the data. When teams see their actual per-namespace spend, something shifts. Engineers start cleaning up idle environments and oversized requests on their own, simply because there is now a number attached to the waste.

Label Everything Consistently

Cost tools only work well when workloads are labeled consistently. A tagging strategy that includes team, application, and environment as standard labels on every workload means that a cost spike can be traced to a specific team or service within minutes rather than days.

Enforce this through admission controllers or CI/CD pipeline checks. Missing labels are not just an inconvenience for finance; they hide the exact waste you are trying to find.
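One way to enforce the label standard at admission time is sketched below, assuming a cluster on Kubernetes 1.30 or later where ValidatingAdmissionPolicy is generally available. The policy and binding names are hypothetical, and the label keys simply mirror the team, application, and environment convention described above; adjust both to your own scheme.

```yaml
# Sketch only: rejects Deployments and StatefulSets that lack the three cost labels.
# Names (require-cost-labels, require-cost-labels-binding) are illustrative assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-cost-labels
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments", "statefulsets"]
  validations:
    # CEL: the labels map must exist and contain every required key.
    - expression: >-
        has(object.metadata.labels) &&
        ['team', 'application', 'environment'].all(k, k in object.metadata.labels)
      message: "Workloads must carry team, application, and environment labels."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-cost-labels-binding
spec:
  policyName: require-cost-labels
  # Use ["Warn"] or ["Audit"] first if you want to measure impact before denying.
  validationActions: ["Deny"]
```

Starting with a non-blocking action such as Warn lets teams fix existing manifests before the policy begins rejecting deployments.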

# Example: LimitRange to set defaults when teams forget to specify requests
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-team
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container

Rightsizing Is Where the Real Savings Live

Once you can see cost by workload, the next step is matching resource requests to actual usage. This is called rightsizing and it is consistently the highest-impact change most organizations make.

Engineers set CPU and memory requests once during initial deployment, pad them by two to three times as a safety margin, and then never revisit them. Six months later, you have hundreds of pods requesting 2 CPU cores each while using 0.3.

Vertical Pod Autoscaler in Recommendation Mode

The Vertical Pod Autoscaler (VPA) watches actual usage and recommends better resource requests. Running it in recommendation mode first is the sensible approach, especially for stateful workloads where automatic pod restarts carry real risk.

VPA in auto mode restarts pods to apply new resource values, which is fine for stateless services but genuinely dangerous for databases, message queues, or anything holding state in memory. For those workloads, review VPA's recommendations manually and apply changes during maintenance windows.
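As a minimal sketch of recommendation mode, the manifest below sets the VPA update mode to "Off", so recommendations are computed but no pods are evicted. It assumes the VPA components are already installed in the cluster (they are not part of a default Kubernetes install); the workload name checkout-api is a hypothetical placeholder.

```yaml
# Sketch only: VPA that recommends but never restarts pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api-vpa        # hypothetical name
  namespace: my-team
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api          # hypothetical Deployment to observe
  updatePolicy:
    updateMode: "Off"           # recommendation only; nothing is applied automatically
```

The recommendations then appear in the object's status, for example via kubectl describe vpa checkout-api-vpa -n my-team, where they can be reviewed and applied by hand.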

What the Numbers Actually Look Like

A CNCF 2025 survey found that 68% of organizations running production Kubernetes clusters overspend by 30 to 45% due to overprovisioned nodes, idle pods, and misconfigured autoscaling. For a mid-size deployment running 50 to 100 nodes on AWS EKS or Azure AKS, that overspend translates to between $40,000 and $80,000 annually in wasted compute.

Rightsizing even the top 10 most expensive workloads in a cluster typically recovers a significant portion of that waste without touching anything performance-sensitive.

Rightsizing Checklist
  • Install OpenCost or Kubecost and get per-namespace cost numbers first
  • Run VPA in recommendation mode only before auto-applying changes
  • Apply changes to stateless services first — never to databases or queues without a maintenance window
  • Target the top 10 most expensive workloads and rightsize those before touching the rest
  • Set LimitRanges as a safety net so pods without explicit requests get sane defaults

Autoscaling Done Right

Static provisioning is the enemy of cost efficiency. Workloads have peaks and troughs. Paying peak prices around the clock is how budgets spiral.

Kubernetes offers several autoscaling mechanisms, and most teams underuse them. For a broader look at how autoscaling decisions affect your bill, see Auto-Scaling Strategies That Actually Reduce Cloud Spend.

| Tool | Level | What It Does | Best For |
|---|---|---|---|
| HPA | Pod | Scales replica count on CPU / memory | Stateless APIs, web services |
| VPA | Pod | Rightsizes CPU and memory per pod | Any workload with stable patterns |
| KEDA | Pod | Scales on external events (queues, cron, Kafka lag) | Batch, async, event-driven workloads |
| Karpenter | Node | Provisions right-sized nodes in 30–60 sec, consolidates idle nodes | EKS clusters (replaces Cluster Autoscaler) |
| Cluster Autoscaler | Node | Adds / removes nodes on demand (3–5 min) | GKE, AKS, or non-EKS environments |

HPA for Traffic-Driven Workloads

The Horizontal Pod Autoscaler scales the number of pod replicas up or down based on CPU, memory, or custom metrics. It runs a control loop every fifteen seconds. If average CPU across your pods is at 90% against a target of 50%, it recalculates the desired replica count and adjusts.

HPA works best for stateless web services and APIs where traffic drives resource consumption predictably. It does not help with workloads that are idle between batches or driven by queue depth rather than CPU.
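A minimal HPA manifest for the CPU-target case described above might look like the following sketch. The Deployment name and replica bounds are assumptions for illustration; the comments walk through the replica calculation using the 90% versus 50% figures from the text and an assumed starting count of 4 replicas.

```yaml
# Sketch only: scale a hypothetical Deployment to hold average CPU near 50% of requests.
#
# HPA replica calculation:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Worked example with an assumed 4 running replicas:
#   ceil(4 * 90 / 50) = ceil(7.2) = 8 replicas
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api-hpa        # hypothetical name
  namespace: my-team
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api          # hypothetical Deployment
  minReplicas: 2                # illustrative floor
  maxReplicas: 20               # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of the pods' CPU requests
```

Because utilization is measured against CPU requests, an HPA on a workload with padded requests will scale later than expected, which is one more reason to rightsize first.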

KEDA for Event-Driven Workloads

Kubernetes Event-Driven Autoscaling (KEDA) fills the gap HPA leaves. Where HPA scales on CPU and memory by default, KEDA scales on external event sources: Kafka consumer lag, SQS queue depth, Prometheus metrics, cron schedules, database query results, and more than sixty other sources.

If you process background jobs, run analytics pipelines, or handle asynchronous message processing, KEDA gives you something HPA cannot: the ability to scale based on what actually matters to your application, not just CPU utilization as a proxy.

The practical benefit here is that workloads can scale to zero replicas during idle periods and scale back up when events arrive. That alone can cut costs significantly for batch processing environments.
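The scale-to-zero behavior can be sketched with a KEDA ScaledObject driven by SQS queue depth. This assumes KEDA is installed and its operator has IAM access to the queue; the worker name, queue URL, account ID, region, and thresholds are all hypothetical placeholders.

```yaml
# Sketch only: scale a hypothetical worker Deployment on SQS backlog, down to zero when idle.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-worker-scaler    # hypothetical name
  namespace: my-team
spec:
  scaleTargetRef:
    name: report-worker         # hypothetical Deployment in the same namespace
  minReplicaCount: 0            # allow scale to zero between batches
  maxReplicaCount: 30           # illustrative ceiling
  cooldownPeriod: 300           # seconds to wait after the last event before scaling to zero
  triggers:
    - type: aws-sqs-queue
      metadata:
        # Placeholder queue; replace with your own URL and region.
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/report-jobs
        queueLength: "10"       # target messages per replica
        awsRegion: us-east-1
        identityOwner: operator # use the KEDA operator's AWS identity (an assumption here)
```

With a target of 10 messages per replica, a backlog of 100 messages would ask for roughly 10 replicas, and an empty queue lets the Deployment drop to zero after the cooldown.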

Karpenter for Node-Level Efficiency

Karpenter replaced the legacy Cluster Autoscaler as the recommended approach for AWS EKS and is broadly considered the better option where it is available. The key differences are speed and intelligence.

Karpenter provisions nodes in 30 to 60 seconds compared to 3 to 5 minutes for Cluster Autoscaler. More importantly, it selects the optimal instance type from a pool of candidates based on what the pending pods actually need, rather than scaling a fixed node group. A CPU-bound pod gets a compute-optimized instance. A memory-bound pod gets a memory-optimized one.

Karpenter also runs active consolidation: it identifies underutilized nodes, reschedules their pods onto fewer nodes, and terminates the empties. Enable the WhenEmptyOrUnderutilized consolidation policy and watch your node count naturally trend lower without any manual intervention.
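A consolidation-enabled NodePool could be sketched as follows, assuming Karpenter v1 on EKS (the karpenter.sh/v1 API) and an existing EC2NodeClass named default. The pool name, CPU limit, and consolidateAfter value are illustrative choices, not recommendations from the article.

```yaml
# Sketch only: NodePool with active consolidation, assuming the Karpenter v1 API.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose         # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # assumes this EC2NodeClass already exists
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    # Remove empty nodes and repack pods from underutilized ones.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m        # how long a node must be a candidate before acting
  limits:
    cpu: "200"                  # illustrative cap on total CPU this pool may provision
```

Consolidation moves pods, so workloads that cannot tolerate rescheduling should be protected with PodDisruptionBudgets before enabling it cluster-wide.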

Spot and Preemptible Instances Without the Risk

Spot instances on AWS cost 60 to 90% less than on-demand pricing. Azure Spot VMs and GCP Preemptible/Spot VMs offer comparable discounts. The catch is that the cloud provider can reclaim them with short notice. For a detailed cost breakdown across instance types, see Reserved vs On-Demand vs Spot Instances: A Cost Breakdown.

Most teams avoid spot instances because they worry about disruptions. That worry is well-founded for the wrong workloads and largely unnecessary for the right ones.

| Workload Type | Recommended Instance | Rationale |
|---|---|---|
| Critical APIs | On-Demand (always) | Cannot tolerate interruption |
| Stateful services | On-Demand (always) | State loss risk is too high |
| Stateless web pods | Mix: On-Demand + Spot | HPA and redundancy absorb interruptions |
| Batch processing | Spot instances | Jobs can retry; 60–90% cheaper |
| ML training jobs | Spot instances | Checkpointing handles preemption |
| Dev / staging clusters | Spot instances | Non-critical; saves the most money |
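For the batch row in the table above, one way to pin a retryable job to spot capacity is sketched below. It assumes nodes are provisioned by Karpenter, which labels them with karpenter.sh/capacity-type; the job name, image, and resource figures are hypothetical.

```yaml
# Sketch only: a retryable batch Job scheduled onto spot nodes.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report          # hypothetical name
  namespace: my-team
spec:
  backoffLimit: 5               # retry if a spot reclaim kills the pod mid-run
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        karpenter.sh/capacity-type: spot   # label set by Karpenter on spot nodes
      containers:
        - name: report
          image: example.com/my-team/report-builder:1.0   # hypothetical image
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
```

The retry budget only helps if the job is idempotent or checkpoints its progress, so that a rerun does not duplicate work.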

A real-world example from a transaction-heavy application shows what the results look like in practice: after implementing Karpenter with Spot Instances on AWS, one organization achieved a 35% reduction in monthly EC2 spend with no degradation in application performance, within three months of adoption.

The key to making spot instances work safely in production is diversification. Karpenter can spread workloads across multiple instance families (M, C, and R series on AWS) and multiple availability zones. When one spot pool runs dry, pods reschedule onto available capacity automatically.
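The diversification idea can be sketched as NodePool requirements, again assuming Karpenter v1 on EKS and an existing EC2NodeClass named default. The pool name and zone names are placeholders; the instance categories correspond to the C, M, and R families mentioned above.

```yaml
# Sketch only: spread spot capacity across instance families and zones.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-diversified        # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # assumes this EC2NodeClass already exists
      requirements:
        # Prefer spot but allow on-demand as a fallback when spot pools run dry.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Compute-, general-, and memory-optimized families.
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # Placeholder zones; list the zones your subnets actually cover.
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

The wider the set of allowed instance types and zones, the more spot pools Karpenter can draw from, which lowers the chance that a single reclaim wave leaves pods pending.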

The AWS Node Termination Handler is the other piece. It listens for spot interruption notices, which AWS sends two minutes before reclaiming an instance, and gracefully drains pods so your application has time to finish in-flight requests before moving on.
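Draining only helps if the pods themselves shut down cleanly. The sketch below shows the pod-side settings that usually matter, on a hypothetical stateless Deployment: a termination grace period that fits inside the interruption window, and a short preStop pause so load balancers stop sending traffic before the process exits. The names, image, and timings are assumptions, and the preStop command assumes the image ships a sleep binary.

```yaml
# Sketch only: pod settings that let a drained pod finish in-flight requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api            # hypothetical name
  namespace: my-team
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      # Must be shorter than the provider's interruption notice.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: example.com/my-team/checkout-api:1.0   # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Pause before SIGTERM so endpoints are removed first.
                command: ["sleep", "10"]
```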

Namespace Quotas and the Culture of Cost Ownership

Technical tooling only goes so far. The clusters that stay efficient over time are the ones where teams feel responsible for their own spend.

Resource quotas enforce a ceiling at the namespace level, preventing any single team from consuming runaway resources without realizing it.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-budget-quota
  namespace: my-team
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "50"

Pair quotas with LimitRanges that set default requests and limits for containers that forget to specify them. Without a LimitRange, a pod with no resource requests can consume an entire node and the scheduler has no way to account for it.

Beyond the YAML, what actually changes behavior is making spend visible to the people making deployment decisions. Three practices work well: weekly cost emails to team leads, per-namespace dashboards in your monitoring stack, and cost data surfaced inside CI/CD pipelines so engineers see the cost implication of a new service before it ships. Cost-aware engineering does not require deep financial knowledge. It requires timely feedback.

Storage and Networking: The Hidden Line Items

Compute gets most of the attention, but storage and networking costs quietly accumulate in the background.

Persistent Volume Hygiene

Persistent volumes attached to deleted pods continue to accrue charges. Running a regular audit of unattached PVCs and cleaning them up is straightforward but often skipped. Tools like kubectl get pvc --all-namespaces combined with a script that flags PVCs not mounted by any running pod can automate this check.

On AWS, switching from gp2 to gp3 volumes is cheaper for most workloads and gives better price-performance. Move infrequently accessed data to cold storage tiers. On GCP, use regional persistent disks only when a workload genuinely requires multi-zone redundancy.
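For the gp2-to-gp3 switch, a StorageClass along these lines is a common starting point. It is a sketch that assumes the AWS EBS CSI driver is installed; the class name is arbitrary.

```yaml
# Sketch only: gp3-backed StorageClass for new volumes, assuming the EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3                     # arbitrary name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # create the volume in the zone where the pod lands
allowVolumeExpansion: true
reclaimPolicy: Delete           # delete the EBS volume when the PVC is deleted
```

Creating the class only affects newly provisioned volumes; existing gp2 volumes have to be migrated separately.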

Cutting Unnecessary Network Costs

Data transfer costs accrue every time traffic crosses availability zone boundaries. Topology-aware routing in Kubernetes (available since v1.23) can reduce cross-zone traffic by preferring pods in the same zone. For most stateless workloads, this is a straightforward configuration change.

Internal services that do not need external access should use ClusterIP instead of LoadBalancer. Each cloud load balancer carries a fixed monthly cost regardless of traffic volume. In large clusters with dozens of internal services, this adds up to real money.
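Both network points can be illustrated on a single internal Service, sketched below with hypothetical names and ports. The topology annotation shown is the form used on Kubernetes 1.27 and later; older clusters used service.kubernetes.io/topology-aware-hints: auto instead.

```yaml
# Sketch only: internal Service with no cloud load balancer and zone-aware routing.
apiVersion: v1
kind: Service
metadata:
  name: inventory               # hypothetical name
  namespace: my-team
  annotations:
    # Ask kube-proxy to prefer endpoints in the caller's zone when it is safe to do so.
    service.kubernetes.io/topology-mode: Auto
spec:
  type: ClusterIP               # reachable only inside the cluster; no load balancer charge
  selector:
    app: inventory
  ports:
    - port: 80
      targetPort: 8080          # hypothetical container port
```

Zone-aware routing works best when each zone runs enough replicas to carry its share of traffic; otherwise Kubernetes falls back to cluster-wide routing.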

The Tools Worth Knowing

You do not need every tool on this list, but it helps to know what exists before reaching for a solution.

OpenCost

Free, CNCF-hosted. Real-time cost allocation by namespace, deployment, and label. Best starting point for any team that wants visibility without spending on tooling.

Kubecost

Builds on OpenCost with a richer UI, savings recommendations, and multi-cluster support. The free tier covers a single cluster.

Karpenter

Node-level autoscaling on EKS with intelligent instance selection and active consolidation. Recommended replacement for Cluster Autoscaler on AWS.

Goldilocks

Runs VPA in recommendation mode and surfaces suggestions via a dashboard. Low overhead and useful for teams not ready for VPA auto mode cluster-wide.

KEDA

Event-driven autoscaling for any Kubernetes cluster. Essential for batch, queue-driven, or schedule-driven workloads where HPA falls short.

A Practical Starting Sequence

Rather than trying to implement everything at once, the following order tends to produce the fastest measurable results with the least disruption.

Weeks 1–2

Install OpenCost or Kubecost. Get per-namespace cost numbers. Identify the top 10 most expensive workloads.

Weeks 3–4

Enable VPA in recommendation mode. Review suggestions for the high-cost workloads. Apply changes to stateless services first.

Weeks 5–6

Deploy Karpenter with consolidation enabled. Configure Spot instances for batch and stateless workloads.

Weeks 7–8

Set ResourceQuotas and LimitRanges on every namespace. Configure KEDA for queue-driven or event-driven workloads.

Ongoing

Weekly cost review emails to team leads. Quarterly audit of PVCs, load balancers, and idle namespaces. Cost checks integrated into CI/CD pipelines so new services are right-sized before they deploy.

The clusters that stay efficient treat cost hygiene the way strong teams treat security: automated checks, regular audits, and a named owner for every resource.

Closing Thoughts

Kubernetes waste is not a failure of technology. It is a failure of visibility and habit. The tools to fix it are mature, many of them are open source, and the savings are real. Organizations that implement these practices cut Kubernetes costs by 30 to 50% without performance trade-offs, according to documented industry results.

The performance concern is understandable but largely unfounded when approached correctly. Rightsizing based on actual usage data, autoscaling that responds to real demand signals, and spot instances with proper interruption handling are all built for production environments. The risk is not in optimizing. The risk is in continuing to pay for resources that are sitting idle.

Start with visibility. Let the data tell you where the waste is. Then fix it in order of impact.
