On every EKS cost audit we run, the same handful of levers explains about 80% of the savings. None of them are clever. All of them get deferred because they require coordination across application, platform and finance teams. Here's the short list, in roughly the order we apply them.
1. Right-size requests, not limits
CPU and memory requests are what the scheduler reserves on a node. Limits only cap what a container may consume at runtime, so tuning them frees no capacity. Most teams set requests by copy-pasting from another service that was already wrong. Pull actual usage from Prometheus over a representative two-week window and set each request to the p95 of observed usage plus a margin. This single change typically reclaims 20–30% of cluster capacity.
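As a rough illustration of the "p95 plus a margin" rule, here is a minimal sketch. It assumes the usage samples have already been exported from Prometheus into a plain list. The function name `recommend_request`, the 15% default margin and the sample values are all hypothetical, not something the audit prescribes.

```python
import math


def recommend_request(samples, percentile=95.0, margin=0.15):
    """Suggest a resource request: a percentile of observed usage plus a margin."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample with at least
    # `percentile` percent of all samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    return ordered[rank - 1] * (1.0 + margin)


# Hypothetical CPU usage samples in millicores (assumed to come from Prometheus).
usage_millicores = [110, 120, 125, 130, 135, 140, 145, 150, 155, 160,
                    165, 170, 175, 180, 185, 190, 195, 200, 210, 600]

suggested_millicores = recommend_request(usage_millicores)
```

With these samples the p95 lands on 210m rather than the 600m spike, which is the point of sizing to a percentile instead of the observed maximum.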
2. Replace the Cluster Autoscaler with Karpenter
Karpenter chooses instance types to fit the pods that are actually pending, rather than scaling node groups whose instance types are fixed in advance. The result is denser packing, faster scale-out and access to a much wider menu of instance shapes. We typically see 15–25% additional savings on top of right-sizing.
3. Move stateless workloads to Spot
Karpenter makes Spot trivial: define a node pool that prefers Spot with On-Demand as fallback, and set a PodDisruptionBudget for each workload. Most stateless services can drain and reschedule within the two-minute Spot interruption notice without breaking SLOs. Spot discounts typically run 60–90% off On-Demand.
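To see what that discount range means for a whole bill, a back-of-the-envelope sketch helps. The function `blended_cost` and the example figures are hypothetical. It assumes a flat discount, whereas real Spot prices vary by instance type, zone and time.

```python
def blended_cost(on_demand_cost, spot_share, spot_discount):
    """Cost after moving `spot_share` of an On-Demand bill to Spot at `spot_discount`."""
    if not 0.0 <= spot_share <= 1.0:
        raise ValueError("spot_share must be between 0 and 1")
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    stays_on_demand = on_demand_cost * (1.0 - spot_share)
    moves_to_spot = on_demand_cost * spot_share * (1.0 - spot_discount)
    return stays_on_demand + moves_to_spot


# Hypothetical: a $100/hour bill, half of it stateless, at an assumed 70% discount.
new_cost = blended_cost(100.0, spot_share=0.5, spot_discount=0.7)
```

In this example the bill drops from $100 to roughly $65 per hour. The share of compute you can actually move matters as much as the headline discount.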
4. Savings Plans on the steady-state baseline
Once Spot is in place, measure your On-Demand baseline over 30 days and buy a Compute Savings Plan covering 60–80% of it. Leave headroom for Spot fluctuations. This is the boring lever, and it's worth real money.
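The sizing arithmetic can be sketched as follows. It assumes the baseline is the mean hourly On-Demand spend over the measurement window, which is one reasonable reading but not the only one. The function name, the sample data and the 25% plan discount are hypothetical. Savings Plans commitments are expressed in dollars per hour at the discounted rate, which is why the sketch scales the covered On-Demand spend down by the discount.

```python
from statistics import mean


def savings_plan_commitment(hourly_on_demand_spend, coverage=0.7, plan_discount=0.25):
    """Estimate an hourly commitment covering `coverage` of the On-Demand baseline."""
    if not hourly_on_demand_spend:
        raise ValueError("need at least one hourly spend sample")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # Assumption: "baseline" means the average hourly On-Demand spend.
    baseline = mean(hourly_on_demand_spend)
    covered_on_demand = baseline * coverage
    # The commitment is stated at Savings Plan rates, not On-Demand rates.
    return covered_on_demand * (1.0 - plan_discount)


# Hypothetical hourly On-Demand spend in dollars (a real window would have ~720 samples).
hourly_spend = [40.0, 50.0, 60.0]
commitment = savings_plan_commitment(hourly_spend, coverage=0.7, plan_discount=0.2)
```

Here a $50/hour baseline at 70% coverage and an assumed 20% discount gives a commitment of about $28/hour. Check the discount against the actual rates for your instance families before buying.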
5. Trim the platform overhead
Most clusters quietly run two log forwarders, three monitoring agents and an abandoned service mesh. Audit your DaemonSets and the `kube-system` namespace. DaemonSets run on every node, so their requests scale with node count and add up fast.
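A quick way to size the problem is to multiply per-node DaemonSet requests by node count. The sketch below does that for CPU. The DaemonSet names, request values and node size are hypothetical, and in practice the requests would come from the cluster's own manifests.

```python
def daemonset_overhead(daemonset_cpu_millicores, node_count, node_cpu_millicores):
    """Return (cluster-wide millicores reserved by DaemonSets, fraction of each node)."""
    if node_count < 0 or node_cpu_millicores <= 0:
        raise ValueError("node_count must be >= 0 and node size must be positive")
    per_node = sum(daemonset_cpu_millicores.values())
    # Each DaemonSet schedules one pod per node, so overhead grows linearly with nodes.
    cluster_total = per_node * node_count
    fraction_of_node = per_node / node_cpu_millicores
    return cluster_total, fraction_of_node


# Hypothetical CPU requests per DaemonSet pod, in millicores.
requests = {"log-forwarder-a": 100, "log-forwarder-b": 100, "metrics-agent": 200}
total, fraction = daemonset_overhead(requests, node_count=50, node_cpu_millicores=4000)
```

In this made-up cluster, 400m per node across 50 four-core nodes reserves 20 cores, or 10% of every node, before any application pod is scheduled.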
6. EBS and data transfer
Switch gp2 volumes to gp3, delete unattached volumes (there are always unattached volumes), and use topology-aware routing to keep service-to-service traffic within a zone so it stops incurring cross-AZ transfer charges. NAT Gateway traffic is the silent line item nobody investigates.
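The volume cleanup can be estimated from an inventory before touching anything. The sketch below works on records shaped like the `Volumes` entries from EC2's DescribeVolumes response (`VolumeId`, `VolumeType`, `Size` in GiB, `State`). The per-GB-month prices are assumptions to replace with your region's current rates, and the sketch only considers gp2 and gp3 volumes.

```python
def ebs_savings(volumes, gp2_price=0.10, gp3_price=0.08):
    """Estimate monthly savings from deleting unattached volumes and moving gp2 to gp3."""
    prices = {"gp2": gp2_price, "gp3": gp3_price}  # assumed $/GB-month
    delete_savings = 0.0
    migrate_savings = 0.0
    for vol in volumes:
        if vol["VolumeType"] not in prices:
            continue  # other volume types are out of scope for this sketch
        if vol["State"] == "available":
            # "available" means the volume is not attached to any instance.
            delete_savings += vol["Size"] * prices[vol["VolumeType"]]
        elif vol["VolumeType"] == "gp2":
            migrate_savings += vol["Size"] * (gp2_price - gp3_price)
    return delete_savings, migrate_savings


# Hypothetical inventory.
inventory = [
    {"VolumeId": "vol-a", "VolumeType": "gp2", "Size": 100, "State": "in-use"},
    {"VolumeId": "vol-b", "VolumeType": "gp2", "Size": 500, "State": "available"},
    {"VolumeId": "vol-c", "VolumeType": "gp3", "Size": 200, "State": "in-use"},
]
delete_monthly, migrate_monthly = ebs_savings(inventory)
```

The estimate ignores IOPS and throughput. Large gp2 volumes can have a higher baseline than gp3's default, so a migration may need extra provisioned performance, which narrows the saving.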
7. Make cost a service-level metric
Tag every workload by team, expose monthly cost-per-service in the same dashboard as latency and error rate, and go through it in the same operational review. Cost regressions get fixed when they're treated like reliability regressions, and not before.
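The roll-up behind such a dashboard is a simple group-by. The sketch below assumes cost line items have already been exported with their workload labels attached. The record shape, the `cost_by_label` function and the sample values are hypothetical. Untagged spend gets its own bucket so it stays visible instead of disappearing.

```python
from collections import defaultdict


def cost_by_label(line_items, label="team"):
    """Sum cost per value of `label`; items missing the label go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        key = item.get("labels", {}).get(label, "untagged")
        totals[key] += item["cost"]
    return dict(totals)


# Hypothetical monthly line items in dollars.
items = [
    {"cost": 1200.0, "labels": {"team": "payments", "service": "checkout"}},
    {"cost": 800.0, "labels": {"team": "payments", "service": "ledger"}},
    {"cost": 500.0, "labels": {"team": "search", "service": "indexer"}},
    {"cost": 300.0, "labels": {}},
]
per_team = cost_by_label(items, label="team")
per_service = cost_by_label(items, label="service")
```

A large `untagged` bucket is itself a finding: it measures how much spend nobody currently owns.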

