Kubernetes Cost Optimization: AWS EKS vs GKE vs AKS (2025)
A comprehensive guide to reducing Kubernetes costs across AWS EKS, Google GKE, and Azure AKS. Learn rightsizing, autoscaling, spot instances, storage optimization, and cloud-specific cost strategies.
What You'll Learn
The Bottom Line
Most Kubernetes clusters waste 40-60% of their compute spend on overprovisioned resources. This guide provides actionable strategies to cut costs by 50% or more across AWS EKS, GCP GKE, and Azure AKS without sacrificing performance.
The K8s Cost Problem: Why Clusters Get Expensive
Kubernetes provides incredible flexibility, but that flexibility comes with cost complexity:
Overprovisioning
Teams request 4 CPU / 8GB RAM "to be safe" but only use 0.5 CPU / 2GB. Result: 87% waste.
Idle Resources
Dev/staging clusters run 24/7 even though they're only used 40 hours per week. Result: 76% idle time.
Visibility Gaps
No one knows which team or app is driving costs. Result: no accountability, no optimization.
Industry benchmark: The average Kubernetes cluster operates at 20-30% CPU utilization and 30-40% memory utilization. This means you're paying for 2-3x more capacity than you need.
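The overprovisioning math is easy to check for your own pods: compare what a pod requests against what `kubectl top pods` reports it actually using. A minimal sketch, using the hypothetical request/usage figures from the example above:

```shell
# Hypothetical pod: requests 4 CPUs (4000m) but actually uses ~0.5 CPU (500m);
# plug in your own numbers from `kubectl top pods`
requested_mcpu=4000
used_mcpu=500

# Integer percentage of the CPU request that sits idle
waste=$(( (requested_mcpu - used_mcpu) * 100 / requested_mcpu ))
echo "${waste}% of the CPU request is unused"
```

Run the same arithmetic on memory; anything consistently above ~50% idle is a right-sizing candidate.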
Universal Cost Optimization Strategies (All Clouds)
These strategies work regardless of whether you're using EKS, GKE, or AKS:
1. Right-Size Pod Resource Requests and Limits
The Problem
Most teams cargo-cult resource requests from Stack Overflow without measuring actual usage. A pod requesting 2 CPU might only use 200m (10% utilization).
The Solution
```yaml
# Before: Overprovisioned
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "2000m"    # Requesting 2 CPUs
        memory: "4Gi"   # Requesting 4GB
      limits:
        cpu: "4000m"    # Limit at 4 CPUs
        memory: "8Gi"   # Limit at 8GB
```
```yaml
# After: Right-sized based on metrics
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "500m"     # Actually using 300-400m
        memory: "1Gi"   # Actually using 600-800Mi
      limits:
        cpu: "1000m"    # 2x headroom for spikes
        memory: "2Gi"   # 2x headroom for spikes
```
Impact: Reduced CPU and memory requests by 75%, allowing 4x more pods on the same nodes.
How to Right-Size
Use Vertical Pod Autoscaler (VPA) in recommendation mode:
```shell
# Install VPA from the kubernetes/autoscaler repo (the project ships an
# install script rather than a single release manifest)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
```
```yaml
# Create VPA in recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Off"   # Recommendation only
```

```shell
# Get recommendations
kubectl describe vpa webapp-vpa
```
2. Implement Cluster Autoscaling
Horizontal Pod Autoscaler (HPA)
Scale pods based on CPU, memory, or custom metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```
Cluster Autoscaler
Scale nodes based on pending pods. Configuration varies by cloud (see EKS/GKE/AKS sections).
Impact: Automatically add nodes during traffic spikes, remove them during low usage. Can reduce costs by 30-50% for variable workloads.
3. Use Spot/Preemptible Instances
Cost Savings
- AWS Spot: Up to 90% cheaper than On-Demand
- GCP Spot VMs: 60-91% cheaper (successor to Preemptible VMs)
- Azure Spot: Up to 90% cheaper
Best Practices
- Use for stateless workloads: Web servers, batch jobs, CI/CD workers
- Mix with on-demand: Run critical services on on-demand, everything else on spot
- Use node taints/tolerations: Prevent critical pods from landing on spot nodes
- Diversify instance types: Use multiple spot pools to reduce interruption risk
```shell
# Taint spot nodes
kubectl taint nodes spot-node-1 spot=true:NoSchedule
```

```yaml
# Tolerate spot in deployment (pod template fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  template:
    spec:
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        node-type: spot
```
4. Optimize Storage Costs
Common Storage Waste
- Orphaned volumes: PVCs deleted but underlying volumes remain (costs $$$)
- Wrong storage class: Using high-performance SSD for logs/backups
- Oversized volumes: 500GB volumes for 10GB of actual data
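Orphaned volumes are easy to surface: a PV whose claim is gone shows STATUS `Released` but still bills, and cloud-side disks can be detached with no PV at all. A hedged sketch (the AWS half assumes EBS and default CLI credentials; swap in the equivalent gcloud/az query on other clouds):

```shell
# List PersistentVolumes whose claim is gone ("Released") -- these still bill.
# With --no-headers, STATUS is the 5th column of `kubectl get pv` output.
kubectl get pv --no-headers | awk '$5 == "Released" {print $1, $2}'

# Cross-check against the cloud: EBS volumes not attached to any instance
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{ID:VolumeId,Size:Size}' \
  --output table
```

Review the list before deleting anything; a `Retain` reclaim policy may be holding data on purpose.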
Storage Class Best Practices
```yaml
# Use cheaper storage tiers for appropriate workloads
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard   # Use standard/HDD for logs
  resources:
    requests:
      storage: 20Gi
---
# Use premium only for databases
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: premium-ssd   # Fast storage for DB
  resources:
    requests:
      storage: 100Gi
```
Impact: Standard storage is 3-5x cheaper than premium SSD. Use it for logs, backups, and non-critical data.
5. Implement Resource Quotas and LimitRanges
Prevent Runaway Costs
```yaml
# Namespace-level quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    limits.cpu: "100"
    limits.memory: "200Gi"
    persistentvolumeclaims: "10"
---
# Default limits for pods without resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
```
AWS EKS-Specific Cost Optimizations
1. Use AWS Savings Plans for EKS Compute
Compute Savings Plans
Commit to consistent compute usage for 1-3 years for up to 72% discount:
- 1-year no upfront: ~40% savings
- 3-year all upfront: ~72% savings
Best practice: Cover your baseline 24/7 workloads with Savings Plans, use spot for burst capacity.
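AWS can size a Savings Plan for you from recent usage. A sketch using the Cost Explorer CLI (assumes Cost Explorer is enabled on the account; term and lookback values are illustrative):

```shell
# Ask Cost Explorer for a Compute Savings Plan recommendation
# based on the last 60 days of usage
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS
```

The response includes an hourly commitment suggestion and estimated savings; start at or below the suggested commitment so spiky usage stays on spot.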
2. Use Spot Capacity for Batch Workloads
Spot Pricing
EC2 Spot capacity is up to 90% cheaper than On-Demand. (Note: Fargate Spot is currently an ECS-only feature; EKS pods on Fargate run at regular Fargate pricing, so on EKS the equivalent is a Spot-backed managed node group.) Perfect for:
- CI/CD pipelines
- Data processing jobs
- Machine learning training

```yaml
# eksctl managed node group backed by Spot capacity
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
- name: batch-spot
  spot: true
  instanceTypes: ["m5.large", "m5a.large", "m6i.large"]
  minSize: 1
  maxSize: 10
  labels:
    workload-type: spot-ok
```
3. Enable EKS Cluster Autoscaler with Mixed Instance Policies
Auto Scaling Group Configuration
```hcl
resource "aws_autoscaling_group" "eks_nodes" {
  name     = "eks-mixed-nodes"
  min_size = 2
  max_size = 20

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 20
      # Note: spot_instance_pools only applies to the "lowest-price"
      # strategy; "capacity-optimized" picks the pools with the most
      # spare capacity automatically.
      spot_allocation_strategy = "capacity-optimized"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.eks.id
        version            = "$Latest"
      }
      override {
        instance_type = "m5.large"
      }
      override {
        instance_type = "m5a.large"
      }
      override {
        instance_type = "m6i.large"
      }
      override {
        instance_type = "m6a.large"
      }
    }
  }
}
```
Impact: 2 on-demand nodes for stability, 80% spot above that baseline for cost savings. Diversify across 4 instance types to reduce interruption risk.
4. Use Graviton Instances (ARM)
Cost and Performance Benefits
- 20% cheaper than comparable x86 instances
- 40% better price-performance for many workloads
Supported: Most EKS versions support Graviton (ARM64). Just ensure your container images support linux/arm64.
# Build multi-arch images
```shell
# Build multi-arch images
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
```
5. Optimize EKS Control Plane Costs
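Once images are multi-arch, workloads that should land on Graviton nodes can be pinned to ARM via the standard `kubernetes.io/arch` node label. A minimal pod-spec fragment:

```yaml
# Pod template fragment: schedule onto ARM64 (Graviton) nodes only
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
```

Without a selector, the scheduler may place an amd64-only image on an ARM node, where it will crash at startup.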
Control Plane Pricing
AWS charges $0.10/hour ($73/month) per EKS cluster. Strategies:
- Multi-tenancy: Run multiple apps in one cluster using namespaces
- Consolidate dev/staging: Use one cluster with different namespaces instead of separate clusters
- Delete idle clusters: Shut down dev/test clusters overnight
Impact: Reducing from 10 clusters to 3 saves $511/month in control plane costs alone.
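For dev/test clusters you keep, the "overnight shutdown" can be approximated by scaling node groups to zero: the $73/month control plane fee still accrues, but node compute drops to nothing. A sketch with eksctl (cluster and nodegroup names are illustrative):

```shell
# Evening: scale the dev cluster's node group to zero
eksctl scale nodegroup --cluster=dev-cluster --name=ng-default \
  --nodes=0 --nodes-min=0

# Morning: scale back up
eksctl scale nodegroup --cluster=dev-cluster --name=ng-default \
  --nodes=3 --nodes-min=1
```

Wire these into a scheduler (cron, EventBridge) to automate the cycle.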
GCP GKE-Specific Cost Optimizations
1. Enable GKE Autopilot Mode
What is Autopilot?
GKE manages nodes, autoscaling, and security. You only pay for pod resource requests (not idle capacity).
Cost Model
Pay only for CPU, memory, and ephemeral storage requested by pods. No wasted capacity on overprovisioned nodes.
When to Use
- Variable workloads with unpredictable scaling
- Teams without dedicated K8s expertise
- Workloads that don't need node-level customization
```shell
# Create Autopilot cluster
gcloud container clusters create-auto my-cluster \
  --region=us-central1 \
  --release-channel=regular
```
Impact: Can reduce costs by 30-50% for variable workloads by eliminating idle node capacity.
2. Use Spot VMs (successor to Preemptible)
GKE Spot Configuration
```shell
# Create node pool with Spot VMs
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --machine-type=n2-standard-4 \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --zone=us-central1-a
```
Pricing: 60-91% discount vs regular VMs. Unlike the legacy Preemptible VMs (which were terminated after 24 hours), Spot VMs have no maximum runtime, but they can be reclaimed at any time.
3. Enable Cost Allocation and GKE Usage Metering
Enable Usage Metering
```shell
gcloud container clusters update my-cluster \
  --enable-resource-consumption-metering \
  --resource-usage-bigquery-dataset=gke_usage
```
Exports detailed resource consumption to BigQuery for chargeback and cost analysis by namespace, label, or pod. On newer clusters, GKE cost allocation (`--enable-cost-allocation`) surfaces a similar per-namespace breakdown directly in Cloud Billing.
4. Use Committed Use Discounts (CUDs)
GCP Committed Use Pricing
- 1-year commitment: ~37% discount
- 3-year commitment: ~55% discount
Applied automatically to matching VM usage. No need to configure per-cluster.
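Commitments are purchased at the Compute Engine level, not per cluster. A hedged sketch (commitment name, region, and resource amounts are illustrative; size them to your baseline node pool footprint):

```shell
# Purchase a 1-year commitment covering 100 vCPUs and 400 GB of memory
# in the region where the GKE node pools run
gcloud compute commitments create gke-baseline \
  --plan=12-month \
  --region=us-central1 \
  --resources=vcpu=100,memory=400GB
```

Commitments are binding for the full term, so commit only to the capacity that runs 24/7.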
5. Use E2 Instances for Non-Critical Workloads
E2 Cost Advantage
E2 instances are 30-40% cheaper than N1/N2 for similar performance. Great for:
- Development environments
- Staging clusters
- Batch processing
- Web servers with predictable load
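Moving these workloads onto E2 is just a node pool choice. A sketch (cluster, pool, and zone names are illustrative):

```shell
# Dedicated E2 node pool for dev/staging workloads
gcloud container node-pools create e2-pool \
  --cluster=my-cluster \
  --machine-type=e2-standard-4 \
  --num-nodes=2 \
  --enable-autoscaling --min-nodes=0 --max-nodes=6 \
  --zone=us-central1-a
```

E2 node pools can also be created with `--spot` to stack both discounts for fully interruptible work.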
Azure AKS-Specific Cost Optimizations
1. Use Azure Reserved Instances
Reserved VM Pricing
- 1-year reservation: ~40% savings
- 3-year reservation: ~62% savings
Reserve capacity for your baseline AKS node pools. Use spot for burst capacity.
2. Enable AKS Cluster Autoscaler with Spot Node Pools
Create Spot Node Pool
```shell
# Create spot node pool
az aks nodepool add \
  --cluster-name myAKSCluster \
  --resource-group myResourceGroup \
  --name spotnodepool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10 \
  --node-count 3
```
Pricing: Up to 90% discount. Set --spot-max-price to -1 to pay the current spot rate (never more than on-demand).
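AKS automatically taints spot node pools with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so interruption-tolerant pods need a matching toleration to land there. A pod-spec fragment:

```yaml
# Pod template fragment: opt in to the spot pool
spec:
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  nodeSelector:
    kubernetes.azure.com/scalesetpriority: spot
```

The nodeSelector keeps the workload exclusively on spot; drop it if spilling over to on-demand nodes is acceptable.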
3. Use Virtual Nodes (Azure Container Instances)
What are Virtual Nodes?
Serverless burst capacity using Azure Container Instances (ACI). Pay per second for pods that run on virtual nodes.
Best Use Cases
- Burst workloads (CI/CD jobs)
- Event-driven processing
- Batch jobs
```shell
# Enable virtual nodes
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons virtual-node \
  --subnet-name VirtualNodeSubnet
```
4. Optimize with Azure Hybrid Benefit
Windows Node Savings
If you have Windows Server licenses with Software Assurance, you can use them for AKS Windows node pools at no additional licensing cost.
Savings: Up to 40% on Windows node compute costs.
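The benefit is a cluster-level flag. A hedged sketch (resource group and cluster names are illustrative; Windows node pools additionally require a Windows admin profile and an Azure CNI network configuration, omitted here):

```shell
# Enable Azure Hybrid Benefit at cluster creation; it applies to
# Windows node pools in the cluster
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-ahub

# Add a Windows node pool (Windows pool names are limited to 6 characters)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name win1 \
  --os-type Windows
```

Verify eligibility first: the discount only applies to Windows Server licenses covered by active Software Assurance.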
5. Use B-Series Burstable VMs for Dev/Test
B-Series Benefits
B-series VMs are 40-60% cheaper than general-purpose VMs. Perfect for:
- Development clusters
- Test environments
- Low-traffic applications
Note: Not suitable for production workloads with sustained CPU usage.
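Adding a burstable pool is a one-liner; a sketch (names and size are illustrative):

```shell
# Burstable B-series node pool for a dev/test cluster
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name burstpool \
  --node-vm-size Standard_B4ms \
  --node-count 2
```

B-series VMs accrue CPU credits while idle and spend them during bursts, which is why sustained-CPU production workloads will throttle once credits run out.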
Cost Comparison: EKS vs GKE vs AKS
| Cost Factor | AWS EKS | GCP GKE | Azure AKS |
|---|---|---|---|
| Control Plane | $73/month per cluster | $73/month per cluster (one zonal or Autopilot cluster free per billing account via credit) | Free tier: $0; Standard tier: $73/month |
| Worker Nodes | Standard EC2 pricing | Standard Compute Engine pricing | Standard VM pricing |
| Spot/Preemptible Discount | Up to 90% | 60-91% | Up to 90% |
| Reserved/Committed Savings | Up to 72% (Savings Plans) | Up to 55% (CUDs) | Up to 62% (Reserved) |
| Serverless Option | Fargate (premium pricing) | Autopilot (pay per pod) | Virtual Nodes (ACI, premium) |
| ARM Support | Yes (Graviton, 20% cheaper) | Yes (Tau T2A, similar savings) | Limited availability |
| Cost Allocation | Tags + 3rd party tools | Built-in usage metering to BigQuery | Tags + Cost Management |
| Cheapest for Dev/Test | EC2 Spot node groups | Spot + E2 instances | Spot + B-series VMs |
| Best for Variable Workloads | Karpenter on Spot | GKE Autopilot | Virtual Nodes + Spot |
Example: 100-Node Cluster Cost Comparison
Assumptions
- Instance type: 4 vCPU, 16GB RAM (m5.xlarge / n2-standard-4 / D4s_v3)
- 100 nodes total
- 70% baseline (reserved), 30% burst (spot)
- 24/7 operation
Monthly Cost Breakdown
| Provider | Control Plane | 70 Reserved Nodes | 30 Spot Nodes | Total/Month |
|---|---|---|---|---|
| AWS EKS | $73 | $6,580 (3yr Savings Plan) | $670 (90% off) | $7,323 |
| GCP GKE | $73 | $6,840 (3yr CUD) | $730 (80% off) | $7,643 |
| Azure AKS | $0 | $6,650 (3yr Reserved) | $685 (90% off) | $7,335 |
Winner for this scenario: AWS EKS by a small margin, but all three are within 5% of each other when optimized properly.
Cost Monitoring Tools and Automation
Open Source Tools
Kubecost (Open Source)
Best for: Detailed cost visibility by namespace, label, pod.
```shell
helm install kubecost \
  --repo https://kubecost.github.io/cost-analyzer/ \
  cost-analyzer \
  --namespace kubecost \
  --create-namespace
```
OpenCost (CNCF)
Best for: Real-time cost monitoring with Prometheus.
```shell
kubectl apply -f \
  https://raw.githubusercontent.com/opencost/opencost/develop/kubernetes/opencost.yaml
```
Goldilocks
Best for: VPA recommendations dashboard (requires VPA to be installed).
```shell
helm repo add fairwinds-stable \
  https://charts.fairwinds.com/stable
helm install goldilocks \
  fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace
```
Cloud-Native Tools
- AWS Cost Explorer: Tag-based cost allocation for EKS
- GCP Cost Management: Built-in GKE usage metering
- Azure Cost Management: Resource tagging and cost analysis
Commercial Tools
- Kubecost Enterprise: Multi-cluster cost optimization
- CAST AI: AI-powered autoscaling and optimization
- Spot by NetApp: Advanced spot instance management
- Datadog: Cost monitoring integrated with observability
Your 30-Day K8s Cost Optimization Plan
- Week 1: Install Kubecost/OpenCost. Identify top cost drivers by namespace and label.
- Week 2: Right-size pods using VPA recommendations. Start with top 10 most expensive deployments.
- Week 3: Enable cluster autoscaler. Create spot/preemptible node pools for non-critical workloads.
- Week 4: Implement resource quotas per namespace. Clean up orphaned volumes. Enable Savings Plans/CUDs.
Expected savings: 40-60% reduction in monthly Kubernetes spend.