Intermediate · 30 min read · finops-cost-management · Updated: 2025-10-11

Kubernetes Cost Optimization: AWS EKS vs GKE vs AKS (2025)

A comprehensive guide to reducing Kubernetes costs across AWS EKS, Google GKE, and Azure AKS. Learn rightsizing, autoscaling, spot instances, storage optimization, and cloud-specific cost strategies.

The Bottom Line

Most Kubernetes clusters waste 40-60% of their compute spend on overprovisioned resources. This guide provides actionable strategies to cut costs by 50% or more across AWS EKS, GCP GKE, and Azure AKS without sacrificing performance.

The K8s Cost Problem: Why Clusters Get Expensive

Kubernetes provides incredible flexibility, but that flexibility comes with cost complexity:

💰 Overprovisioning

Teams request 4 CPU / 8GB RAM "to be safe" but only use 0.5 CPU / 2GB. Result: 87% waste.

🔄 Idle Resources

Dev/staging clusters run 24/7 even though they're only used 40 hours per week. Result: 76% idle time.

📊 Visibility Gaps

No one knows which team or app is driving costs. Result: no accountability, no optimization.

Industry benchmark: The average Kubernetes cluster operates at 20-30% CPU utilization and 30-40% memory utilization. This means you're paying for 2-3x more capacity than you need.
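As a back-of-the-envelope check, the benchmark above translates directly into an overprovisioning factor. The 70% target utilization below is an assumption for illustration, not a universal rule:

```python
def overprovision_factor(actual_util: float, target_util: float = 0.70) -> float:
    """How many times more capacity you run than a cluster sized to target_util."""
    return target_util / actual_util

# 20-30% CPU utilization against a ~70% target implies 2.3-3.5x excess capacity
print(round(overprovision_factor(0.20), 1))  # 3.5
print(round(overprovision_factor(0.30), 1))  # 2.3
```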

Universal Cost Optimization Strategies (All Clouds)

These strategies work regardless of whether you're using EKS, GKE, or AKS:

1. Right-Size Pod Resource Requests and Limits

The Problem

Most teams cargo-cult resource requests from Stack Overflow without measuring actual usage. A pod requesting 2 CPU might only use 200m (10% utilization).

The Solution

# Before: Overprovisioned
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "2000m"      # Requesting 2 CPUs
        memory: "4Gi"     # Requesting 4GB
      limits:
        cpu: "4000m"      # Limit at 4 CPUs
        memory: "8Gi"     # Limit at 8GB

# After: Right-sized based on metrics
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "500m"       # Actually using 300-400m
        memory: "1Gi"     # Actually using 600-800Mi
      limits:
        cpu: "1000m"      # 2x headroom for spikes
        memory: "2Gi"     # 2x headroom for spikes

Impact: Reduced CPU request by 75%, memory by 75%. This allows 4x more pods on the same nodes.
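The sizing rule applied above (requests slightly above observed p95 usage, limits at roughly 2x the request) can be sketched as a small helper. The headroom factors are assumptions to tune per workload, not a VPA API:

```python
def rightsize(p95_cpu_m: int, p95_mem_mi: int,
              request_headroom: float = 1.25, limit_factor: float = 2.0) -> dict:
    """Derive requests/limits from observed p95 usage (policy sketch, not a VPA API)."""
    req_cpu = int(p95_cpu_m * request_headroom)
    req_mem = int(p95_mem_mi * request_headroom)
    return {
        "requests": {"cpu": f"{req_cpu}m", "memory": f"{req_mem}Mi"},
        "limits": {"cpu": f"{int(req_cpu * limit_factor)}m",
                   "memory": f"{int(req_mem * limit_factor)}Mi"},
    }

# The webapp above: observed ~400m CPU / ~800Mi memory at p95
print(rightsize(400, 800))
```

Feeding in the observed usage reproduces the right-sized manifest: 500m/1000Mi requests with 1000m/2000Mi limits.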

How to Right-Size

Use Vertical Pod Autoscaler (VPA) in recommendation mode:

# Install VPA using the project's documented setup script
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create VPA in recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Off"  # Recommendation only

# Get recommendations
kubectl describe vpa webapp-vpa

2. Implement Cluster Autoscaling

Horizontal Pod Autoscaler (HPA)

Scale pods based on CPU, memory, or custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Cluster Autoscaler

Scale nodes based on pending pods. Configuration varies by cloud (see EKS/GKE/AKS sections).

Impact: Automatically add nodes during traffic spikes, remove them during low usage. Can reduce costs by 30-50% for variable workloads.

3. Use Spot/Preemptible Instances

Cost Savings

  • AWS Spot: Up to 90% cheaper than On-Demand
  • GCP Preemptible: Up to 80% cheaper
  • Azure Spot: Up to 90% cheaper

Best Practices

  • Use for stateless workloads: Web servers, batch jobs, CI/CD workers
  • Mix with on-demand: Run critical services on on-demand, everything else on spot
  • Use node taints/tolerations: Prevent critical pods from landing on spot nodes
  • Diversify instance types: Use multiple spot pools to reduce interruption risk
# Taint and label spot nodes
kubectl taint nodes spot-node-1 spot=true:NoSchedule
kubectl label nodes spot-node-1 node-type=spot

# Tolerate spot in deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        node-type: spot
      containers:
      - name: worker
        image: batch-worker:latest

4. Optimize Storage Costs

Common Storage Waste

  • Orphaned volumes: PVCs deleted but underlying volumes remain (costs $$$)
  • Wrong storage class: Using high-performance SSD for logs/backups
  • Oversized volumes: 500GB volumes for 10GB of actual data

Storage Class Best Practices

# Use cheaper storage tiers for appropriate workloads
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard  # Use standard/HDD for logs
  resources:
    requests:
      storage: 20Gi

# Use premium only for databases
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: premium-ssd  # Fast storage for DB
  resources:
    requests:
      storage: 100Gi

Impact: Standard storage is 3-5x cheaper than premium SSD. Use it for logs, backups, and non-critical data.
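To see what the tier choice is worth, here is a quick savings calculator. The per-GiB rates are illustrative placeholders (roughly 4x apart, in line with the 3-5x range above), not any cloud's published prices:

```python
def monthly_storage_cost(gib: int, rate_per_gib_month: float) -> float:
    """Monthly cost of a volume at a flat per-GiB rate (illustrative model)."""
    return gib * rate_per_gib_month

# Hypothetical rates: premium SSD $0.17/GiB-month vs standard $0.04/GiB-month
premium, standard = 0.17, 0.04
savings = monthly_storage_cost(500, premium) - monthly_storage_cost(500, standard)
print(round(savings, 2))  # monthly savings from moving a 500GiB log volume to standard
```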

5. Implement Resource Quotas and LimitRanges

Prevent Runaway Costs

# Namespace-level quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    limits.cpu: "100"
    limits.memory: "200Gi"
    persistentvolumeclaims: "10"

# Default limits for pods without resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container

AWS EKS-Specific Cost Optimizations

1. Use AWS Savings Plans for EKS Compute

Compute Savings Plans

Commit to consistent compute usage for 1-3 years for up to 72% discount:

  • 1-year no upfront: ~40% savings
  • 3-year all upfront: ~72% savings

Best practice: Cover your baseline 24/7 workloads with Savings Plans, use spot for burst capacity.
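The baseline-plus-burst split can be reasoned about with a simple blend. The rate and discount levels here are placeholders drawn from the ranges quoted above, not AWS pricing:

```python
def blended_rate(on_demand: float, baseline_frac: float,
                 commit_discount: float, spot_discount: float) -> float:
    """Effective hourly rate when the baseline is covered by a commitment
    and the remainder runs on spot (illustrative model, not AWS pricing)."""
    baseline = baseline_frac * on_demand * (1 - commit_discount)
    burst = (1 - baseline_frac) * on_demand * (1 - spot_discount)
    return baseline + burst

# 70% baseline at a ~40% Savings Plan discount, 30% burst at ~90% spot discount
print(round(blended_rate(1.00, 0.70, 0.40, 0.90), 2))  # 0.45 -> ~55% overall savings
```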

2. Use EC2 Spot for Batch Workloads

Spot Pricing

Note: Fargate Spot is an ECS-only feature; EKS pods on Fargate always run at regular Fargate rates. The EKS equivalent is EC2 Spot capacity in a managed node group, up to 90% cheaper than On-Demand. Perfect for:

  • CI/CD pipelines
  • Data processing jobs
  • Machine learning training
# Spot managed node group with eksctl
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: batch-spot
    instanceTypes: ["m5.large", "m5a.large", "m6i.large"]
    spot: true
    minSize: 0
    maxSize: 10
    labels:
      workload-type: spot-ok

3. Enable EKS Cluster Autoscaler with Mixed Instance Policies

Auto Scaling Group Configuration

resource "aws_autoscaling_group" "eks_nodes" {
  name                = "eks-mixed-nodes"
  min_size            = 2
  max_size            = 20
  vpc_zone_identifier = var.private_subnet_ids # your EKS subnets

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 20
      # spot_instance_pools applies only to the "lowest-price" strategy
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.eks.id
        version            = "$Latest"
      }

      override {
        instance_type = "m5.large"
      }
      override {
        instance_type = "m5a.large"
      }
      override {
        instance_type = "m6i.large"
      }
      override {
        instance_type = "m6a.large"
      }
    }
  }
}

Impact: 2 on-demand base nodes for stability; 80% of capacity above that base runs on spot. Diversifying across 4 instance types reduces interruption risk.
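The on-demand/spot split this policy produces can be computed directly. The node counts below are an example scenario, not part of the Terraform above:

```python
def capacity_split(total_nodes: int, od_base: int, od_pct_above_base: int):
    """On-demand vs spot node counts under an ASG mixed instances distribution."""
    above_base = max(total_nodes - od_base, 0)
    on_demand = od_base + above_base * od_pct_above_base // 100
    return on_demand, total_nodes - on_demand

# 12 nodes with base=2 and 20% above base: 4 on-demand, 8 spot
print(capacity_split(12, 2, 20))
```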

4. Use Graviton Instances (ARM)

Cost and Performance Benefits

  • 20% cheaper than comparable x86 instances
  • 40% better price-performance for many workloads

Supported: Most EKS versions support Graviton (ARM64). Just ensure your container images support linux/arm64.

# Build multi-arch images
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .

5. Optimize EKS Control Plane Costs

Control Plane Pricing

AWS charges $0.10/hour ($73/month) per EKS cluster. Strategies:

  • Multi-tenancy: Run multiple apps in one cluster using namespaces
  • Consolidate dev/staging: Use one cluster with different namespaces instead of separate clusters
  • Delete idle clusters: Shut down dev/test clusters overnight

Impact: Reducing from 10 clusters to 3 saves 7 × $73 = $511/month in control plane costs alone.

GCP GKE-Specific Cost Optimizations

1. Enable GKE Autopilot Mode

What is Autopilot?

GKE manages nodes, autoscaling, and security. You only pay for pod resource requests (not idle capacity).

Cost Model

Pay only for CPU, memory, and ephemeral storage requested by pods. No wasted capacity on overprovisioned nodes.

When to Use

  • Variable workloads with unpredictable scaling
  • Teams without dedicated K8s expertise
  • Workloads that don't need node-level customization
# Create Autopilot cluster
gcloud container clusters create-auto my-cluster \
    --region=us-central1 \
    --release-channel=regular

Impact: Can reduce costs by 30-50% for variable workloads by eliminating idle node capacity.

2. Use Spot VMs (Successor to Preemptible VMs)

GKE Spot Configuration

# Create node pool with spot VMs
gcloud container node-pools create spot-pool \
    --cluster=my-cluster \
    --spot \
    --machine-type=n2-standard-4 \
    --num-nodes=3 \
    --enable-autoscaling \
    --min-nodes=1 \
    --max-nodes=10 \
    --zone=us-central1-a

Pricing: Up to 80% discount vs regular VMs. Unlike legacy preemptible VMs (which had a 24-hour maximum runtime), Spot VMs have no maximum runtime, but they can be preempted at any time with 30 seconds' notice.

3. Enable Cost Allocation and GKE Usage Metering

Enable Usage Metering

# Legacy usage metering (deprecated in favor of cost allocation)
gcloud container clusters update my-cluster \
    --enable-resource-consumption-metering \
    --resource-usage-bigquery-dataset=gke_usage

# Current approach: built-in cost allocation
gcloud container clusters update my-cluster \
    --enable-cost-allocation

Usage metering exports detailed resource consumption to BigQuery; cost allocation surfaces per-namespace and per-label costs directly in Cloud Billing (and its BigQuery export) for chargeback and cost analysis.

4. Use Committed Use Discounts (CUDs)

GCP Committed Use Pricing

  • 1-year commitment: ~37% discount
  • 3-year commitment: ~55% discount

Applied automatically to matching VM usage. No need to configure per-cluster.

5. Use E2 Instances for Non-Critical Workloads

E2 Cost Advantage

E2 instances are 30-40% cheaper than N1/N2 for similar performance. Great for:

  • Development environments
  • Staging clusters
  • Batch processing
  • Web servers with predictable load

Azure AKS-Specific Cost Optimizations

1. Use Azure Reserved Instances

Reserved VM Pricing

  • 1-year reservation: ~40% savings
  • 3-year reservation: ~62% savings

Reserve capacity for your baseline AKS node pools. Use spot for burst capacity.

2. Enable AKS Cluster Autoscaler with Spot Node Pools

Create Spot Node Pool

# Create spot node pool
az aks nodepool add \
    --cluster-name myAKSCluster \
    --resource-group myResourceGroup \
    --name spotnodepool \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 10 \
    --node-count 3

Pricing: Up to 90% discount. Set spot-max-price to -1 to pay current spot rate (never more than on-demand).

3. Use Virtual Nodes (Azure Container Instances)

What are Virtual Nodes?

Serverless burst capacity using Azure Container Instances (ACI). Pay per second for pods that run on virtual nodes.

Best Use Cases

  • Burst workloads (CI/CD jobs)
  • Event-driven processing
  • Batch jobs
# Enable virtual nodes
az aks enable-addons \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --addons virtual-node \
    --subnet-name VirtualNodeSubnet

4. Optimize with Azure Hybrid Benefit

Windows Node Savings

If you have Windows Server licenses with Software Assurance, you can use them for AKS Windows node pools at no additional licensing cost.

Savings: Up to 40% on Windows node compute costs.

5. Use B-Series Burstable VMs for Dev/Test

B-Series Benefits

B-series VMs are 40-60% cheaper than general-purpose VMs. Perfect for:

  • Development clusters
  • Test environments
  • Low-traffic applications

Note: Not suitable for production workloads with sustained CPU usage.

Cost Comparison: EKS vs GKE vs AKS

| Cost Factor | AWS EKS | GCP GKE | Azure AKS |
|---|---|---|---|
| Control plane | $73/month per cluster | $73/month per cluster (free-tier credit covers one zonal or Autopilot cluster) | Free |
| Worker nodes | Standard EC2 pricing | Standard Compute Engine pricing | Standard VM pricing |
| Spot/preemptible discount | Up to 90% | Up to 80% | Up to 90% |
| Reserved/committed savings | Up to 72% (Savings Plans) | Up to 55% (CUDs) | Up to 62% (Reserved) |
| Serverless option | Fargate (premium pricing) | Autopilot (pay per pod) | Virtual Nodes (ACI, premium) |
| ARM support | Yes (Graviton, ~20% cheaper) | Yes (Tau T2A, similar savings) | Yes (Ampere-based VMs, limited availability) |
| Cost allocation | Tags + 3rd-party tools | Built-in usage metering to BigQuery | Tags + Cost Management |
| Cheapest for dev/test | EC2 Spot + Graviton | Spot + E2 instances | Spot + B-series VMs |
| Best for variable workloads | EC2 Spot + Karpenter | GKE Autopilot | Virtual Nodes + Spot |

Example: 100-Node Cluster Cost Comparison

Assumptions

  • Instance type: 4 vCPU, 16GB RAM (m5.xlarge / n2-standard-4 / D4s_v3)
  • 100 nodes total
  • 70% baseline (reserved), 30% burst (spot)
  • 24/7 operation

Monthly Cost Breakdown

| Provider | Control Plane | 70 Reserved Nodes | 30 Spot Nodes | Total/Month |
|---|---|---|---|---|
| AWS EKS | $73 | $6,580 (3-yr Savings Plan) | $670 (90% off) | $7,323 |
| GCP GKE | $73 | $6,840 (3-yr CUD) | $730 (80% off) | $7,643 |
| Azure AKS | $0 | $6,650 (3-yr Reserved) | $685 (90% off) | $7,335 |

Winner for this scenario: AWS EKS by a small margin, but all three are within 5% of each other when optimized properly.
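The totals in the table can be sanity-checked in a few lines:

```python
# Per-provider monthly costs: (control plane, reserved nodes, spot nodes)
clusters = {
    "AWS EKS":   (73, 6580, 670),
    "GCP GKE":   (73, 6840, 730),
    "Azure AKS": (0, 6650, 685),
}
totals = {name: sum(parts) for name, parts in clusters.items()}
spread_pct = (max(totals.values()) - min(totals.values())) / min(totals.values()) * 100
print(totals)                # {'AWS EKS': 7323, 'GCP GKE': 7643, 'Azure AKS': 7335}
print(round(spread_pct, 1))  # 4.4 -> all three within ~5% of each other
```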

Cost Monitoring Tools and Automation

Open Source Tools

Kubecost (Open Source)

Best for: Detailed cost visibility by namespace, label, pod.

helm install kubecost \
  --repo https://kubecost.github.io/cost-analyzer/ \
  cost-analyzer \
  --namespace kubecost \
  --create-namespace

OpenCost (CNCF)

Best for: Real-time cost monitoring with Prometheus.

kubectl create namespace opencost
kubectl apply --namespace opencost -f \
  https://raw.githubusercontent.com/opencost/opencost/develop/kubernetes/opencost.yaml

Goldilocks

Best for: VPA recommendations dashboard.

helm repo add fairwinds-stable \
  https://charts.fairwinds.com/stable
helm install goldilocks \
  fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

Cloud-Native Tools

  • AWS Cost Explorer: Tag-based cost allocation for EKS
  • GCP Cost Management: Built-in GKE usage metering
  • Azure Cost Management: Resource tagging and cost analysis

Commercial Tools

  • Kubecost Enterprise: Multi-cluster cost optimization
  • CAST AI: AI-powered autoscaling and optimization
  • Spot by NetApp: Advanced spot instance management
  • Datadog: Cost monitoring integrated with observability

Your 30-Day K8s Cost Optimization Plan

  • Week 1: Install Kubecost/OpenCost. Identify top cost drivers by namespace and label.
  • Week 2: Right-size pods using VPA recommendations. Start with top 10 most expensive deployments.
  • Week 3: Enable cluster autoscaler. Create spot/preemptible node pools for non-critical workloads.
  • Week 4: Implement resource quotas per namespace. Clean up orphaned volumes. Enable Savings Plans/CUDs.

Expected savings: 40-60% reduction in monthly Kubernetes spend.