25: Horizontal Pod Autoscaler

Objective

Learn how the Horizontal Pod Autoscaler (HPA) automatically scales the number of Pod replicas based on observed CPU utilization or other metrics. Deploy a CPU-intensive application, create an HPA, generate load, and observe automatic scaling behavior.


Theory

What is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU or memory utilization. It is one of the core autoscaling mechanisms in Kubernetes.

Key concepts:

  • Target metric — the metric HPA monitors (e.g., CPU utilization at 50%)
  • Min/Max replicas — boundaries for scaling; HPA will never scale below min or above max
  • Scaling algorithm — HPA calculates the desired count as desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
  • Stabilization window — scale-up has no delay (immediate); scale-down waits through a 5-minute stabilization window to prevent flapping
  • Metrics Server — required component that collects resource metrics from kubelets; installed by default in AKS

How HPA Works

  1. The HPA controller queries the Metrics Server every 15 seconds (default)
  2. It compares the current metric value against the target
  3. If the current value exceeds the target (beyond a default 10% tolerance), it scales up (adds replicas)
  4. If the current value is below the target, it scales down (removes replicas)
  5. A 5-minute stabilization window prevents rapid scale-down oscillation (scale-up has no delay)
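The steps above can be traced with a quick worked example of the scaling formula. This is only a sketch; the replica count and utilization numbers are made up:

```shell
# HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Hypothetical state: 2 replicas averaging 90% CPU against a 50% target.
current_replicas=2
current_cpu=90   # observed average utilization (%)
target_cpu=50    # HPA target (%)

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # 2 * 90 / 50 = 3.6, rounded up to 4
```

Note that HPA always rounds up, so even a small overshoot of the target adds a replica.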

VPA Overview (Vertical Pod Autoscaler)

While HPA scales horizontally (more Pods), the Vertical Pod Autoscaler (VPA) scales vertically — it adjusts the CPU and memory requests and limits of individual containers.

  • VPA monitors actual resource usage and recommends (or automatically applies) right-sized requests
  • VPA and HPA are complementary — use HPA for scaling replicas and VPA for right-sizing individual Pods
  • VPA should not be used with HPA on the same CPU/memory metric simultaneously
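In recommendation-only mode, a VPA object can look like the following sketch. This assumes the VPA custom resource definitions are installed in the cluster, and the object name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hpa-demo-vpa        # illustrative name
  namespace: student-XX
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-XX
  updatePolicy:
    updateMode: "Off"       # recommend only; do not apply changes automatically
```

With updateMode "Off", VPA only publishes recommendations, which avoids the conflict with an HPA scaling on the same CPU metric.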

Metrics Server

The Metrics Server is a cluster-wide aggregator of resource usage data. It collects CPU and memory metrics from kubelets and exposes them through the Kubernetes Metrics API.

  • AKS: Metrics Server is installed by default — no additional setup required
  • Verify with: kubectl top nodes and kubectl top pods

Mermaid Diagram: HPA Monitoring Loop

flowchart LR
    A[Metrics Server] -->|Collects CPU/memory<br>from kubelets| B[HPA Controller]
    B -->|Every 15s:<br>query current metrics| A
    B -->|Compare current<br>vs target| C{Current > Target?}
    C -->|Yes| D[Scale Up<br>Add replicas]
    C -->|No| E{Current < Target?}
    E -->|Yes| F[Scale Down<br>Remove replicas<br>after 5 min stabilization]
    E -->|No| G[No change]
    D --> H[Deployment<br>adjusts replica count]
    F --> H
    G --> B
    H --> B

Practical Tasks

All tasks should be performed in your namespace student-XX. Replace XX with your student number throughout.

Task 1: Deploy a CPU-Intensive Application

Deploy the HPA example application, which is a simple PHP-based web server that performs CPU-intensive computations on each request.

Create a file hpa-demo.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-XX
  namespace: student-XX
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-demo-XX
  template:
    metadata:
      labels:
        app: hpa-demo-XX
    spec:
      containers:
      - name: app
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m"
          limits:
            cpu: "500m"

Apply it and expose with a Service:

kubectl apply -f hpa-demo.yaml -n student-XX
kubectl expose deployment hpa-demo-XX --port=80 -n student-XX

Verify the Pod is running and check initial CPU usage:

kubectl get pods -l app=hpa-demo-XX -n student-XX
kubectl top pods -l app=hpa-demo-XX -n student-XX

Note: kubectl top may take a minute to start showing metrics for newly created Pods.
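The utilization HPA works with is relative to the CPU request in the manifest above, not to the node's capacity. A quick sketch with made-up numbers:

```shell
# A Pod using 100m CPU against the 200m request above reports 50% utilization.
usage_m=100     # hypothetical reading from kubectl top (millicores)
request_m=200   # cpu request from the Deployment spec
utilization=$(( usage_m * 100 / request_m ))
echo "${utilization}%"   # 50%
```

This is why missing resource requests break HPA: without a request, there is no denominator for the percentage.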


Task 2: Create an HPA

Create an HPA that targets 50% average CPU utilization, with a minimum of 1 and maximum of 5 replicas:

kubectl autoscale deployment hpa-demo-XX --cpu-percent=50 --min=1 --max=5 -n student-XX
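The same HPA can also be created declaratively. The following autoscaling/v2 manifest is equivalent to the kubectl autoscale command above and is the form you would keep in version control:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-XX
  namespace: student-XX
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-XX
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```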

Verify the HPA was created:

kubectl get hpa hpa-demo-XX -n student-XX

You should see output similar to:

NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo-XX   Deployment/hpa-demo-XX   <unknown>/50%   1         5         1          10s

The <unknown> target is normal initially — it takes about 60 seconds for the Metrics Server to start reporting values.


Task 3: Generate Load

Open a second terminal and run a load generator that continuously sends HTTP requests to the application:

kubectl run -i --tty load-generator-XX --rm --image=busybox -n student-XX -- /bin/sh -c "while true; do wget -q -O- http://hpa-demo-XX; done"

This will create a temporary Pod that hammers the service with requests, driving up CPU usage.

Keep this terminal open and the load generator running for the next task.


Task 4: Observe Scale-Up

In your first terminal, watch the HPA and Pods in real time:

kubectl get hpa hpa-demo-XX -w -n student-XX

In a third terminal (or another tab), watch the Pods:

kubectl get pods -l app=hpa-demo-XX -w -n student-XX

Within 1-2 minutes, you should observe:

  • CPU utilization rising above 50%
  • HPA increasing the replica count
  • New Pods being created

The HPA will scale up until CPU utilization per Pod drops below the 50% target.


Task 5: Stop Load and Observe Scale-Down

  1. Go back to the second terminal and press Ctrl+C to stop the load generator
  2. Continue watching the HPA and Pods in the other terminals
  3. After about 5 minutes (the default scale-down cooldown), HPA will begin reducing the replica count
  4. Eventually, replicas will return to 1

Verify the final state:

kubectl get hpa hpa-demo-XX -n student-XX
kubectl get pods -l app=hpa-demo-XX -n student-XX

Cleanup

kubectl delete hpa hpa-demo-XX -n student-XX
kubectl delete svc hpa-demo-XX -n student-XX
kubectl delete deployment hpa-demo-XX -n student-XX

Common Problems

  • HPA shows <unknown> for targets
    Cause: Metrics Server not ready, or the Pod just started
    Solution: wait about 60 seconds, then run kubectl top pods to verify metrics are available
  • HPA does not scale up
    Cause: CPU target not exceeded
    Solution: increase the load or lower the --cpu-percent target
  • kubectl top returns an error
    Cause: Metrics Server not installed
    Solution: in AKS the Metrics Server is installed by default; check with kubectl get pods -n kube-system -l k8s-app=metrics-server
  • Scale-down takes too long
    Cause: the default stabilization window is 5 minutes
    Solution: this is expected behavior to prevent flapping
  • Pods have no resource requests
    Cause: HPA requires resource requests to calculate utilization
    Solution: always set resources.requests in your Pod spec
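If the default 5-minute scale-down window is too slow for a workload, autoscaling/v2 HPAs accept a behavior section that tunes it. The 120-second window and 50% rate below are example values, not recommendations:

```yaml
# Fragment of an autoscaling/v2 HPA spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120   # default is 300 (5 minutes)
      policies:
      - type: Percent
        value: 50          # remove at most 50% of the replicas...
        periodSeconds: 60  # ...per 60-second period
```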

Best Practices

  • Always set resource requests — HPA calculates utilization as a percentage of the requested resources. Without requests, HPA cannot function
  • Choose appropriate targets — 50-70% CPU target is a common starting point; too low wastes resources, too high risks latency
  • Set reasonable min/max — The minimum should handle baseline traffic; the maximum should stay within cluster capacity
  • Combine with Pod Disruption Budgets — PDBs limit voluntary disruptions (e.g., node drains during upgrades) so that cluster maintenance cannot remove all Pods at once
  • Use custom metrics for non-CPU workloads — HPA supports custom metrics (e.g., request rate, queue depth) via the Custom Metrics API
  • Consider VPA for right-sizing — Use VPA to find optimal resource requests, then use HPA for scaling
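To keep at least one replica running through voluntary disruptions such as node drains, a minimal PodDisruptionBudget for the demo app might look like this sketch (the object name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hpa-demo-pdb       # illustrative name
  namespace: student-XX
spec:
  minAvailable: 1          # keep at least one Pod during voluntary disruptions
  selector:
    matchLabels:
      app: hpa-demo-XX
```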

Summary

In this exercise you learned:

  • The HPA automatically adjusts replica count based on observed metrics (CPU, memory, custom)
  • HPA requires the Metrics Server and resource requests to be defined on containers
  • The scaling algorithm compares current metrics against the target and adjusts replicas accordingly
  • Scale-up is immediate (no delay); scale-down uses a 5-minute stabilization window to prevent oscillation
  • VPA complements HPA by right-sizing individual Pod resource requests and limits
