# 25: Horizontal Pod Autoscaler
## Objective
Learn how the Horizontal Pod Autoscaler (HPA) automatically scales the number of Pod replicas based on observed CPU utilization or other metrics. Deploy a CPU-intensive application, create an HPA, generate load, and observe automatic scaling behavior.
## Theory

### What is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU or memory utilization. It is one of the core autoscaling mechanisms in Kubernetes.
Key concepts:
| Concept | Description |
|---|---|
| Target metric | The metric HPA monitors (e.g., CPU utilization at 50%) |
| Min/Max replicas | Boundaries for scaling — HPA will never scale below min or above max |
| Scaling algorithm | HPA calculates the desired replica count: desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue)) |
| Stabilization window | Scale-up: no delay (immediate). Scale-down: 5-minute stabilization window. Prevents flapping |
| Metrics Server | Required component that collects resource metrics from kubelets. Installed by default in AKS |
### How HPA Works
- The HPA controller queries the Metrics Server every 15 seconds (default)
- It compares the current metric value against the target
- If the current value exceeds the target, it scales up (adds replicas)
- If the current value is below the target, it scales down (removes replicas)
- A 5-minute stabilization window prevents rapid scale-down oscillation (scale-up has no delay)
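The scaling formula can be checked with simple integer arithmetic. A minimal sketch with hypothetical values (2 replicas currently averaging 90% CPU against a 50% target):

```shell
# Hypothetical inputs: 2 replicas averaging 90% CPU, target is 50%
current_replicas=2
current_metric=90
target_metric=50

# ceil(a / b) in integer shell arithmetic is (a + b - 1) / b
desired=$(( (current_replicas * current_metric + target_metric - 1) / target_metric ))

echo "desired replicas: $desired"   # ceil(2 * 90 / 50) = ceil(3.6) = 4
```

With utilization at 90% against a 50% target, the HPA nearly doubles the replica count in a single step rather than adding one Pod at a time.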
### VPA Overview (Vertical Pod Autoscaler)
While HPA scales horizontally (more Pods), the Vertical Pod Autoscaler (VPA) scales vertically — it adjusts the CPU and memory requests and limits of individual containers.
- VPA monitors actual resource usage and recommends (or automatically applies) right-sized requests
- VPA and HPA are complementary — use HPA for scaling replicas and VPA for right-sizing individual Pods
- VPA should not be used with HPA on the same CPU/memory metric simultaneously
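As a sketch, a recommendation-only VPA for a Deployment might look like the following. This assumes the VPA components and CRDs are installed in the cluster (VPA is not part of a default cluster; in AKS it is an add-on that must be enabled explicitly), and the Deployment name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hpa-demo-vpa-XX
  namespace: student-XX
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-XX
  updatePolicy:
    updateMode: "Off"   # recommendation-only: compute suggestions, never restart Pods
```

With `updateMode: "Off"`, the recommendations appear under the VPA object's status (`kubectl describe vpa`) and can be applied manually, which avoids any conflict with an HPA scaling on the same CPU metric.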
### Metrics Server
The Metrics Server is a cluster-wide aggregator of resource usage data. It collects CPU and memory metrics from kubelets and exposes them through the Kubernetes Metrics API.
- AKS: Metrics Server is installed by default — no additional setup required
- Verify with:

```shell
kubectl top nodes
kubectl top pods
```
### Mermaid Diagram: HPA Monitoring Loop

```mermaid
flowchart LR
    A[Metrics Server] -->|Collects CPU/memory<br>from kubelets| B[HPA Controller]
    B -->|Every 15s:<br>query current metrics| A
    B -->|Compare current<br>vs target| C{Current > Target?}
    C -->|Yes| D[Scale Up<br>Add replicas]
    C -->|No| E{Current < Target?}
    E -->|Yes| F[Scale Down<br>Remove replicas<br>after 5 min stabilization]
    E -->|No| G[No change]
    D --> H[Deployment<br>adjusts replica count]
    F --> H
    G --> B
    H --> B
```
## Practical Tasks

All tasks should be performed in your namespace `student-XX`. Replace `XX` with your student number throughout.
### Task 1: Deploy a CPU-Intensive Application
Deploy the HPA example application, which is a simple PHP-based web server that performs CPU-intensive computations on each request.
Create a file `hpa-demo.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-XX
  namespace: student-XX
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-demo-XX
  template:
    metadata:
      labels:
        app: hpa-demo-XX
    spec:
      containers:
      - name: app
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m"
          limits:
            cpu: "500m"
```
Apply it and expose it with a Service:

```shell
kubectl apply -f hpa-demo.yaml -n student-XX
kubectl expose deployment hpa-demo-XX --port=80 -n student-XX
```
Verify the Pod is running and check initial CPU usage:

```shell
kubectl get pods -l app=hpa-demo-XX -n student-XX
kubectl top pods -l app=hpa-demo-XX -n student-XX
```
Note: `kubectl top` may take a minute to start showing metrics for newly created Pods.
### Task 2: Create an HPA

Create an HPA that targets 50% average CPU utilization, with a minimum of 1 and a maximum of 5 replicas:

```shell
kubectl autoscale deployment hpa-demo-XX --cpu-percent=50 --min=1 --max=5 -n student-XX
```
Verify the HPA was created:

```shell
kubectl get hpa hpa-demo-XX -n student-XX
```
You should see output similar to:

```
NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo-XX   Deployment/hpa-demo-XX   <unknown>/50%   1         5         1          10s
```

The `<unknown>` target is normal at first; it takes about 60 seconds for the Metrics Server to start reporting values.
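The same HPA can also be written declaratively with the `autoscaling/v2` API, which is the form you would typically keep in version control instead of running `kubectl autoscale`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-XX
  namespace: student-XX
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-XX
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Applying this manifest with `kubectl apply -f` produces the same autoscaler as the imperative command above, and the `metrics` list is where additional metrics (memory, custom metrics) would be added.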
### Task 3: Generate Load

Open a second terminal and run a load generator that continuously sends HTTP requests to the application:

```shell
kubectl run -i --tty load-generator-XX --rm --image=busybox -n student-XX -- /bin/sh -c "while true; do wget -q -O- http://hpa-demo-XX; done"
```
This will create a temporary Pod that hammers the service with requests, driving up CPU usage.
Keep this terminal open and the load generator running for the next task.
### Task 4: Observe Scale-Up

In your first terminal, watch the HPA in real time:

```shell
kubectl get hpa hpa-demo-XX -w -n student-XX
```

In a third terminal (or another tab), watch the Pods:

```shell
kubectl get pods -l app=hpa-demo-XX -w -n student-XX
```
Within 1-2 minutes, you should observe:
- CPU utilization rising above 50%
- HPA increasing the replica count
- New Pods being created
The HPA will scale up until CPU utilization per Pod drops below the 50% target.
### Task 5: Stop Load and Observe Scale-Down

- Go back to the second terminal and press `Ctrl+C` to stop the load generator
- Continue watching the HPA and Pods in the other terminals
- After about 5 minutes (the default scale-down cooldown), HPA will begin reducing the replica count
- Eventually, replicas will return to 1
```shell
kubectl get hpa hpa-demo-XX -n student-XX
kubectl get pods -l app=hpa-demo-XX -n student-XX
```
## Cleanup

```shell
kubectl delete hpa hpa-demo-XX -n student-XX
kubectl delete svc hpa-demo-XX -n student-XX
kubectl delete deployment hpa-demo-XX -n student-XX
```
## Common Problems

| Problem | Cause | Solution |
|---|---|---|
| HPA shows `<unknown>` for targets | Metrics Server not ready or Pod just started | Wait 60 seconds, run `kubectl top pods` to verify metrics are available |
| HPA does not scale up | CPU target not exceeded | Increase load or lower the `--cpu-percent` target |
| `kubectl top` returns an error | Metrics Server not installed | In AKS, Metrics Server is installed by default. Check with `kubectl get pods -n kube-system -l k8s-app=metrics-server` |
| Scale-down takes too long | Default stabilization window is 5 minutes | This is expected behavior to prevent flapping |
| Pods have no resource requests | HPA requires resource requests to calculate utilization | Always set `resources.requests` in your Pod spec |
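If the default 5-minute scale-down window does not suit a workload, the `autoscaling/v2` API lets you tune it per HPA through the `behavior` field. A sketch (an excerpt of an HPA `spec`; the values are illustrative, not recommendations):

```yaml
# Excerpt of an autoscaling/v2 HorizontalPodAutoscaler spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300 (5 minutes)
      policies:
      - type: Percent
        value: 50            # remove at most 50% of current replicas
        periodSeconds: 60    # per 60-second period
```

Shortening the window makes the autoscaler react faster to falling load at the cost of more oscillation, so change it only when you understand the workload's traffic pattern.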
## Best Practices
- Always set resource requests — HPA calculates utilization as a percentage of the requested resources. Without requests, HPA cannot function
- Choose appropriate targets — 50-70% CPU target is a common starting point; too low wastes resources, too high risks latency
- Set reasonable min/max — The minimum should handle baseline traffic; the maximum should stay within cluster capacity
- Combine with Pod Disruption Budgets — Ensure scale-down does not remove all Pods at once
- Use custom metrics for non-CPU workloads — HPA supports custom metrics (e.g., request rate, queue depth) via the Custom Metrics API
- Consider VPA for right-sizing — Use VPA to find optimal resource requests, then use HPA for scaling
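To illustrate the Pod Disruption Budget point above, a minimal PDB for the demo app might look like the following (the name is illustrative; note that PDBs constrain voluntary disruptions such as node drains and evictions, not the HPA's own replica-count changes):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hpa-demo-pdb-XX
  namespace: student-XX
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hpa-demo-XX
```

With `minAvailable: 1`, an eviction (for example during a node drain while the cluster scales down) is blocked whenever it would leave the application with no running Pod.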
## Summary
In this exercise you learned:
- The HPA automatically adjusts replica count based on observed metrics (CPU, memory, custom)
- HPA requires the Metrics Server and resource requests to be defined on containers
- The scaling algorithm compares current metrics against the target and adjusts replicas accordingly
- Scale-up is immediate (no delay); scale-down uses a 5-minute stabilization window to prevent oscillation
- VPA complements HPA by right-sizing individual Pod resource requests and limits