26: Cluster Scaling

Objective

Understand how the Cluster Autoscaler (CA) automatically adds and removes nodes in AKS based on Pod scheduling demands. Learn about Node Auto Provisioning (NAP/Karpenter) and KEDA for event-driven autoscaling.


Theory

Cluster Autoscaler

The Cluster Autoscaler (CA) automatically adjusts the number of nodes in a node pool when:

  • Scale up: Pods cannot be scheduled because there are not enough resources on existing nodes (Pods remain in Pending state)
  • Scale down: Nodes are underutilized for an extended period (default: 10 minutes) and their Pods can be rescheduled elsewhere
| Concept | Description |
|---|---|
| Trigger for scale-up | A Pod is unschedulable due to insufficient CPU, memory, or other resources |
| Trigger for scale-down | Node utilization falls below 50% for 10 minutes and all Pods can be moved |
| Node pool scope | CA is configured per node pool in AKS |
| Min/Max count | Each node pool has a minimum and maximum number of nodes |
| Scale-down delay | ~10 minutes of underutilization before a node is removed |

AKS Configuration (Instructor Reference)

Note: The following az aks commands are cluster-level operations performed by the instructor. They are included here for educational purposes so you understand how the Cluster Autoscaler is configured in production.

The Cluster Autoscaler is enabled per node pool when creating or updating an AKS cluster:

# Enable during cluster creation
az aks create \
  --resource-group <RG> \
  --name <CLUSTER> \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

# Enable on an existing node pool
az aks nodepool update \
  --resource-group <RG> \
  --cluster-name <CLUSTER> \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
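
The bounds can be changed later, and autoscaling can be turned off per pool. The following is a sketch of the relevant `az aks nodepool update` flags (instructor-level commands, shown for reference only):

```shell
# Instructor/admin commands, shown for reference only

# Adjust min/max on a pool that already has autoscaling enabled
az aks nodepool update \
  --resource-group <RG> \
  --cluster-name <CLUSTER> \
  --name nodepool1 \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# Turn autoscaling off for the pool entirely
az aks nodepool update \
  --resource-group <RG> \
  --cluster-name <CLUSTER> \
  --name nodepool1 \
  --disable-cluster-autoscaler
```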

NAP (Node Auto Provisioning) / Karpenter

Node Auto Provisioning (NAP) is the next-generation node scaling solution for AKS, built on the open-source Karpenter project.

Key differences from the classic Cluster Autoscaler:

Feature Cluster Autoscaler NAP / Karpenter
Node sizing Fixed VM size per node pool Creates right-sized nodes on demand
Provisioning speed Slower (uses VMSS) Faster (direct VM provisioning)
Configuration Per node pool min/max NodePool CRD with flexible constraints
Bin packing Limited Advanced — better resource utilization
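
For reference, the NodePool CRD mentioned in the table is declared like any other Kubernetes resource. The sketch below is illustrative only: the API version, label keys, and required fields (such as the node class reference) differ between Karpenter releases and the AKS NAP preview.

```yaml
# Illustrative NodePool sketch; field names vary by Karpenter version.
# A real NodePool also needs a nodeClassRef pointing at an AKS node class.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
  limits:
    cpu: "64"     # cap on total CPU this pool may provision
```

Instead of a fixed VM size per pool, the requirements express constraints and Karpenter picks a right-sized VM for the pending Pods.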

NAP is enabled by the cluster administrator:

# Instructor/admin command — do not run
az aks update --resource-group <RG> --name <CLUSTER> --node-provisioning-mode Auto

KEDA (Kubernetes Event-Driven Autoscaling)

KEDA extends Kubernetes autoscaling beyond CPU and memory. It allows scaling based on external event sources:

  • Azure Service Bus queue length
  • Azure Storage Queue messages
  • Prometheus metrics
  • Cron schedules
  • HTTP request rate
  • And 60+ other scalers

KEDA works alongside HPA — it creates and manages HPA objects based on ScaledObject definitions.
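
As a small illustration of one of the scalers listed above, a cron trigger can pre-scale a Deployment during business hours. The names below are hypothetical; the trigger metadata follows the KEDA cron scaler:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler   # hypothetical name
spec:
  scaleTargetRef:
    name: web-frontend          # hypothetical Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin
      start: 0 8 * * 1-5        # scale up weekdays at 08:00
      end: 0 18 * * 1-5         # scale back down at 18:00
      desiredReplicas: "5"
```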

AKS KEDA add-on — managed by Azure, installed by the cluster administrator:

# Instructor/admin command — do not run
az aks update --resource-group <RG> --name <CLUSTER> --enable-keda

Mermaid Diagram: Pod Scaling + Node Scaling

flowchart TB
    subgraph "Pod Level Scaling"
        A[HPA] -->|Increases replicas<br>based on metrics| B[Deployment]
        B --> C[Pod 1]
        B --> D[Pod 2]
        B --> E[Pod 3 - Pending]
    end

    subgraph "Node Level Scaling"
        E -->|Cannot be scheduled<br>insufficient resources| F[Cluster Autoscaler]
        F -->|Requests new node<br>from Azure| G[Azure VMSS]
        G -->|Provisions new VM<br>joins cluster| H[New Node]
        H -->|Pod scheduled<br>on new node| E
    end

    subgraph "Scale Down"
        I[Nodes underutilized<br>for 10 min] --> J[Cluster Autoscaler]
        J -->|Drains and removes<br>underutilized node| K[Node removed<br>from cluster]
    end

Practical Tasks

All tasks should be performed in your namespace student-XX. Replace XX with your student number throughout.

Task 1: Examine Current Autoscaler Configuration

Check the current state of the cluster nodes:

kubectl get nodes

Examine the Cluster Autoscaler status:

kubectl describe configmap -n kube-system cluster-autoscaler-status

This ConfigMap shows:

  • Current cluster state (health, scale-up/scale-down status)
  • Node groups with min/max/current counts
  • Recent scaling events

You can also check the autoscaler logs:

kubectl logs -n kube-system -l app=cluster-autoscaler --tail=20

Task 2: Trigger a Scale-Up

Deploy a workload whose resource requests exceed the free capacity of the existing nodes. With 10 replicas each requesting 250m CPU and 256Mi memory, the Deployment requests 2.5 CPU cores and 2.5Gi of memory in total; Pods that cannot fit remain in Pending state, which triggers the Cluster Autoscaler.

Create a file scale-test.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test-XX
  namespace: student-XX
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test-XX
  template:
    metadata:
      labels:
        app: scale-test-XX
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"

Apply it:

kubectl apply -f scale-test.yaml -n student-XX

Check the status of Pods — some should be in Pending state:

kubectl get pods -l app=scale-test-XX -n student-XX

Inspect a pending Pod to confirm the reason:

kubectl describe pod <pending-pod-name> -n student-XX

Look for events like: FailedScheduling ... Insufficient cpu or Insufficient memory.


Task 3: Monitor Node Scaling

Watch nodes in real time to see the Cluster Autoscaler add a new node:

kubectl get nodes -w

In another terminal, watch the Pods transition from Pending to Running:

kubectl get pods -l app=scale-test-XX -w -n student-XX

Check the Cluster Autoscaler status again:

kubectl describe configmap -n kube-system cluster-autoscaler-status

Note: Node provisioning typically takes 2-5 minutes in AKS (VM creation + joining the cluster).
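
While waiting, you can also look for the autoscaler's scale-up event, which is recorded on each formerly pending Pod (the exact event text varies by autoscaler version):

```shell
kubectl get events -n student-XX --sort-by=.lastTimestamp | grep TriggeredScaleUp
```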


Task 4: Observe Scale-Down

Delete the Deployment to free up resources:

kubectl delete deployment scale-test-XX -n student-XX

After deleting, the new node will become underutilized. The Cluster Autoscaler evaluates underutilized nodes after approximately 10 minutes and will remove the node if:

  • Node utilization is below 50%
  • All Pods on the node can be rescheduled to other nodes
  • There are no Pods with restrictive scheduling constraints (e.g., local storage, PodDisruptionBudget)
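
As a sketch of the PodDisruptionBudget constraint from the last bullet: a PDB like the following (illustrative name) would block the autoscaler from draining a node if evicting its Pods dropped the app below two available replicas.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scale-test-pdb          # hypothetical name
  namespace: student-XX
spec:
  minAvailable: 2               # keep at least 2 Pods running at all times
  selector:
    matchLabels:
      app: scale-test-XX
```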

Watch for the node to be removed:

kubectl get nodes -w

Note: Scale-down can take 10-15 minutes. The instructor may demonstrate this or move on to the next section.


KEDA: Event-Driven Autoscaling (Instructor Demo)

Note: KEDA installation is a cluster-level operation performed by the instructor. The following is a conceptual overview. Participants can review the ScaledObject example to understand the pattern.

KEDA uses a ScaledObject custom resource to define scaling rules based on external event sources. Here is a conceptual example that scales based on an Azure Service Bus queue:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor     # Deployment to scale
  minReplicaCount: 0          # Scale to zero when idle
  maxReplicaCount: 20
  triggers:
  - type: azure-servicebus
    metadata:
      queueName: orders
      messageCount: "5"       # Target queue length per replica
      connectionFromEnv: SB_CONNECTION

Key KEDA features:

  • Scale to zero — unlike HPA, KEDA can scale a Deployment down to 0 replicas
  • 60+ scalers — supports Azure services, AWS, GCP, databases, message queues, and more
  • Works with HPA — KEDA creates and manages HPA resources automatically

Common Problems

| Problem | Cause | Solution |
|---|---|---|
| Nodes not scaling up | Cluster Autoscaler not enabled on node pool | Inform the instructor; verify with `az aks nodepool show` and check `enableAutoScaling` |
| Scale-up is slow | VM provisioning takes time in Azure | Expected: 2-5 minutes. Consider NAP/Karpenter for faster provisioning |
| Node not scaling down | Pods with PodDisruptionBudget or local storage | Check `kubectl describe configmap -n kube-system cluster-autoscaler-status` for blocked scale-down reasons |
| Pods stuck in Pending | Resource requests exceed max node pool size | Increase `--max-count` or reduce resource requests |
| Scale-down takes too long | 10-minute evaluation window is the default | This is by design, to prevent flapping |
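
For the first row above, the autoscaling state of a pool can be queried directly. This sketch uses the az CLI's JMESPath `--query` option against fields that appear in the agent pool JSON:

```shell
# Instructor/admin command, shown for reference only
az aks nodepool show \
  --resource-group <RG> \
  --cluster-name <CLUSTER> \
  --name nodepool1 \
  --query "{autoscaling: enableAutoScaling, min: minCount, max: maxCount}" \
  --output table
```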

Best Practices

  • Set appropriate min/max counts — minimum should handle baseline load; maximum should consider cost constraints
  • Use multiple node pools — separate system and user workloads; configure autoscaling per pool
  • Define resource requests accurately — the Cluster Autoscaler relies on resource requests (not actual usage) to make decisions
  • Use Pod Disruption Budgets — protect critical workloads during node scale-down
  • Consider NAP/Karpenter — for workloads with diverse resource requirements, NAP creates right-sized nodes more efficiently
  • Use KEDA for event-driven workloads — queue processors, batch jobs, and services with variable traffic patterns benefit from KEDA’s scale-to-zero capability
  • Monitor scaling events — review Cluster Autoscaler logs and status ConfigMap regularly

Summary

In this exercise you learned:

  • The Cluster Autoscaler adds nodes when Pods cannot be scheduled and removes nodes when they are underutilized
  • AKS configures the Cluster Autoscaler per node pool with --enable-cluster-autoscaler --min-count --max-count
  • NAP (Node Auto Provisioning) / Karpenter is the next-generation solution that creates right-sized nodes on demand
  • KEDA extends autoscaling to external event sources (queues, metrics, schedules) and supports scale-to-zero
  • Pod-level scaling (HPA) and node-level scaling (CA) work together: HPA adds replicas, CA adds nodes to run them
