26: Cluster Scaling
Objective
Understand how the Cluster Autoscaler (CA) automatically adds and removes nodes in AKS based on Pod scheduling demands. Learn about Node Auto Provisioning (NAP/Karpenter) and KEDA for event-driven autoscaling.
Theory
Cluster Autoscaler
The Cluster Autoscaler (CA) automatically adjusts the number of nodes in a node pool when:
- Scale up: Pods cannot be scheduled because there are not enough resources on existing nodes (Pods remain in `Pending` state)
- Scale down: Nodes are underutilized for an extended period (default: 10 minutes) and their Pods can be rescheduled elsewhere
| Concept | Description |
|---|---|
| Trigger for scale-up | A Pod is unschedulable due to insufficient CPU, memory, or other resources |
| Trigger for scale-down | Node utilization falls below 50% for 10 minutes and all Pods can be moved |
| Node pool scope | CA is configured per node pool in AKS |
| Min/Max count | Each node pool has a minimum and maximum number of nodes |
| Scale-down delay | ~10 minutes of underutilization before a node is removed |
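To make the 50% threshold concrete: utilization is computed from the sum of Pod resource requests on a node divided by the node's allocatable capacity, not from live CPU usage. A quick sketch with hypothetical numbers:

```shell
# Hypothetical node: 1900m allocatable CPU, Pods on it requesting 700m total.
allocatable_m=1900
requested_m=700
utilization_pct=$((requested_m * 100 / allocatable_m))
echo "Utilization: ${utilization_pct}%"   # prints "Utilization: 36%"
if [ "$utilization_pct" -lt 50 ]; then
  echo "Scale-down candidate (after the 10-minute window)"
fi
```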
AKS Configuration (Instructor Reference)
Note: The following `az aks` commands are cluster-level operations performed by the instructor. They are included here for educational purposes so you understand how the Cluster Autoscaler is configured in production.
The Cluster Autoscaler is enabled per node pool when creating or updating an AKS cluster:
```bash
# Enable during cluster creation
az aks create \
  --resource-group <RG> \
  --name <CLUSTER> \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

# Enable on an existing node pool
az aks nodepool update \
  --resource-group <RG> \
  --cluster-name <CLUSTER> \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5
```
NAP (Node Auto Provisioning) / Karpenter
Node Auto Provisioning (NAP) is the next-generation node scaling solution for AKS, built on the open-source Karpenter project.
Key differences from the classic Cluster Autoscaler:
| Feature | Cluster Autoscaler | NAP / Karpenter |
|---|---|---|
| Node sizing | Fixed VM size per node pool | Creates right-sized nodes on demand |
| Provisioning speed | Slower (uses VMSS) | Faster (direct VM provisioning) |
| Configuration | Per node pool min/max | NodePool CRD with flexible constraints |
| Bin packing | Limited | Advanced — better resource utilization |
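For orientation, NAP is configured through Karpenter-style `NodePool` resources rather than per-pool min/max flags. The sketch below follows the upstream Karpenter v1 `NodePool` schema; the exact API version and Azure-specific keys (`karpenter.azure.com/sku-family`, `AKSNodeClass`) may differ between NAP releases:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # consolidate underused nodes
  template:
    spec:
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
      requirements:                                 # flexible constraints instead of a fixed VM size
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D", "E"]
```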
NAP is enabled by the cluster administrator:
```bash
# Instructor/admin command — do not run
az aks update --resource-group <RG> --name <CLUSTER> --enable-node-autoprovision
```
KEDA (Kubernetes Event-Driven Autoscaling)
KEDA extends Kubernetes autoscaling beyond CPU and memory. It allows scaling based on external event sources:
- Azure Service Bus queue length
- Azure Storage Queue messages
- Prometheus metrics
- Cron schedules
- HTTP request rate
- And 60+ other scalers
KEDA works alongside HPA — it creates and manages HPA objects based on ScaledObject definitions.
AKS KEDA add-on — managed by Azure, installed by the cluster administrator:
```bash
# Instructor/admin command — do not run
az aks update --resource-group <RG> --name <CLUSTER> --enable-keda
```
Mermaid Diagram: Pod Scaling + Node Scaling
```mermaid
flowchart TB
    subgraph "Pod Level Scaling"
        A[HPA] -->|Increases replicas<br>based on metrics| B[Deployment]
        B --> C[Pod 1]
        B --> D[Pod 2]
        B --> E[Pod 3 - Pending]
    end
    subgraph "Node Level Scaling"
        E -->|Cannot be scheduled<br>insufficient resources| F[Cluster Autoscaler]
        F -->|Requests new node<br>from Azure| G[Azure VMSS]
        G -->|Provisions new VM<br>joins cluster| H[New Node]
        H -->|Pod scheduled<br>on new node| E
    end
    subgraph "Scale Down"
        I[Nodes underutilized<br>for 10 min] --> J[Cluster Autoscaler]
        J -->|Drains and removes<br>underutilized node| K[Node removed<br>from cluster]
    end
```
Practical Tasks
All tasks should be performed in your namespace `student-XX`. Replace `XX` with your student number throughout.
Task 1: Examine Current Autoscaler Configuration
Check the current state of the cluster nodes:
```bash
kubectl get nodes
```
Examine the Cluster Autoscaler status:
```bash
kubectl describe configmap -n kube-system cluster-autoscaler-status
```
This ConfigMap shows:
- Current cluster state (health, scale-up/scale-down status)
- Node groups with min/max/current counts
- Recent scaling events
You can also check the autoscaler logs:
```bash
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=20
```
Task 2: Trigger a Scale-Up
Deploy a workload with high resource requests that will exceed the capacity of existing nodes. This will cause Pods to remain in Pending state, triggering the Cluster Autoscaler.
Create a file `scale-test.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test-XX
  namespace: student-XX
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test-XX
  template:
    metadata:
      labels:
        app: scale-test-XX
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["sleep", "3600"]
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
```
Apply it:
```bash
kubectl apply -f scale-test.yaml -n student-XX
```
Check the status of Pods — some should be in Pending state:
```bash
kubectl get pods -l app=scale-test-XX -n student-XX
```
Inspect a pending Pod to confirm the reason:
```bash
kubectl describe pod <pending-pod-name> -n student-XX
```
Look for events like `FailedScheduling ... Insufficient cpu` or `Insufficient memory`.
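A quick back-of-the-envelope calculation shows why the Pods do not all fit (the node size here is hypothetical; check `kubectl describe node` for your cluster's real allocatable values):

```shell
# 10 replicas, each requesting 250m CPU:
replicas=10
cpu_request_m=250
total_m=$((replicas * cpu_request_m))
echo "Total CPU requested: ${total_m}m"   # prints "Total CPU requested: 2500m"
# A 2-vCPU node typically exposes well under 2000m allocatable CPU after
# system reservations, so a single such node cannot hold all replicas.
```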
Task 3: Monitor Node Scaling
Watch nodes in real time to see the Cluster Autoscaler add a new node:
```bash
kubectl get nodes -w
```
In another terminal, watch the Pods transition from Pending to Running:
```bash
kubectl get pods -l app=scale-test-XX -w -n student-XX
```
Check the Cluster Autoscaler status again:
```bash
kubectl describe configmap -n kube-system cluster-autoscaler-status
```
Note: Node provisioning typically takes 2-5 minutes in AKS (VM creation + joining the cluster).
Task 4: Observe Scale-Down
Delete the Deployment to free up resources:
```bash
kubectl delete deployment scale-test-XX -n student-XX
```
After deleting, the new node will become underutilized. The Cluster Autoscaler evaluates underutilized nodes after approximately 10 minutes and will remove the node if:
- Node utilization is below 50%
- All Pods on the node can be rescheduled to other nodes
- There are no Pods with restrictive scheduling constraints (e.g., local storage, PodDisruptionBudget)
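To see how the last condition plays out, a PodDisruptionBudget like the following (hypothetical, written to match the scale-test Pods) would allow at most two replicas to be evicted at a time, and can block a node drain entirely if eviction would violate `minAvailable`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scale-test-pdb            # hypothetical name
  namespace: student-XX
spec:
  minAvailable: 8                 # with 10 replicas, at most 2 may be disrupted
  selector:
    matchLabels:
      app: scale-test-XX
```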
Watch for the node to be removed:
```bash
kubectl get nodes -w
```
Note: Scale-down can take 10-15 minutes. The instructor may demonstrate this or move on to the next section.
KEDA: Event-Driven Autoscaling (Instructor Demo)
Note: KEDA installation is a cluster-level operation performed by the instructor. The following is a conceptual overview. Participants can review the ScaledObject example to understand the pattern.
KEDA uses a ScaledObject custom resource to define scaling rules based on external event sources. Here is a conceptual example that scales based on an Azure Service Bus queue:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor         # Deployment to scale
  minReplicaCount: 0              # Scale to zero when idle
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "5"         # Scale up when > 5 messages
        connectionFromEnv: SB_CONNECTION
```
Key KEDA features:
- Scale to zero — unlike HPA, KEDA can scale a Deployment down to 0 replicas
- 60+ scalers — supports Azure services, AWS, GCP, databases, message queues, and more
- Works with HPA — KEDA creates and manages HPA resources automatically
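As a second illustration of the pattern, KEDA's cron scaler can raise capacity on a schedule. The names and times below are hypothetical, but `timezone`, `start`, `end`, and `desiredReplicas` are the cron scaler's documented fields:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler     # hypothetical name
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Berlin
        start: "0 8 * * *"        # scale up at 08:00
        end: "0 18 * * *"         # scale back down at 18:00
        desiredReplicas: "10"
```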
Common Problems
| Problem | Cause | Solution |
|---|---|---|
| Nodes not scaling up | Cluster Autoscaler not enabled on node pool | Inform the instructor; verify with `az aks nodepool show` and check `enableAutoScaling` |
| Scale-up is slow | VM provisioning takes time in Azure | Expected: 2-5 minutes. Consider NAP/Karpenter for faster provisioning |
| Node not scaling down | Pods with PodDisruptionBudget or local storage | Check `kubectl describe configmap -n kube-system cluster-autoscaler-status` for blocked scale-down reasons |
| Pods stuck in Pending | Resource requests exceed max node pool size | Increase `--max-count` or reduce resource requests |
| Scale-down takes too long | 10-minute evaluation window is the default | This is by design, to prevent flapping |
Best Practices
- Set appropriate min/max counts — minimum should handle baseline load; maximum should consider cost constraints
- Use multiple node pools — separate system and user workloads; configure autoscaling per pool
- Define resource requests accurately — the Cluster Autoscaler relies on resource requests (not actual usage) to make decisions
- Use Pod Disruption Budgets — protect critical workloads during node scale-down
- Consider NAP/Karpenter — for workloads with diverse resource requirements, NAP creates right-sized nodes more efficiently
- Use KEDA for event-driven workloads — queue processors, batch jobs, and services with variable traffic patterns benefit from KEDA’s scale-to-zero capability
- Monitor scaling events — review Cluster Autoscaler logs and status ConfigMap regularly
Summary
In this exercise you learned:
- The Cluster Autoscaler adds nodes when Pods cannot be scheduled and removes nodes when they are underutilized
- AKS configures the Cluster Autoscaler per node pool with `--enable-cluster-autoscaler`, `--min-count`, and `--max-count`
- NAP (Node Auto Provisioning) / Karpenter is the next-generation solution that creates right-sized nodes on demand
- KEDA extends autoscaling to external event sources (queues, metrics, schedules) and supports scale-to-zero
- Pod-level scaling (HPA) and node-level scaling (CA) work together: HPA adds replicas, CA adds nodes to run them