21: Pod Disruption Budgets
Objective
Understand the difference between voluntary and involuntary disruptions, learn how Pod Disruption Budgets (PDBs) protect application availability during planned operations, and practice creating and verifying PDBs.
Theory
Voluntary vs Involuntary Disruptions
Kubernetes distinguishes between two types of disruptions that can terminate Pods:
| Type | Examples | Can Be Prevented? |
|---|---|---|
| Voluntary | Node drain, cluster upgrade, `kubectl delete pod`, scaling down | Yes — PDBs protect against these |
| Involuntary | Node crash, OOM kill, hardware failure, kernel panic | No — PDBs cannot prevent these |

Note that a direct `kubectl delete pod`, while a voluntary disruption, bypasses the budget: PDBs are enforced only through the eviction API, which `kubectl drain` and node upgrades use.
A Pod Disruption Budget (PDB) only protects against voluntary disruptions. When a voluntary operation (such as draining a node) would violate the PDB, the operation is paused or slowed down until it can proceed without breaking the budget.
PDB Configuration
A PDB defines how many Pods from a set must remain available during voluntary disruptions:
| Field | Description | Example |
|---|---|---|
| `minAvailable` | Minimum number of Pods that must remain running | `2` or `"50%"` |
| `maxUnavailable` | Maximum number of Pods that can be unavailable | `1` or `"25%"` |
You specify exactly one of these fields, not both. They are two ways of expressing the same constraint:

- `minAvailable: 2` with 3 replicas means at most 1 Pod can be disrupted at a time
- `maxUnavailable: 1` with 3 replicas has the same effect
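For comparison, this is what the same budget looks like expressed with `maxUnavailable`. This is a sketch with hypothetical names (`example-pdb`, `example-app`); the lab manifests later in this exercise use `minAvailable`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb        # hypothetical name
spec:
  maxUnavailable: 1        # at most 1 Pod may be down during voluntary disruptions
  selector:
    matchLabels:
      app: example-app     # hypothetical label
```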
AKS Context: Node Upgrades
PDBs are especially important in AKS because of node surge upgrades:
- During an AKS cluster upgrade, AKS creates new nodes with the updated version and drains old nodes
- Surge upgrade settings control how many extra nodes are created simultaneously (a `maxSurge` of `33%` is recommended for production)
- Without a PDB, the drain process may evict all replicas of an application at once, causing downtime
- With a PDB, the drain process respects the budget and evicts Pods gradually, maintaining availability
- Quarantine behavior: If a PDB blocks node drain, AKS may quarantine the blocked node (cordon it and move on). The quarantined node stays on the old version and requires manual intervention
- AKS also offers a force upgrade option that bypasses PDB constraints — use with caution as it can cause service disruption
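As a sketch, the surge setting can be configured per node pool with the Azure CLI; the resource group, cluster, and node pool names below are placeholders:

```bash
# Set maxSurge to 33% on an existing node pool (placeholder names)
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --max-surge 33%
```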
Warning: Do not configure PDBs so aggressively that they block node upgrades entirely. For example, setting `minAvailable` equal to the replica count will prevent any eviction, potentially stalling cluster upgrades.
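The arithmetic behind this warning can be sketched directly: the number of disruptions a PDB allows is the healthy Pod count minus `minAvailable`, so setting them equal leaves zero headroom.

```shell
# Sketch: the PDB's disruption headroom is healthy Pods minus minAvailable
healthy=3
min_available=3          # misconfigured: equal to the replica count

allowed=$(( healthy - min_available ))
echo "allowed disruptions: $allowed"   # prints: allowed disruptions: 0
```

With `allowed` at 0, every eviction request is refused, and a node drain waits forever.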
Node Drain with PDB Protection
```mermaid
graph TB
    subgraph before["Before Drain"]
        direction TB
        N1_B["Node 1"]
        N2_B["Node 2"]
        P1_B["Pod A-1<br/>Running"]
        P2_B["Pod A-2<br/>Running"]
        P3_B["Pod A-3<br/>Running"]
        N1_B --> P1_B
        N1_B --> P2_B
        N2_B --> P3_B
        PDB_B["PDB: minAvailable=2<br/>Available: 3<br/>Allowed Disruptions: 1"]
    end
    subgraph during["During Drain of Node 1"]
        direction TB
        N1_D["Node 1<br/>(draining)"]
        N2_D["Node 2"]
        P1_D["Pod A-1<br/>Evicted"]
        P2_D["Pod A-2<br/>Still Running<br/>(PDB blocks eviction)"]
        P3_D["Pod A-3<br/>Running"]
        P4_D["Pod A-1<br/>Rescheduled"]
        N1_D --> P1_D
        N1_D --> P2_D
        N2_D --> P3_D
        N2_D --> P4_D
        PDB_D["PDB: minAvailable=2<br/>Available: 2<br/>Allowed Disruptions: 0<br/>Drain waits..."]
    end
    subgraph after["After Pod A-1 Is Ready"]
        direction TB
        N1_A["Node 1<br/>(draining)"]
        N2_A["Node 2"]
        P2_A["Pod A-2<br/>Now evicted"]
        P3_A["Pod A-3<br/>Running"]
        P4_A["Pod A-1<br/>Running"]
        N1_A --> P2_A
        N2_A --> P3_A
        N2_A --> P4_A
        PDB_A["PDB: minAvailable=2<br/>Available: 2→3<br/>Drain continues"]
    end
    before -->|"kubectl drain node1"| during
    during -->|"Pod A-1 ready on Node 2"| after
    style before fill:#c8e6c9,stroke:#2e7d32,stroke-width:1px
    style during fill:#fff9c4,stroke:#f9a825,stroke-width:1px
    style after fill:#e1f5fe,stroke:#0288d1,stroke-width:1px
```
Practical Tasks
Task 1: Create a Deployment with 3 Replicas
Create a file called nginx-pdb-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-pdb-XX
  namespace: student-XX
  labels:
    app: nginx-XX
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-XX
  template:
    metadata:
      labels:
        app: nginx-XX
    spec:
      containers:
      - name: nginx
        image: nginx:1.27
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
          limits:
            cpu: "100m"
            memory: "128Mi"
```
Deploy and verify:
```bash
kubectl apply -f nginx-pdb-deployment.yaml
kubectl get pods -n student-XX -l app=nginx-XX
```
Ensure all 3 replicas are Running before proceeding.
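If you prefer not to poll manually, `kubectl wait` can block until the Pods report Ready (the timeout value here is only a suggestion):

```bash
kubectl wait --for=condition=Ready pod \
  -l app=nginx-XX -n student-XX --timeout=120s
```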
Task 2: Create a Pod Disruption Budget
Create a file called nginx-pdb.yaml:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-XX
  namespace: student-XX
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx-XX
```
This PDB requires at least 2 Pods with the label `app: nginx-XX` to remain available during voluntary disruptions.
Apply the PDB:
```bash
kubectl apply -f nginx-pdb.yaml
```
Task 3: Verify PDB Status
Check the PDB status:
```bash
kubectl get pdb -n student-XX
```
Expected output:
```text
NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nginx-pdb-XX   2               N/A               1                     10s
```
Key fields:
- `MIN AVAILABLE`: 2 (our configured minimum)
- `ALLOWED DISRUPTIONS`: 1 (with 3 replicas and a minimum of 2, only 1 can be disrupted)
Get detailed information:
```bash
kubectl describe pdb nginx-pdb-XX -n student-XX
```
This shows the current count of healthy Pods, the desired availability, and how many disruptions are currently allowed.
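For scripting, the same numbers can be read from the PDB's status subresource with JSONPath; the field names (`currentHealthy`, `disruptionsAllowed`) come from the `policy/v1` API:

```bash
# Current number of healthy Pods and how many disruptions are allowed right now
kubectl get pdb nginx-pdb-XX -n student-XX \
  -o jsonpath='healthy={.status.currentHealthy} allowed={.status.disruptionsAllowed}{"\n"}'
```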
Task 4: Observe PDB Constraints
Important — Shared Cluster: Draining a node is a cluster-level operation that affects all participants. Only the instructor may run `kubectl drain`. The approach below is safe for all participants to run in their own namespace.
Participant approach (safe for shared clusters):
Delete one Pod and observe the PDB:
```bash
# Check current state
kubectl get pdb nginx-pdb-XX -n student-XX

# Pick one Running Pod and delete it (a voluntary disruption)
POD=$(kubectl get pods -n student-XX -l app=nginx-XX \
  --field-selector=status.phase=Running \
  -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD" -n student-XX
```
Quickly check the PDB status:

```bash
kubectl get pdb nginx-pdb-XX -n student-XX
```

While the Deployment is replacing the deleted Pod, you may briefly see `ALLOWED DISRUPTIONS: 0`: with only 2 healthy Pods and `minAvailable: 2`, the PDB would block any further eviction until the replacement Pod is Ready.

Watch the Pods recover:

```bash
kubectl get pods -n student-XX -l app=nginx-XX -w
```
Once the replacement Pod is Running and Ready, check the PDB again:

```bash
kubectl get pdb nginx-pdb-XX -n student-XX
```

`ALLOWED DISRUPTIONS` should return to 1.
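A note on mechanics: PDBs are enforced by the eviction API, not by Pod deletion. To see the budget actually reject a request, you can call the eviction subresource directly. This sketch assumes `kubectl proxy` is running locally on a hypothetical port and that `$POD` holds one of your Pod names; when `ALLOWED DISRUPTIONS` is 0, the API responds with HTTP 429.

```bash
# Terminal 1: proxy the API server to localhost (hypothetical port)
kubectl proxy --port=8001 &

# Terminal 2: request an eviction for one Pod via the policy/v1 Eviction subresource
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Content-Type: application/json" \
  -X POST "http://127.0.0.1:8001/api/v1/namespaces/student-XX/pods/$POD/eviction" \
  -d "{\"apiVersion\":\"policy/v1\",\"kind\":\"Eviction\",\"metadata\":{\"name\":\"$POD\",\"namespace\":\"student-XX\"}}"
# A 2xx status means the eviction was allowed; 429 means the PDB blocked it
```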
Clean Up
```bash
kubectl delete -f nginx-pdb.yaml
kubectl delete -f nginx-pdb-deployment.yaml
```
Common Problems
| Problem | Possible Cause | Solution |
|---|---|---|
| PDB shows `ALLOWED DISRUPTIONS: 0` permanently | Not enough healthy replicas to satisfy `minAvailable` | Scale up the Deployment or reduce `minAvailable` |
| Node drain hangs indefinitely | PDB is preventing all evictions | Check whether enough replicas exist on other nodes; increase replicas or adjust the PDB |
| PDB does not match any Pods | Selector labels do not match Pod labels | Verify labels with `kubectl get pods --show-labels` |
| `minAvailable` set to number of replicas | PDB blocks all disruptions — no Pod can ever be evicted | Set `minAvailable` to at least 1 less than the replica count |
| AKS upgrade gets stuck | PDBs on system Pods or misconfigured PDBs blocking drain | Review PDBs across all namespaces: `kubectl get pdb --all-namespaces` |
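When hunting for the PDB that is blocking a drain or upgrade, this sketch lists every PDB whose status currently allows zero disruptions, using the JSONPath filter syntax supported by kubectl:

```bash
# Namespace and name of every PDB with no allowed disruptions
kubectl get pdb --all-namespaces \
  -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'
```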
Best Practices
- Always create PDBs for critical workloads — Any Deployment with multiple replicas that requires high availability should have a PDB.
- Do not set `minAvailable` equal to replicas — This would prevent any voluntary disruption, including necessary cluster upgrades. Always allow at least 1 disruption.
- Use `maxUnavailable` for simpler reasoning — `maxUnavailable: 1` clearly states "only 1 Pod can be down at a time" regardless of the total replica count.
- Coordinate PDBs with replica count — Ensure you have enough replicas spread across multiple nodes so that PDB constraints can be satisfied during node drains.
- Test PDBs before AKS upgrades — Verify that PDBs are correctly configured and allow upgrades to proceed. Misconfigured PDBs are a common cause of stuck upgrades.
Summary
In this exercise you learned:
- The difference between voluntary disruptions (drain, upgrade, delete) and involuntary disruptions (crash, OOM, hardware failure)
- PDBs only protect against voluntary disruptions
- `minAvailable` sets the minimum number of Pods that must remain running; `maxUnavailable` sets the maximum number that can be down
- PDBs are critical during AKS node upgrades to maintain application availability
- How to create, verify, and monitor PDB status using `kubectl get pdb`
- The importance of not setting `minAvailable` equal to the replica count