21: Pod Disruption Budgets

Objective

Understand the difference between voluntary and involuntary disruptions, learn how Pod Disruption Budgets (PDBs) protect application availability during planned operations, and practice creating and verifying PDBs.


Theory

Voluntary vs Involuntary Disruptions

Kubernetes distinguishes between two types of disruptions that can terminate Pods:

| Type | Examples | Can Be Prevented? |
| --- | --- | --- |
| Voluntary | Node drain, cluster upgrade, kubectl delete pod, scaling down | Yes — PDBs protect against these |
| Involuntary | Node crash, OOM kill, hardware failure, kernel panic | No — PDBs cannot prevent these |

A Pod Disruption Budget (PDB) only protects against voluntary disruptions. When a voluntary operation (such as draining a node) would violate the PDB, the operation is paused or slowed down until it can proceed without breaking the budget.
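Concretely, tools like kubectl drain do not delete Pods directly: they POST an Eviction to each Pod's eviction subresource, and the API server rejects the request (HTTP 429 Too Many Requests) whenever granting it would violate a PDB. A minimal eviction body, using a placeholder Pod name, looks like this:

```yaml
# Illustrative Eviction body POSTed by drain-style tooling to
# /api/v1/namespaces/student-XX/pods/<pod-name>/eviction.
# <pod-name> is a placeholder for a real Pod name.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: <pod-name>
  namespace: student-XX
```

If the PDB currently allows no disruptions, the eviction is refused and the Pod keeps running; the drain retries until the budget frees up.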

PDB Configuration

A PDB defines how many Pods from a set must remain available during voluntary disruptions:

| Field | Description | Example |
| --- | --- | --- |
| minAvailable | Minimum number of Pods that must remain running | 2 or "50%" |
| maxUnavailable | Maximum number of Pods that can be unavailable | 1 or "25%" |

You specify one of these fields, not both. They are complementary:

  • minAvailable: 2 with 3 replicas means at most 1 can be disrupted at a time
  • maxUnavailable: 1 with 3 replicas has the same effect
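As a sketch, the same protection for the 3-replica example can be written with maxUnavailable instead of minAvailable. The names below follow the lab's student-XX convention; this is an alternative to the PDB created in Task 2, not an addition to it:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-XX
  namespace: student-XX
spec:
  # At most 1 Pod may be voluntarily disrupted at any time;
  # a percentage such as "25%" is also accepted.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx-XX
```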

AKS Context: Node Upgrades

PDBs are especially important in AKS because of node surge upgrades:

  • During an AKS cluster upgrade, AKS creates new nodes with the updated version and drains old nodes
  • Surge upgrade settings control how many extra nodes are created simultaneously (recommended: 33% maxSurge for production)
  • Without a PDB, the drain process may evict all replicas of an application at once, causing downtime
  • With a PDB, the drain process respects the budget and evicts Pods gradually, maintaining availability
  • Quarantine behavior: If a PDB blocks node drain, AKS may quarantine the blocked node (cordon it and move on). The quarantined node stays on the old version and requires manual intervention
  • AKS also offers a force upgrade option that bypasses PDB constraints — use with caution as it can cause service disruption
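The surge setting mentioned above is configured per node pool. A hedged Azure CLI sketch, with placeholder resource group, cluster, and pool names:

```shell
# Set the upgrade surge to 33% extra nodes for one node pool.
# Resource group, cluster, and pool names are placeholders.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --max-surge 33%
```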

Warning: Do not configure PDBs so aggressively that they block node upgrades entirely. For example, setting minAvailable equal to the replica count will prevent any eviction, potentially stalling cluster upgrades.

Node Drain with PDB Protection

graph TB
    subgraph before["Before Drain"]
        direction TB
        N1_B["Node 1"]
        N2_B["Node 2"]
        P1_B["Pod A-1<br/>Running"]
        P2_B["Pod A-2<br/>Running"]
        P3_B["Pod A-3<br/>Running"]
        N1_B --> P1_B
        N1_B --> P2_B
        N2_B --> P3_B
        PDB_B["PDB: minAvailable=2<br/>Available: 3<br/>Allowed Disruptions: 1"]
    end

    subgraph during["During Drain of Node 1"]
        direction TB
        N1_D["Node 1<br/>(draining)"]
        N2_D["Node 2"]
        P1_D["Pod A-1<br/>Evicted"]
        P2_D["Pod A-2<br/>Still Running<br/>(PDB blocks eviction)"]
        P3_D["Pod A-3<br/>Running"]
        P4_D["Pod A-1<br/>Rescheduled"]
        N1_D --> P1_D
        N1_D --> P2_D
        N2_D --> P3_D
        N2_D --> P4_D
        PDB_D["PDB: minAvailable=2<br/>Available: 2<br/>Allowed Disruptions: 0<br/>Drain waits..."]
    end

    subgraph after["After Pod A-1 Is Ready"]
        direction TB
        N1_A["Node 1<br/>(draining)"]
        N2_A["Node 2"]
        P2_A["Pod A-2<br/>Now evicted"]
        P3_A["Pod A-3<br/>Running"]
        P4_A["Pod A-1<br/>Running"]
        N1_A --> P2_A
        N2_A --> P3_A
        N2_A --> P4_A
        PDB_A["PDB: minAvailable=2<br/>Available: 2→3<br/>Drain continues"]
    end

    before -->|"kubectl drain node1"| during
    during -->|"Pod A-1 ready on Node 2"| after

    style before fill:#c8e6c9,stroke:#2e7d32,stroke-width:1px
    style during fill:#fff9c4,stroke:#f9a825,stroke-width:1px
    style after fill:#e1f5fe,stroke:#0288d1,stroke-width:1px

Practical Tasks

Task 1: Create a Deployment with 3 Replicas

Create a file called nginx-pdb-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-pdb-XX
  namespace: student-XX
  labels:
    app: nginx-XX
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-XX
  template:
    metadata:
      labels:
        app: nginx-XX
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
            limits:
              cpu: "100m"
              memory: "128Mi"

Deploy and verify:

kubectl apply -f nginx-pdb-deployment.yaml
kubectl get pods -n student-XX -l app=nginx-XX

Ensure all 3 replicas are Running before proceeding.

Task 2: Create a Pod Disruption Budget

Create a file called nginx-pdb.yaml:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-XX
  namespace: student-XX
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx-XX

This PDB requires that at least 2 Pods with label app: nginx-XX must remain available at all times during voluntary disruptions.

Apply the PDB:

kubectl apply -f nginx-pdb.yaml

Task 3: Verify PDB Status

Check the PDB status:

kubectl get pdb -n student-XX

Expected output:

NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nginx-pdb-XX   2               N/A               1                     10s

Key fields:

  • MIN AVAILABLE: 2 (our configured minimum)
  • ALLOWED DISRUPTIONS: 1 (with 3 replicas and minimum 2, only 1 can be disrupted)

Get detailed information:

kubectl describe pdb nginx-pdb-XX -n student-XX

This shows the current count of healthy Pods, the desired availability, and how many disruptions are currently allowed.

Task 4: Observe PDB Constraints

Important — Shared Cluster: Draining a node is a cluster-level operation that affects all participants. Only the instructor can perform kubectl drain. The approach below is safe for all participants to run in their own namespace.

Participant approach (safe for shared clusters):

Delete one Pod and observe the PDB:

# Check current state
kubectl get pdb nginx-pdb-XX -n student-XX

# Pick one Running Pod and delete it (a voluntary disruption)
POD=$(kubectl get pod -n student-XX -l app=nginx-XX -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod -n student-XX "$POD"

Quickly check the PDB status:

kubectl get pdb nginx-pdb-XX -n student-XX

While the Deployment is replacing the deleted Pod, you may briefly see ALLOWED DISRUPTIONS: 0. Note that kubectl delete itself is not checked against the PDB (only evictions through the Eviction API, as used by kubectl drain, are), but the status change shows how the PDB would block any further voluntary eviction until the replacement Pod is ready.

Watch the Pods recover:

kubectl get pods -n student-XX -l app=nginx-XX -w

Once the replacement Pod is Running and Ready, check the PDB again:

kubectl get pdb nginx-pdb-XX -n student-XX

ALLOWED DISRUPTIONS should return to 1.
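For reference only (participants must not run this on the shared cluster): the drain command the instructor would use evicts Pods through the Eviction API, so the PDB is honored. The node name below is a placeholder:

```shell
# Cordon the node and evict its Pods one by one, respecting PDBs.
kubectl drain aks-nodepool1-12345678-vmss000000 \
  --ignore-daemonsets \
  --delete-emptydir-data
```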

Clean Up

kubectl delete -f nginx-pdb.yaml
kubectl delete -f nginx-pdb-deployment.yaml

Common Problems

| Problem | Possible Cause | Solution |
| --- | --- | --- |
| PDB shows ALLOWED DISRUPTIONS: 0 permanently | Not enough healthy replicas to satisfy minAvailable | Scale up the Deployment or reduce minAvailable |
| Node drain hangs indefinitely | PDB is preventing all evictions | Check if enough replicas exist on other nodes; increase replicas or adjust PDB |
| PDB does not match any Pods | Selector labels do not match Pod labels | Verify labels with kubectl get pods --show-labels |
| minAvailable set to number of replicas | PDB blocks all disruptions — no Pod can ever be evicted | Set minAvailable to at least 1 less than the replica count |
| AKS upgrade gets stuck | PDBs on system Pods or misconfigured PDBs blocking drain | Review PDBs across all namespaces: kubectl get pdb --all-namespaces |

Best Practices

  1. Always create PDBs for critical workloads — Any Deployment with multiple replicas that requires high availability should have a PDB.
  2. Do not set minAvailable equal to replicas — This would prevent any voluntary disruption, including necessary cluster upgrades. Always allow at least 1 disruption.
  3. Use maxUnavailable for simpler reasoning — maxUnavailable: 1 clearly states “only 1 Pod can be down at a time” regardless of the total replica count.
  4. Coordinate PDBs with replica count — Ensure you have enough replicas spread across multiple nodes so that PDB constraints can be satisfied during node drains.
  5. Test PDBs before AKS upgrades — Verify that PDBs are correctly configured and allow upgrades to proceed. Misconfigured PDBs are a common cause of stuck upgrades.

Summary

In this exercise you learned:

  • The difference between voluntary disruptions (drain, upgrade, delete) and involuntary disruptions (crash, OOM, hardware failure)
  • PDBs only protect against voluntary disruptions
  • minAvailable sets the minimum number of Pods that must remain running; maxUnavailable sets the maximum number that can be down
  • PDBs are critical during AKS node upgrades to maintain application availability
  • How to create, verify, and monitor PDB status using kubectl get pdb
  • The importance of not setting minAvailable equal to the replica count
