17: Load Balancer Health Probes

Objective

Understand the difference between Azure Load Balancer health probes and Kubernetes probes, learn how AKS configures LB health probes automatically, and practice configuring and diagnosing them.


Theory

Azure LB Health Probes vs Kubernetes Probes — They Are Different

This is a critical distinction that often causes confusion:

| Aspect | Azure LB Health Probes | Kubernetes Probes (Liveness/Readiness) |
|---|---|---|
| Level | Infrastructure (Azure networking) | Application (Kubernetes Pod lifecycle) |
| Purpose | Determine if a node should receive LB traffic | Determine if a Pod is healthy or ready |
| Managed by | Azure Load Balancer | kubelet on each node |
| Failure action | Node removed from LB backend pool | Pod restarted (liveness) or removed from endpoints (readiness) |
| Source IP | 168.63.129.16 (Azure health probe) | localhost (kubelet) |
| Configuration | Azure LB resource / Service annotations | Pod spec (livenessProbe, readinessProbe) |

How They Work Together

graph TB
    subgraph Azure["Azure Load Balancer"]
        HP["Health Probe<br/>Source: 168.63.129.16"]
    end

    subgraph Node1["Node 1"]
        KP1["kube-proxy<br/>(Service port)"]
        subgraph Pod1["Pod A"]
            RP1["Readiness Probe<br/>(kubelet)"]
        end
    end

    subgraph Node2["Node 2"]
        KP2["kube-proxy<br/>(Service port)"]
        subgraph Pod2["Pod B"]
            RP2["Readiness Probe<br/>(kubelet)"]
        end
    end

    HP -->|"Probe request"| KP1
    HP -->|"Probe request"| KP2
    RP1 -.->|"kubelet checks"| Pod1
    RP2 -.->|"kubelet checks"| Pod2

    style Azure fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style Node1 fill:#e8f5e9,stroke:#388e3c,stroke-width:1px
    style Node2 fill:#e8f5e9,stroke:#388e3c,stroke-width:1px

How AKS Configures LB Health Probes Automatically

When you create a LoadBalancer Service, AKS automatically creates health probes on the Azure Load Balancer:

  1. Default behavior — AKS creates a TCP health probe against the node port that kube-proxy opens for the Service. The probe only verifies that the port accepts connections on the node.
  2. With annotations — You can customize the probe protocol (TCP, HTTP, HTTPS) and path using Service annotations.
  3. Probe interval — Default is every 5 seconds, with 2 consecutive failures required to mark a node as unhealthy.
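
The defaults above also tell you how quickly the LB reacts to a failing node. A minimal shell sketch, using the default values listed above:

```shell
# Values taken from the AKS defaults described above
interval=5      # seconds between probes
failures=2      # consecutive failures before a node is marked unhealthy

# Worst-case time before the LB stops sending NEW connections to the node
detect=$((interval * failures))
echo "~${detect}s until the node leaves the backend pool"
```

With the defaults, a node keeps receiving new connections for up to roughly 10 seconds after it stops answering probes.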

What Happens When an LB Health Probe Fails

When the Azure LB health probe fails for a node:

  1. The node is removed from the LB backend pool — no new connections are sent to it
  2. Existing connections may be terminated (depending on LB configuration)
  3. Traffic is redistributed to remaining healthy nodes
  4. When the probe succeeds again, the node is added back to the backend pool

Important: This is different from a Kubernetes readiness probe failure, which removes a single Pod from Service endpoints. An LB health probe failure affects the entire node.
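
For contrast, the Pod-level counterpart lives in the Pod spec and is evaluated by the kubelet. A minimal readiness probe sketch (values chosen to mirror the LB defaults above; names are illustrative):

```yaml
# Pod-level probe: on failure, only THIS Pod is removed from Service endpoints
containers:
  - name: nginx
    image: nginx:1.27
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 2
```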

NSG Considerations

Azure LB health probes originate from the special IP address 168.63.129.16. This is an Azure platform IP used for health probing and metadata services.

Your Network Security Group (NSG) must allow inbound traffic from 168.63.129.16 on the Service port. AKS typically configures this automatically, but custom NSG rules can inadvertently block health probes.
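
If a custom NSG does block probes, a rule like the following restores them. This is a sketch — the NSG name, priority, and port are placeholders for your environment; the AzureLoadBalancer service tag resolves to 168.63.129.16:

```
az network nsg rule create \
  --resource-group <MC-resource-group> \
  --nsg-name <nsg-name> \
  --name AllowAzureLBProbes \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes AzureLoadBalancer \
  --destination-port-ranges 80
```

Using the service tag rather than the literal IP keeps the rule valid even if Azure ever changes the probe source.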

If health probes are blocked:

  • Azure LB marks all nodes as unhealthy
  • No traffic reaches your Service
  • The Service appears to have an external IP but is unreachable

Practical Tasks

Task 1: Create a Service with Explicit Health Probe Configuration

Create a file called nginx-health-probe.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hp-XX                    # Replace XX with your student number
  namespace: student-XX
  labels:
    app: nginx-hp-XX
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-hp-XX
  template:
    metadata:
      labels:
        app: nginx-hp-XX
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-hp-XX                    # Replace XX with your student number
  namespace: student-XX
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: http
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /
    service.beta.kubernetes.io/azure-load-balancer-health-probe-interval: "5"
    service.beta.kubernetes.io/azure-load-balancer-health-probe-num-of-probe: "2"
spec:
  type: LoadBalancer
  selector:
    app: nginx-hp-XX
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      name: http

Note: The manifest above points the health probe at /, which nginx answers with HTTP 200. Azure LB HTTP health probes treat only HTTP 200 as healthy — a path such as /healthz would return 404 from a default nginx, causing the probe to fail and the node to be marked unhealthy. In production, expose a dedicated health endpoint that returns 200 and point the probe at it (or fall back to a tcp probe protocol).

Deploy and wait for the external IP:

kubectl apply -f nginx-health-probe.yaml
kubectl get svc nginx-hp-XX -n student-XX -w
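
Once the external IP appears, you can check the probe path from outside the cluster. Replace <EXTERNAL-IP> with the address shown by kubectl get svc:

```
# Should print 200 — the only status Azure LB HTTP probes treat as healthy
curl -s -o /dev/null -w "%{http_code}\n" http://<EXTERNAL-IP>/
```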

Available Health Probe Annotations

All annotations take the service.beta.kubernetes.io/ prefix, as shown in the manifest above.

| Annotation | Description | Default |
|---|---|---|
| azure-load-balancer-health-probe-protocol | Probe protocol: tcp, http, or https | tcp |
| azure-load-balancer-health-probe-request-path | HTTP/HTTPS path to probe | / |
| azure-load-balancer-health-probe-interval | Probe interval in seconds | 5 |
| azure-load-balancer-health-probe-num-of-probe | Consecutive failures before a node is marked unhealthy | 2 |
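
Annotations can also be changed on a live Service without re-applying the manifest — note that the values must be quoted strings:

```
kubectl annotate svc nginx-hp-XX -n student-XX \
  service.beta.kubernetes.io/azure-load-balancer-health-probe-interval="10" \
  --overwrite
```

The cloud provider reconciles the Azure LB probe configuration shortly after the annotation changes.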

Task 2: Verify LB Health Probes in Azure Portal (Instructor Demo)

Note: Azure Portal access is instructor-led in this course. The instructor demonstrates the following steps while participants observe. They are important to understand for production environments, even though you cannot access the portal directly.

Once the Service has an external IP, the instructor navigates to the Azure Portal:

  1. Go to the managed resource group (MC_<resource-group>_<cluster-name>_<region>)
  2. Open the Load Balancer resource (usually named kubernetes)
  3. In the left menu, click Health probes
  4. Find the probe associated with your Service — note the protocol, port, path, and interval
  5. Click Backend pools — verify that your AKS nodes are listed and their health status
  6. Click Load balancing rules — verify the rule mapping your Service port

Task 3: Diagnostics with Azure CLI (Instructor Reference)

Note: The following az network commands require Azure subscription access and are performed by the instructor. They are included here so you understand how to diagnose LB health probe issues in production.

# Get the managed resource group name
az aks show --resource-group <resource-group> --name <cluster-name> --query nodeResourceGroup -o tsv
# List all Load Balancers in the managed resource group
az network lb list --resource-group <MC-resource-group> --output table
# List health probes on the Load Balancer
az network lb probe list --resource-group <MC-resource-group> --lb-name kubernetes --output table
# Show details of a specific probe (replace with actual probe name)
az network lb probe show --resource-group <MC-resource-group> --lb-name kubernetes --name <probe-name> --output json
# List load balancing rules to see which probes are attached
az network lb rule list --resource-group <MC-resource-group> --lb-name kubernetes --output table
# Check NSG rules to verify health probe traffic is allowed
az network nsg rule list --resource-group <MC-resource-group> --nsg-name <nsg-name> --output table

Look for rules that allow inbound traffic from 168.63.129.16.
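
To narrow the NSG listing to probe-related rules, a --query filter helps. A sketch — the resource names are placeholders, and the AzureLoadBalancer service tag covers 168.63.129.16:

```
az network nsg rule list \
  --resource-group <MC-resource-group> \
  --nsg-name <nsg-name> \
  --query "[?sourceAddressPrefix=='AzureLoadBalancer' || sourceAddressPrefix=='168.63.129.16'].{name:name, access:access, ports:destinationPortRange}" \
  --output table
```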

Clean Up

kubectl delete -f nginx-health-probe.yaml

Common Problems

| Problem | Possible Cause | Solution |
|---|---|---|
| LB health probe shows all nodes unhealthy | NSG blocks 168.63.129.16 on the Service port | Add an NSG rule allowing inbound traffic from 168.63.129.16 |
| HTTP health probe fails | Wrong probe path (returns non-200 status) | Verify the path exists and returns HTTP 200 (the only status Azure LB treats as healthy) |
| Service reachable sometimes, unreachable other times | Some nodes healthy, some not | Check per-node health probe status in the Azure Portal |
| External IP assigned but no traffic flows | Health probes failing silently | Check LB probe status in the Portal under Health probes |
| Probe path works from inside the cluster but not from the LB | kube-proxy or Pod not listening on the probed port | Verify the Service port mapping and that Pods are running |

Best Practices

  1. Configure HTTP probes for HTTP services — HTTP probes are more accurate than TCP probes because they verify the application can actually serve requests, not just that the port is open.
  2. Use a dedicated health endpoint — Applications should expose a /healthz or /health endpoint that performs basic health checks.
  3. Do not block 168.63.129.16 — This Azure platform IP is essential for LB health probes. Ensure custom NSG rules do not block it.
  4. Monitor LB health in Azure Portal — Regularly check the health probe status, especially after deploying new Services or changing NSG rules.
  5. Understand the difference between LB and K8s probes — LB probes operate at the node level (Azure infrastructure), while K8s probes operate at the Pod level (application lifecycle). Both must pass for traffic to reach your application.

Summary

In this exercise you learned:

  • Azure LB health probes and Kubernetes probes are fundamentally different — LB probes check nodes, K8s probes check Pods
  • AKS automatically creates LB health probes when you create a LoadBalancer Service
  • LB health probes originate from 168.63.129.16 and must not be blocked by NSG rules
  • You can customize LB health probe behavior using Service annotations (protocol, path, interval)
  • How to verify health probes in the Azure Portal and via az network lb probe list
  • When an LB health probe fails, the entire node is removed from the backend pool
