17: Load Balancer Health Probes
Objective
Understand the difference between Azure Load Balancer health probes and Kubernetes probes, learn how AKS configures LB health probes automatically, and practice configuring and diagnosing them.
Theory
Azure LB Health Probes vs Kubernetes Probes — They Are Different
This is a critical distinction that often causes confusion:
| Aspect | Azure LB Health Probes | Kubernetes Probes (Liveness/Readiness) |
|---|---|---|
| Level | Infrastructure (Azure networking) | Application (Kubernetes Pod lifecycle) |
| Purpose | Determine if a node should receive LB traffic | Determine if a Pod is healthy or ready |
| Managed by | Azure Load Balancer | kubelet on each node |
| Failure action | Node removed from LB backend pool | Pod restarted (liveness) or removed from endpoints (readiness) |
| Source IP | 168.63.129.16 (Azure platform IP) | Node-local (kubelet connects to the Pod from its own node) |
| Configuration | Azure LB resource / Service annotations | Pod spec (livenessProbe, readinessProbe) |
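To make the two configuration surfaces in the table concrete, here is a minimal side-by-side sketch (names are illustrative, and the Service manifest is abbreviated): the Kubernetes probe lives in the Pod spec, while the Azure LB probe is driven by Service annotations.

```yaml
# Kubernetes readiness probe: defined in the Pod spec, evaluated by the kubelet.
# (Container snippet only, not a complete manifest.)
containers:
  - name: web
    image: nginx:1.27
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
---
# Azure LB health probe: defined via Service annotations, evaluated by the Azure LB.
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: http
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
```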
How They Work Together
```mermaid
graph TB
    subgraph Azure["Azure Load Balancer"]
        HP["Health Probe<br/>Source: 168.63.129.16"]
    end
    subgraph Node1["Node 1"]
        KP1["kube-proxy<br/>(Service port)"]
        subgraph Pod1["Pod A"]
            RP1["Readiness Probe<br/>(kubelet)"]
        end
    end
    subgraph Node2["Node 2"]
        KP2["kube-proxy<br/>(Service port)"]
        subgraph Pod2["Pod B"]
            RP2["Readiness Probe<br/>(kubelet)"]
        end
    end
    HP -->|"Probe request"| KP1
    HP -->|"Probe request"| KP2
    RP1 -.->|"kubelet checks"| Pod1
    RP2 -.->|"kubelet checks"| Pod2
    style Azure fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style Node1 fill:#e8f5e9,stroke:#388e3c,stroke-width:1px
    style Node2 fill:#e8f5e9,stroke:#388e3c,stroke-width:1px
```
How AKS Configures LB Health Probes Automatically
When you create a LoadBalancer Service, AKS automatically creates health probes on the Azure Load Balancer:
- Default behavior — AKS creates a TCP health probe on the Service port. The probe checks if the port is open on the node (via kube-proxy).
- With annotations — You can customize the probe protocol (TCP, HTTP, HTTPS) and path using Service annotations.
- Probe interval — Default is every 5 seconds, with 2 consecutive failures required to mark a node as unhealthy (so with the defaults, an unhealthy node is typically detected within about 10 seconds).
What Happens When an LB Health Probe Fails
When the Azure LB health probe fails for a node:
- The node is removed from the LB backend pool — no new connections are sent to it
- Existing connections may be terminated (depending on LB configuration)
- Traffic is redistributed to remaining healthy nodes
- When the probe succeeds again, the node is added back to the backend pool
Important: This is different from a Kubernetes readiness probe failure, which removes a single Pod from Service endpoints. An LB health probe failure affects the entire node.
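You can observe the Pod-level half of this distinction directly from the cluster: when a readiness probe fails, only that Pod's IP disappears from the Service endpoints, while the node itself stays in the LB backend pool. A sketch with placeholder names (`my-service`, `my-namespace`):

```shell
# Watch the Service endpoints while a readiness probe fails:
# only the failing Pod's IP is removed from the endpoint list.
kubectl get endpoints my-service -n my-namespace -w

# Show which node each Pod runs on, to correlate with node-level health.
kubectl get pods -n my-namespace -o wide

# The node-level view (LB backend pool membership) is only visible on the
# Azure side, e.g. in the Portal or via `az network lb address-pool list`
# (requires subscription access, see Task 3).
```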
NSG Considerations
Azure LB health probes originate from the special IP address 168.63.129.16. This is an Azure virtual public IP used for platform services such as health probes, DHCP, and DNS.
Your Network Security Group (NSG) must allow inbound traffic from 168.63.129.16 on the Service port. AKS typically configures this automatically, but custom NSG rules can inadvertently block health probes.
If health probes are blocked:
- Azure LB marks all nodes as unhealthy
- No traffic reaches your Service
- The Service appears to have an external IP but is unreachable
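If a custom NSG is blocking the probes, a rule along the following lines restores them. This is a sketch, not a definitive command: the resource group, NSG name, and priority are placeholders, and it uses the `AzureLoadBalancer` service tag, which covers 168.63.129.16.

```shell
# Allow Azure LB health probes to reach the nodes.
# <MC-resource-group>, <nsg-name>, and the priority value are placeholders.
az network nsg rule create \
  --resource-group <MC-resource-group> \
  --nsg-name <nsg-name> \
  --name AllowAzureLBHealthProbe \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes AzureLoadBalancer \
  --destination-port-ranges '*'
```

Note that AKS-managed NSGs normally contain such a rule already; this is only needed when custom rules have overridden it.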
Practical Tasks
Task 1: Create a Service with Explicit Health Probe Configuration
Create a file called nginx-health-probe.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hp-XX  # Replace XX with your student number
  namespace: student-XX
  labels:
    app: nginx-hp-XX
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-hp-XX
  template:
    metadata:
      labels:
        app: nginx-hp-XX
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-hp-XX  # Replace XX with your student number
  namespace: student-XX
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: http
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /
    service.beta.kubernetes.io/azure-load-balancer-health-probe-interval: "5"
    service.beta.kubernetes.io/azure-load-balancer-health-probe-num-of-probe: "2"
spec:
  type: LoadBalancer
  selector:
    app: nginx-hp-XX
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      name: http
```
Note: If you point the health probe at a path nginx does not serve (for example /healthz), nginx returns a 404. Azure LB HTTP health probes consider only HTTP 200 responses as healthy, so a 404 causes the probe to fail and the node to be marked unhealthy. The manifest above therefore probes / (which nginx serves with a 200); alternatively, use the tcp probe protocol. In a production setup, configure your application with a proper health endpoint that returns 200.
Deploy and wait for the external IP:
```shell
kubectl apply -f nginx-health-probe.yaml
kubectl get svc nginx-hp-XX -n student-XX -w
```
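Once an external IP appears, you can confirm end-to-end traffic from your workstation. A small sketch (replace XX as above); since the health probe path and the user-facing path are both /, a 200 here suggests the LB probe passes as well:

```shell
# Capture the external IP once it has been assigned.
EXTERNAL_IP=$(kubectl get svc nginx-hp-XX -n student-XX \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "External IP: ${EXTERNAL_IP}"

# Print only the HTTP status code; expect 200 from nginx on /.
curl -s -o /dev/null -w '%{http_code}\n' "http://${EXTERNAL_IP}/"
```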
Available Health Probe Annotations
| Annotation | Description | Default |
|---|---|---|
| azure-load-balancer-health-probe-protocol | Probe protocol: tcp, http, or https | tcp |
| azure-load-balancer-health-probe-request-path | HTTP/HTTPS path to probe | / |
| azure-load-balancer-health-probe-interval | Probe interval in seconds | 5 |
| azure-load-balancer-health-probe-num-of-probe | Number of consecutive failures before marking a node unhealthy | 2 |

All annotations carry the service.beta.kubernetes.io/ prefix (as in the manifest above), abbreviated here for readability.
Task 2: Verify LB Health Probes in Azure Portal (Instructor Demo)
Note: Azure Portal access is an instructor-led demo. The instructor will show the following steps while participants observe. This is important to understand for production environments, even though you cannot access the portal directly.
Once the Service has an external IP, the instructor navigates to the Azure Portal:
- Go to the managed resource group (MC_<resource-group>_<cluster-name>_<region>)
- Open the Load Balancer resource (usually named kubernetes)
- In the left menu, click Health probes
- Find the probe associated with your Service — note the protocol, port, path, and interval
- Click Backend pools — verify that your AKS nodes are listed and check their health status
- Click Load balancing rules — verify the rule mapping your Service port
Task 3: Diagnostics with Azure CLI (Instructor Reference)
Note: The following az network commands require Azure subscription access and are performed by the instructor. They are included here so you understand how to diagnose LB health probe issues in production.
```shell
# Get the managed resource group name
az aks show --resource-group <resource-group> --name <cluster-name> --query nodeResourceGroup -o tsv

# List all Load Balancers in the managed resource group
az network lb list --resource-group <MC-resource-group> --output table

# List health probes on the Load Balancer
az network lb probe list --resource-group <MC-resource-group> --lb-name kubernetes --output table

# Show details of a specific probe (replace with actual probe name)
az network lb probe show --resource-group <MC-resource-group> --lb-name kubernetes --name <probe-name> --output json

# List load balancing rules to see which probes are attached
az network lb rule list --resource-group <MC-resource-group> --lb-name kubernetes --output table

# Check NSG rules to verify health probe traffic is allowed
az network nsg rule list --resource-group <MC-resource-group> --nsg-name <nsg-name> --output table
```
Look for rules that allow inbound traffic from 168.63.129.16.
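Rather than scanning the full rule table by eye, the `--query` flag (JMESPath) can filter for probe-related rules directly. A sketch, assuming the AKS default rules are in place; names in your NSG may differ:

```shell
# Show only rules whose source covers the Azure health probe IP.
# --include-default also lists Azure's built-in rules, among them the
# default 'AllowAzureLoadBalancerInBound' rule that permits probe traffic.
az network nsg rule list \
  --resource-group <MC-resource-group> \
  --nsg-name <nsg-name> \
  --include-default \
  --query "[?sourceAddressPrefix=='168.63.129.16' || sourceAddressPrefix=='AzureLoadBalancer']" \
  --output table
```

If this returns no Allow rule, or a higher-priority Deny rule matches the probe source, the health probes are being blocked.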
Clean Up
```shell
kubectl delete -f nginx-health-probe.yaml
```
Common Problems
| Problem | Possible Cause | Solution |
|---|---|---|
| LB health probe shows all nodes unhealthy | NSG blocks 168.63.129.16 on the Service port | Add an NSG rule allowing inbound traffic from 168.63.129.16 |
| HTTP health probe fails | Wrong probe path (returns non-200 status) | Verify the path exists and returns an HTTP 200 response (Azure LB only considers 200 as healthy) |
| Service reachable sometimes, unreachable other times | Some nodes healthy, some not | Check per-node health probe status in Azure Portal |
| External IP assigned but no traffic flows | Health probes failing silently | Check LB health probe status in Portal under Health probes |
| Probe path works from inside cluster but not from LB | kube-proxy or Pod not listening on the probed port | Verify the Service port mapping and that Pods are running |
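When working through the table above, a quick Kubernetes-side triage often localizes the problem before Portal access is needed. A sketch using the names from Task 1:

```shell
# 1. Are the Pods running and Ready? (Kubernetes-level health)
kubectl get pods -n student-XX -l app=nginx-hp-XX

# 2. Do the Pods back the Service? An empty ENDPOINTS column points to a
#    selector or readiness problem, not an LB problem.
kubectl get endpoints nginx-hp-XX -n student-XX

# 3. Does the Service respond from inside the cluster? If this prints 200
#    but external traffic fails, suspect the LB health probe or the NSG.
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -n student-XX \
  -- curl -s -o /dev/null -w '%{http_code}\n' http://nginx-hp-XX.student-XX.svc/
```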
Best Practices
- Configure HTTP probes for HTTP services — HTTP probes are more accurate than TCP probes because they verify the application can actually serve requests, not just that the port is open.
- Use a dedicated health endpoint — Applications should expose a /healthz or /health endpoint that performs basic health checks.
- Do not block 168.63.129.16 — This Azure platform IP is essential for LB health probes. Ensure custom NSG rules do not block it.
- Monitor LB health in Azure Portal — Regularly check the health probe status, especially after deploying new Services or changing NSG rules.
- Understand the difference between LB and K8s probes — LB probes operate at the node level (Azure infrastructure), while K8s probes operate at the Pod level (application lifecycle). Both must pass for traffic to reach your application.
Summary
In this exercise you learned:
- Azure LB health probes and Kubernetes probes are fundamentally different — LB probes check nodes, K8s probes check Pods
- AKS automatically creates LB health probes when you create a LoadBalancer Service
- LB health probes originate from 168.63.129.16 and must not be blocked by NSG rules
- You can customize LB health probe behavior using Service annotations (protocol, path, interval)
- How to verify health probes in the Azure Portal and via az network lb probe list
- When an LB health probe fails, the entire node is removed from the backend pool