11: Liveness & Readiness Probes
Objective
Learn how to configure liveness and readiness probes to make your applications self-healing and production-ready. Understand what happens when each probe fails and how Kubernetes responds differently to liveness vs readiness failures.
Theory
Why Probes Matter
Kubernetes can detect when a container process crashes, but it cannot detect when an application is stuck (deadlocked, unresponsive) or not ready to serve traffic. Probes give Kubernetes visibility into application health beyond process-level checks.
Probe Types
| Probe | Purpose | On Failure |
|---|---|---|
| Liveness Probe | Checks if the application is alive and not stuck | Kubernetes restarts the container |
| Readiness Probe | Checks if the application is ready to receive traffic | Kubernetes removes the Pod from Service endpoints (no traffic routed to it) |
| Startup Probe | Checks if the application has finished starting up | Disables liveness and readiness probes until it succeeds; on failure, restarts the container |
The startup probe is useful for slow-starting applications where a liveness probe might kill the container before it finishes initializing.
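As an illustrative sketch (the endpoint and timings here are assumptions, not part of the tasks below), a startup probe for an application that may need up to five minutes to initialize could look like this:

```yaml
# Hypothetical: give a slow-starting app up to 30 x 10s = 300s to come up.
startupProbe:
  httpGet:
    path: /healthy      # assumed endpoint; often the same path as the liveness probe
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /healthy
    port: 8080
  periodSeconds: 10
  failureThreshold: 3   # only takes effect after the startup probe has succeeded
```

While the startup probe is running, the liveness probe is suspended, so the container is not killed mid-initialization.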
Probe Mechanisms
| Mechanism | How It Works | Use Case |
|---|---|---|
| HTTP GET | Sends an HTTP GET request to a specified path and port. Success = 200-399 status code | Web applications with health endpoints |
| TCP Socket | Attempts to open a TCP connection to a specified port. Success = connection established | Databases, message queues, any TCP service |
| Exec | Runs a command inside the container. Success = exit code 0 | Custom health checks, file existence checks |
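The tasks below use only httpGet, but the other two mechanisms are configured the same way. The port, command, and file path in this sketch are illustrative assumptions:

```yaml
# TCP example: succeeds if a connection to port 5432 can be opened (e.g. PostgreSQL).
livenessProbe:
  tcpSocket:
    port: 5432
```

```yaml
# Exec example: succeeds if the command exits with code 0.
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/ready        # illustrative: the app creates this file once it is ready
```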
Key Probe Parameters
| Parameter | Default | Description |
|---|---|---|
| initialDelaySeconds | 0 | Seconds to wait before the first probe |
| periodSeconds | 10 | How often (in seconds) to run the probe |
| timeoutSeconds | 1 | Seconds before the probe times out |
| failureThreshold | 3 | Number of consecutive failures before taking action |
| successThreshold | 1 | Number of consecutive successes to be considered healthy (must be 1 for liveness) |
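These parameters together determine how long a failure goes unnoticed: as a rough rule, Kubernetes acts about failureThreshold × periodSeconds after the endpoint starts failing. A sketch annotated with the default values (the path and port are illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /healthy         # assumed endpoint for illustration
    port: 8080
  initialDelaySeconds: 0   # default: start probing immediately
  periodSeconds: 10        # default: probe every 10 seconds
  timeoutSeconds: 1        # default: each attempt must respond within 1 second
  failureThreshold: 3      # default: restart after ~3 x 10s = 30s of failures
```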
Liveness vs Readiness Failure Behavior
graph TB
subgraph LivenessFlow["Liveness Probe Failure"]
direction TB
L1["Liveness probe fails"] --> L2["failureThreshold reached"]
L2 --> L3["kubelet RESTARTS the container"]
L3 --> L4["Container starts fresh"]
end
subgraph ReadinessFlow["Readiness Probe Failure"]
direction TB
R1["Readiness probe fails"] --> R2["failureThreshold reached"]
R2 --> R3["Pod REMOVED from<br/>Service endpoints"]
R3 --> R4["No traffic routed to Pod<br/>Container keeps running"]
R4 --> R5["When probe succeeds again<br/>Pod re-added to endpoints"]
end
style L3 fill:#ffcdd2,stroke:#c62828,stroke-width:2px
style R3 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
style R5 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px
Practical Tasks
Task 1: Liveness Probe with kuard
Deploy a Pod with a liveness probe that checks the /healthy endpoint. The kuard UI allows you to manually trigger a health check failure.
Create a file called pod-liveness.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: kuard-liveness
  namespace: student-XX
  labels:
    app: kuard-liveness
spec:
  containers:
  - name: kuard
    image: <ACR_NAME>.azurecr.io/kuard:1
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: "100m"
        memory: "64Mi"
      limits:
        cpu: "100m"
        memory: "64Mi"
    livenessProbe:
      httpGet:
        path: /healthy
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
Deploy and port-forward:
kubectl apply -f pod-liveness.yaml
kubectl port-forward pod/kuard-liveness 8080:8080 -n student-XX
Open http://localhost:8080 in your browser. Navigate to the Liveness Probe tab. Click Fail to make the /healthy endpoint return an error.
Wait approximately 15 seconds (failureThreshold of 3 × periodSeconds of 5) and observe:
kubectl get pod kuard-liveness -n student-XX -w
You should see the restart count increase:
NAME READY STATUS RESTARTS AGE
kuard-liveness 1/1 Running 1 1m
Check the events:
kubectl describe pod kuard-liveness -n student-XX
Look for events like:
Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing Container kuard failed liveness probe, will be restarted
Task 2: Readiness Probe with kuard
Deploy a Pod with a readiness probe and a Service. When the readiness probe fails, the Pod is removed from the Service endpoints, but the container keeps running.
Create a file called pod-readiness.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: kuard-readiness
  namespace: student-XX
  labels:
    app: kuard-readiness
spec:
  containers:
  - name: kuard
    image: <ACR_NAME>.azurecr.io/kuard:1
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: "100m"
        memory: "64Mi"
      limits:
        cpu: "100m"
        memory: "64Mi"
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: kuard-readiness-svc
  namespace: student-XX
spec:
  selector:
    app: kuard-readiness
  ports:
  - port: 80
    targetPort: 8080
Deploy and check the endpoints:
kubectl apply -f pod-readiness.yaml
kubectl get endpoints kuard-readiness-svc -n student-XX
You should see the Pod IP listed in the endpoints:
NAME ENDPOINTS AGE
kuard-readiness-svc 10.244.0.15:8080 10s
Now port-forward to the Pod:
kubectl port-forward pod/kuard-readiness 8080:8080 -n student-XX
In the kuard UI, open the Readiness Probe tab and click Fail to make the /ready endpoint return an error.
After the failure threshold is reached, check the endpoints again:
kubectl get endpoints kuard-readiness-svc -n student-XX
The endpoints should now be empty:
NAME ENDPOINTS AGE
kuard-readiness-svc <none> 1m
Also check the Pod status — notice the Pod is still running but shows 0/1 READY:
kubectl get pod kuard-readiness -n student-XX
NAME READY STATUS RESTARTS AGE
kuard-readiness 0/1 Running 0 2m
Go back to the kuard UI and click Succeed on the Readiness Probe tab. The Pod will be re-added to the endpoints.
Task 3: Combined Probes
Deploy a Pod with both liveness and readiness probes, simulating a production-ready configuration.
Create a file called pod-combined-probes.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: kuard-probes
  namespace: student-XX
  labels:
    app: kuard-probes
spec:
  containers:
  - name: kuard
    image: <ACR_NAME>.azurecr.io/kuard:1
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: "100m"
        memory: "64Mi"
      limits:
        cpu: "100m"
        memory: "64Mi"
    livenessProbe:
      httpGet:
        path: /healthy
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 2
Deploy and verify:
kubectl apply -f pod-combined-probes.yaml
kubectl get pod kuard-probes -n student-XX
Experiment with the kuard UI:
- Fail readiness only — Pod stays running but is removed from endpoints (no restart).
- Fail liveness — Pod is restarted, which also resets the readiness state.
Observe the difference in behavior using:
kubectl describe pod kuard-probes -n student-XX
Clean Up
kubectl delete pod kuard-liveness kuard-readiness kuard-probes -n student-XX
kubectl delete service kuard-readiness-svc -n student-XX
Common Problems
| Problem | Possible Cause | Solution |
|---|---|---|
| Pod keeps restarting | Liveness probe fails before the application is ready | Increase initialDelaySeconds or add a startup probe |
| Pod shows 0/1 READY but is running | Readiness probe is failing | Check the readiness endpoint and application logs |
| Probes pass locally but fail in cluster | Probe targets wrong port or path | Verify containerPort, path, and that the app listens on 0.0.0.0 not 127.0.0.1 |
| Service has no endpoints | All Pods behind the Service have failing readiness probes | Check readiness probe configuration and application health |
| Container restarts on intermittent probe failures | failureThreshold too low or timeoutSeconds too short | Increase thresholds for applications with variable response times |
Best Practices
- Always configure both liveness and readiness probes — Liveness handles stuck processes; readiness handles temporary unavailability (e.g., during startup, dependency failures).
- Use different endpoints for liveness and readiness — The liveness check should verify the process is not stuck. The readiness check should verify the application can serve requests (database connected, caches warmed, etc.).
- Do not make liveness probes depend on external services — If your liveness probe checks a database and the database is down, Kubernetes will restart all Pods, making the problem worse.
- Use startup probes for slow-starting applications — Instead of setting a very high initialDelaySeconds on the liveness probe, use a startup probe with a generous failure threshold.
- Tune probe timing for your application — Default values work for many cases, but high-traffic or slow applications may need longer timeouts and higher failure thresholds.
Summary
In this exercise you learned:
- The difference between liveness probes (restart on failure) and readiness probes (remove from endpoints on failure)
- The three probe mechanisms: HTTP GET, TCP Socket, and Exec
- How to configure probes with initialDelaySeconds, periodSeconds, failureThreshold, and other parameters
- How to use kuard to simulate probe failures and observe Kubernetes behavior
- How readiness probe failures remove a Pod from Service endpoints without restarting it
- How to combine both probes in a production-ready configuration