11: Liveness & Readiness Probes

Objective

Learn how to configure liveness and readiness probes to make your applications self-healing and production-ready. Understand what happens when each probe fails and how Kubernetes responds differently to liveness vs readiness failures.


Theory

Why Probes Matter

Kubernetes can detect when a container process crashes, but it cannot detect when an application is stuck (deadlocked, unresponsive) or not ready to serve traffic. Probes give Kubernetes visibility into application health beyond process-level checks.

Probe Types

| Probe | Purpose | On Failure |
|---|---|---|
| Liveness Probe | Checks if the application is alive and not stuck | Kubernetes restarts the container |
| Readiness Probe | Checks if the application is ready to receive traffic | Kubernetes removes the Pod from Service endpoints (no traffic is routed to it) |
| Startup Probe | Checks if the application has finished starting up | Kubernetes restarts the container (liveness and readiness probes are disabled until the startup probe succeeds) |

The startup probe is useful for slow-starting applications where a liveness probe might kill the container before it finishes initializing.
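As a sketch, a startup probe for a slow-starting application might look like the fragment below. The path, port, and timings are illustrative assumptions, not part of this lab's manifests:

```yaml
# Illustrative startupProbe fragment (path, port, and timings are
# assumptions). Allows up to 30 x 10 = 300 seconds for the application
# to start before the kubelet gives up and restarts the container.
startupProbe:
  httpGet:
    path: /healthy
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```

While this probe is running, liveness and readiness probes are suspended, so the liveness probe's own timing can stay tight.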

Probe Mechanisms

| Mechanism | How It Works | Use Case |
|---|---|---|
| HTTP GET | Sends an HTTP GET request to a specified path and port. Success = 200-399 status code | Web applications with health endpoints |
| TCP Socket | Attempts to open a TCP connection to a specified port. Success = connection established | Databases, message queues, any TCP service |
| Exec | Runs a command inside the container. Success = exit code 0 | Custom health checks, file existence checks |
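The HTTP GET form is used throughout this lab. The TCP and exec forms map to YAML as sketched below; the port number and command are illustrative assumptions, not from this lab's manifests:

```yaml
# Illustrative probe fragments for the non-HTTP mechanisms
# (port number and command are assumptions, not from this lab).
livenessProbe:
  tcpSocket:        # success = TCP connection established
    port: 5432
---
livenessProbe:
  exec:             # success = command exits with code 0
    command: ["cat", "/tmp/healthy"]
```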

Key Probe Parameters

| Parameter | Default | Description |
|---|---|---|
| initialDelaySeconds | 0 | Seconds to wait before the first probe |
| periodSeconds | 10 | How often to run the probe, in seconds |
| timeoutSeconds | 1 | Seconds before the probe times out |
| failureThreshold | 3 | Number of consecutive failures before taking action |
| successThreshold | 1 | Number of consecutive successes to be considered healthy again (must be 1 for liveness and startup probes) |

For example, with the default values, a stuck container is restarted roughly failureThreshold × periodSeconds = 30 seconds after its first failed liveness check.
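The threshold parameters fold a stream of raw check results into a healthy/unhealthy decision. The Python sketch below is a simplification of that logic for illustration, not the kubelet's actual implementation:

```python
# Simplified model of how failureThreshold / successThreshold turn
# individual probe results into a health state. Not the kubelet's
# real code; it only illustrates the consecutive-count behavior.
def probe_state(results, failure_threshold=3, success_threshold=1):
    """Fold a sequence of probe results (True = pass) into a state."""
    state = "healthy"  # containers start out assumed healthy
    fails = succs = 0
    for ok in results:
        if ok:
            succs += 1
            fails = 0  # any success resets the failure streak
            if succs >= success_threshold:
                state = "healthy"
        else:
            fails += 1
            succs = 0  # any failure resets the success streak
            if fails >= failure_threshold:
                state = "unhealthy"
    return state
```

With the defaults, two isolated failures change nothing, but three in a row flip the state; raising successThreshold means a single passing check is not enough to recover.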

Liveness vs Readiness Failure Behavior

graph TB
    subgraph LivenessFlow["Liveness Probe Failure"]
        direction TB
        L1["Liveness probe fails"] --> L2["failureThreshold reached"]
        L2 --> L3["kubelet RESTARTS the container"]
        L3 --> L4["Container starts fresh"]
    end

    subgraph ReadinessFlow["Readiness Probe Failure"]
        direction TB
        R1["Readiness probe fails"] --> R2["failureThreshold reached"]
        R2 --> R3["Pod REMOVED from<br/>Service endpoints"]
        R3 --> R4["No traffic routed to Pod<br/>Container keeps running"]
        R4 --> R5["When probe succeeds again<br/>Pod re-added to endpoints"]
    end

    style L3 fill:#ffcdd2,stroke:#c62828,stroke-width:2px
    style R3 fill:#fff9c4,stroke:#f9a825,stroke-width:2px
    style R5 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px

Practical Tasks

Task 1: Liveness Probe with kuard

Deploy a Pod with a liveness probe that checks the /healthy endpoint. The kuard UI allows you to manually trigger a health check failure.

Create a file called pod-liveness.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: kuard-liveness
  namespace: student-XX
  labels:
    app: kuard-liveness
spec:
  containers:
    - name: kuard
      image: <ACR_NAME>.azurecr.io/kuard:1
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
        limits:
          cpu: "100m"
          memory: "64Mi"
      livenessProbe:
        httpGet:
          path: /healthy
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3

Deploy and port-forward:

kubectl apply -f pod-liveness.yaml
kubectl port-forward pod/kuard-liveness 8080:8080 -n student-XX

Open http://localhost:8080 in your browser. Navigate to the Liveness Probe tab. Click Fail to make the /healthy endpoint return an error.

Wait approximately 15 seconds (3 failures × 5-second period) and observe:

kubectl get pod kuard-liveness -n student-XX -w

You should see the restart count increase:

NAME              READY   STATUS    RESTARTS   AGE
kuard-liveness    1/1     Running   1          1m

Check the events:

kubectl describe pod kuard-liveness -n student-XX

Look for events like:

Warning  Unhealthy  Liveness probe failed: HTTP probe failed with statuscode: 500
Normal   Killing    Container kuard failed liveness probe, will be restarted

Task 2: Readiness Probe with kuard

Deploy a Pod with a readiness probe and a Service. When the readiness probe fails, the Pod is removed from the Service endpoints, but the container keeps running.

Create a file called pod-readiness.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: kuard-readiness
  namespace: student-XX
  labels:
    app: kuard-readiness
spec:
  containers:
    - name: kuard
      image: <ACR_NAME>.azurecr.io/kuard:1
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
        limits:
          cpu: "100m"
          memory: "64Mi"
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: kuard-readiness-svc
  namespace: student-XX
spec:
  selector:
    app: kuard-readiness
  ports:
    - port: 80
      targetPort: 8080

Deploy and check the endpoints:

kubectl apply -f pod-readiness.yaml
kubectl get endpoints kuard-readiness-svc -n student-XX

You should see the Pod IP listed in the endpoints:

NAME                  ENDPOINTS         AGE
kuard-readiness-svc   10.244.0.15:8080  10s

Now port-forward to the Pod and navigate to the Readiness Probe tab. Click Fail to make the /ready endpoint return an error.

kubectl port-forward pod/kuard-readiness 8080:8080 -n student-XX

After the failure threshold is reached, check the endpoints again:

kubectl get endpoints kuard-readiness-svc -n student-XX

The endpoints should now be empty:

NAME                  ENDPOINTS   AGE
kuard-readiness-svc   <none>      1m

Also check the Pod status — notice the Pod is still running but shows 0/1 READY:

kubectl get pod kuard-readiness -n student-XX
NAME               READY   STATUS    RESTARTS   AGE
kuard-readiness    0/1     Running   0          2m

Go back to the kuard UI and click Succeed on the Readiness Probe tab. The Pod will be re-added to the endpoints.


Task 3: Combined Probes

Deploy a Pod with both liveness and readiness probes, simulating a production-ready configuration.

Create a file called pod-combined-probes.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: kuard-probes
  namespace: student-XX
  labels:
    app: kuard-probes
spec:
  containers:
    - name: kuard
      image: <ACR_NAME>.azurecr.io/kuard:1
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
        limits:
          cpu: "100m"
          memory: "64Mi"
      livenessProbe:
        httpGet:
          path: /healthy
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
        successThreshold: 2

Deploy and verify:

kubectl apply -f pod-combined-probes.yaml
kubectl get pod kuard-probes -n student-XX

Experiment with the kuard UI:

  1. Fail readiness only — Pod stays running but is removed from endpoints (no restart).
  2. Fail liveness — Pod is restarted, which also resets the readiness state.

Observe the difference in behavior using:

kubectl describe pod kuard-probes -n student-XX

Clean Up

kubectl delete pod kuard-liveness kuard-readiness kuard-probes -n student-XX
kubectl delete service kuard-readiness-svc -n student-XX

Common Problems

| Problem | Possible Cause | Solution |
|---|---|---|
| Pod keeps restarting | Liveness probe fails before the application is ready | Increase initialDelaySeconds or add a startup probe |
| Pod shows 0/1 READY but is running | Readiness probe is failing | Check the readiness endpoint and application logs |
| Probes pass locally but fail in cluster | Probe targets wrong port or path | Verify containerPort, path, and that the app listens on 0.0.0.0, not 127.0.0.1 |
| Service has no endpoints | All Pods behind the Service have failing readiness probes | Check readiness probe configuration and application health |
| Intermittent probe failures cause restarts | failureThreshold too low or timeoutSeconds too short | Increase thresholds for applications with variable response times |

Best Practices

  1. Always configure both liveness and readiness probes — Liveness handles stuck processes; readiness handles temporary unavailability (e.g., during startup, dependency failures).
  2. Use different endpoints for liveness and readiness — The liveness check should verify the process is not stuck. The readiness check should verify the application can serve requests (database connected, caches warmed, etc.).
  3. Do not make liveness probes depend on external services — If your liveness probe checks a database and the database is down, Kubernetes will restart all Pods, making the problem worse.
  4. Use startup probes for slow-starting applications — Instead of setting a very high initialDelaySeconds on the liveness probe, use a startup probe with a generous timeout.
  5. Tune probe timing for your application — Default values work for many cases, but high-traffic or slow applications may need longer timeouts and higher failure thresholds.

Summary

In this exercise you learned:

  • The difference between liveness probes (restart on failure) and readiness probes (remove from endpoints on failure)
  • The three probe mechanisms: HTTP GET, TCP Socket, and Exec
  • How to configure probes with initialDelaySeconds, periodSeconds, failureThreshold, and other parameters
  • How to use kuard to simulate probe failures and observe Kubernetes behavior
  • How readiness probe failures remove a Pod from Service endpoints without restarting it
  • How to combine both probes in a production-ready configuration
