Background
- During load testing in a Kubernetes environment, intermittent 502/504 errors were observed.
- Pods were terminated before they could complete serving responses → 504 Gateway Timeout
- New Pods were created and started serving traffic before they were fully ready → 502 Bad Gateway
Correctly configured readiness probes are necessary for smooth rolling updates, but as the tests below show, they are not sufficient on their own.
Setup
Load Testing Tools
- bombardier: Simple Go CLI load testing tool
- vegeta: Flexible script-based HTTP load testing
Readiness Probe
A readiness probe is the mechanism Kubernetes uses to determine whether a Pod is ready to accept traffic.
If a container is not ready (e.g. the app is not fully initialized), it should not receive traffic.
Without a readiness probe, incoming traffic may reach Pods before the application is ready, resulting in 502 errors.
Example Deployment Snippet
readinessProbe:
  httpGet:
    port: 8080
    path: /alive
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 30
Test Output
bombardier -c 200 -d 3m -l https://{endpoint}
Result (simplified):
HTTP codes:
2xx - 753060, 5xx - 12
5xx errors are still present.
lifecycle & preStop Hook
A preStop hook runs a command before Kubernetes sends SIGTERM to the container.
This enables a graceful shutdown sequence: the Pod is removed from service → pending requests finish → the container terminates.
Example Deployment Snippet
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - sleep 40
This delays the container's SIGTERM by 40 seconds, so the Pod keeps serving in-flight requests while it is being taken out of rotation.
Test Output
bombardier -c 200 -d 3m -l https://{endpoint}
Result (simplified):
HTTP codes:
2xx - 751239, 5xx - 3
Still not perfect.
terminationGracePeriodSeconds
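The time Kubernetes waits for a Pod to shut down before forcibly terminating it with SIGKILL. The time spent in the preStop hook counts against this period.
The default is 30 seconds, which is shorter than the 40s preStop sleep used here, so the container could be killed before the hook finishes.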
Example Deployment Snippet
terminationGracePeriodSeconds: 50
Ensure the following relationship:
preStop (40s) < terminationGracePeriodSeconds (50s) < ALB timeout (60s)
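The three settings sit at different levels of the Deployment manifest, which the individual snippets do not make obvious. The following is a minimal sketch of how they might fit together, using the values from this article. The Deployment name `web`, the `app: web` label, the container name `app`, and the image `example/app:latest` are placeholders for illustration, not values from the original setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  selector:
    matchLabels:
      app: web                       # hypothetical label
  template:
    metadata:
      labels:
        app: web
    spec:
      # Pod-level: longer than the preStop sleep (40s), shorter than the ALB timeout (60s).
      terminationGracePeriodSeconds: 50
      containers:
        - name: app                  # hypothetical container name
          image: example/app:latest  # hypothetical image
          ports:
            - containerPort: 8080
          # Container-level: no traffic until the app answers on /alive.
          readinessProbe:
            httpGet:
              port: 8080
              path: /alive
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 30
          # Container-level: hold off SIGTERM for 40s so pending requests can finish.
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - sleep 40
```

Note that terminationGracePeriodSeconds belongs to the Pod spec, while readinessProbe and lifecycle are set per container.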
Test Output
bombardier -c 200 -d 3m -l https://{endpoint}
Result:
HTTP codes:
2xx - 770240, 5xx - 0
🎉 No more 5xx errors!