Monitoring Zentinel health, readiness, and upstream status.
Health Endpoints
Liveness Check
The /health endpoint returns 200 OK if Zentinel is running:
curl http://localhost:9090/health
Response:
{"status": "healthy"}
Configure the health route:
routes {
    route "health" {
        priority 1000
        matches {
            path "/health"
        }
        service-type "builtin"
        builtin-handler "health"
    }
}
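A script that consumes the liveness endpoint might look like the following minimal sketch. The function names are hypothetical; the URL and the response shape are taken from the example above, and any network or HTTP error is treated as "not alive".

```python
import json
import urllib.request


def is_alive(status_code: int, body: str) -> bool:
    """Return True only for a 200 response whose JSON body reports "healthy"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_liveness(url: str = "http://localhost:9090/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; connection failures and HTTP errors count as not alive."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_alive(resp.status, resp.read().decode("utf-8", errors="replace"))
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```

Keeping the parsing in is_alive separate from the network call makes the decision logic easy to unit test without a running instance.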
Status Endpoint
The /status endpoint returns version, uptime, and configuration reload details:
curl http://localhost:9090/status
Response:
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 86400,
  "start_time": "2025-01-15T00:00:00Z",
  "config_reload_count": 3,
  "last_config_reload": "2025-01-15T12:00:00Z"
}
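To turn this payload into something easier to read, a small helper can convert the raw fields into durations. This is a sketch that assumes the field names shown in the example response; summarize_status and parse_utc are hypothetical names.

```python
from datetime import datetime, timedelta


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp that uses a trailing Z for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_status(payload: dict) -> dict:
    """Derive uptime and the time between start and the last config reload."""
    start = parse_utc(payload["start_time"])
    last_reload = parse_utc(payload["last_config_reload"])
    return {
        "healthy": payload["status"] == "healthy",
        "uptime": timedelta(seconds=payload["uptime_seconds"]),
        "reload_after_start": last_reload - start,
    }
```

For the example response, uptime is one day and the last reload happened 12 hours after start.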
Upstream Health
Check upstream health status:
curl http://localhost:9090/admin/upstreams
Response:
{
  "upstreams": {
    "backend": {
      "healthy": true,
      "targets": [
        {
          "address": "10.0.1.1:8080",
          "healthy": true,
          "active_connections": 45,
          "total_requests": 150000,
          "failed_requests": 12
        },
        {
          "address": "10.0.1.2:8080",
          "healthy": true,
          "active_connections": 42,
          "total_requests": 148000,
          "failed_requests": 8
        },
        {
          "address": "10.0.1.3:8080",
          "healthy": false,
          "active_connections": 0,
          "total_requests": 50000,
          "failed_requests": 150,
          "last_error": "connection refused",
          "unhealthy_since": "2025-01-15T11:30:00Z"
        }
      ]
    }
  }
}
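A report like this is easy to post-process. The sketch below lists unhealthy targets and computes a per-target failure ratio; it assumes the JSON structure shown in the example response, and the helper names are hypothetical.

```python
def unhealthy_targets(report: dict) -> list:
    """Return "upstream/address" for every target reported as unhealthy."""
    found = []
    for name, upstream in report["upstreams"].items():
        for target in upstream["targets"]:
            if not target["healthy"]:
                found.append(f'{name}/{target["address"]}')
    return found


def failure_ratio(target: dict) -> float:
    """Failed requests divided by total requests; 0.0 when no requests were seen."""
    total = target["total_requests"]
    return target["failed_requests"] / total if total else 0.0
```

Applied to the example response, unhealthy_targets returns a single entry for 10.0.1.3:8080, whose failure ratio is 150 / 50000 = 0.3%.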
Kubernetes Probes
Liveness Probe
Detect whether Zentinel needs a restart:
livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
Readiness Probe
Detect if Zentinel is ready to receive traffic:
readinessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
Startup Probe
For slow-starting instances:
startupProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 30  # 150 seconds max startup
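The 150-second figure is periodSeconds multiplied by failureThreshold. The following sketch makes that arithmetic explicit and optionally adds the initial delay. It is only a rough upper bound: real timing also depends on kubelet scheduling and the probe timeout, and startup_budget_seconds is a hypothetical helper name.

```python
def startup_budget_seconds(period_seconds: int, failure_threshold: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container is given to start before it is restarted."""
    return initial_delay_seconds + period_seconds * failure_threshold


# Values from the startup probe above.
print(startup_budget_seconds(period_seconds=5, failure_threshold=30))
print(startup_budget_seconds(period_seconds=5, failure_threshold=30, initial_delay_seconds=5))
```

To give a slower instance more headroom, raise failureThreshold rather than periodSeconds, so that a fast start is still detected quickly.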
Complete Kubernetes Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zentinel
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zentinel
  template:
    metadata:
      labels:
        app: zentinel
    spec:
      containers:
        - name: zentinel
          image: zentinel:latest
          ports:
            - name: http
              containerPort: 8080
            - name: admin
              containerPort: 9090
          livenessProbe:
            httpGet:
              path: /health
              port: admin
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: admin
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
Load Balancer Health Checks
AWS ALB/NLB
Target type: instance or ip
Health check path: /health
Health check port: 9090
Healthy threshold: 2
Unhealthy threshold: 3
Timeout: 5 seconds
Interval: 10 seconds
Success codes: 200
GCP Load Balancer
healthChecks:
  - name: zentinel-health
    type: HTTP
    httpHealthCheck:
      port: 9090
      requestPath: /health
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 3
HAProxy Backend Check
backend zentinel_backend
    option httpchk GET /health
    http-check expect status 200
    server zentinel1 10.0.1.1:8080 check port 9090
    server zentinel2 10.0.1.2:8080 check port 9090
Upstream Health Checks
HTTP Health Check
upstreams {
    upstream "backend" {
        health-check {
            type "http" {
                path "/health"
                expected-status 200
                host "backend.internal"
            }
            interval-secs 10
            timeout-secs 5
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}
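Thresholds like healthy-threshold and unhealthy-threshold are commonly implemented as counters of consecutive check results. The sketch below assumes that behavior to illustrate how the two settings interact. It is not taken from Zentinel's source, and HealthTracker is a hypothetical name.

```python
class HealthTracker:
    """Flip state only after enough consecutive results contradict the current state."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3,
                 healthy: bool = True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that disagree with self.healthy

    def record(self, success: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if success == self.healthy:
            self._streak = 0  # a result that agrees with the state resets the streak
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if success else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = success
            self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=2, unhealthy_threshold=3)
states = [tracker.record(ok) for ok in (False, False, False, True, True)]
print(states)
```

Under this model a single failed check never removes a target from rotation, and a recovering target must pass two checks in a row before it receives traffic again.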
TCP Health Check
For non-HTTP services:
upstreams {
    upstream "database" {
        health-check {
            type "tcp"
            interval-secs 5
            timeout-secs 2
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}
gRPC Health Check
upstreams {
    upstream "grpc-service" {
        health-check {
            type "grpc" {
                service "grpc.health.v1.Health"
            }
            interval-secs 10
            timeout-secs 5
        }
    }
}
Inference Health Check
For LLM/AI inference backends, use the inference health check to verify specific models are loaded and available. This goes beyond a simple HTTP 200 check by parsing the /v1/models endpoint response and confirming expected models are present:
upstreams {
    upstream "gpu-cluster" {
        health-check {
            type "inference" {
                endpoint "/v1/models"
                expected-models "llama-3-70b" "codellama-34b"
            }
            interval-secs 30
            timeout-secs 10
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}
The inference health check:
- Sends a GET request to the models endpoint (OpenAI-compatible /v1/models or Ollama /api/tags)
- Parses the JSON response to extract available model IDs
- Verifies all expected models are present (supports prefix matching for versioned models, such as gpt-4 matching gpt-4-turbo)
- Marks the backend unhealthy if any expected model is missing
This is particularly useful for GPU backends where models may need time to load after restart, or when running multiple model variants across a cluster.
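The matching logic described above can be sketched in a few lines. This is an illustration, not Zentinel's implementation: it assumes the OpenAI-compatible response lists models under data[].id and the Ollama response lists them under models[].name, and it reads "prefix matching" as a plain string prefix test. All function names are hypothetical.

```python
def extract_model_ids(payload: dict) -> list:
    """Collect model identifiers from either supported response shape."""
    ids = [m["id"] for m in payload.get("data", []) if "id" in m]        # OpenAI-compatible
    ids += [m["name"] for m in payload.get("models", []) if "name" in m]  # Ollama /api/tags
    return ids


def missing_models(expected: list, available: list) -> list:
    """Expected models for which no available model ID starts with that name."""
    return [e for e in expected if not any(a.startswith(e) for a in available)]


def backend_is_healthy(payload: dict, expected: list) -> bool:
    """Healthy only if every expected model is matched by some available model."""
    return not missing_models(expected, extract_model_ids(payload))


response = {"data": [{"id": "gpt-4-turbo"}, {"id": "llama-3-70b"}]}
print(backend_is_healthy(response, ["gpt-4", "llama-3-70b"]))
print(missing_models(["codellama-34b"], extract_model_ids(response)))
```

Returning the list of missing models, rather than only a boolean, makes it easier to log why a backend was marked unhealthy.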
Health Check Tuning
| Scenario | interval | timeout | healthy | unhealthy |
|---|---|---|---|---|
| Fast failover | 5s | 2s | 2 | 2 |
| Default | 10s | 5s | 2 | 3 |
| Stable (reduce flapping) | 30s | 10s | 3 | 5 |
| Slow backends | 30s | 15s | 2 | 3 |
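To compare these profiles, it helps to estimate how long a failed backend can keep receiving traffic. The sketch below assumes checks run at a fixed interval, that the backend fails just after a successful check, and that each failing check takes the full timeout. Under those assumptions the worst case is interval × unhealthy threshold + timeout. The helper name is hypothetical.

```python
def worst_case_detection_seconds(interval: float, timeout: float,
                                 unhealthy_threshold: int) -> float:
    """Approximate upper bound on the time needed to mark a failed backend unhealthy."""
    return interval * unhealthy_threshold + timeout


# (interval, timeout, unhealthy threshold) taken from the table above.
SCENARIOS = {
    "Fast failover": (5, 2, 2),
    "Default": (10, 5, 3),
    "Stable (reduce flapping)": (30, 10, 5),
    "Slow backends": (30, 15, 3),
}

for name, (interval, timeout, unhealthy) in SCENARIOS.items():
    print(f"{name}: ~{worst_case_detection_seconds(interval, timeout, unhealthy)}s")
```

With these numbers, Fast failover detects a failure within roughly 12 seconds and Default within roughly 35 seconds, while the Stable profile can take about 160 seconds.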
Monitoring Key Metrics
Request Metrics
# Request rate
rate(zentinel_requests_total[5m])
# Error rate
sum(rate(zentinel_requests_total{status=~"5.."}[5m]))
/ sum(rate(zentinel_requests_total[5m]))
# P99 latency
histogram_quantile(0.99,
rate(zentinel_request_duration_seconds_bucket[5m]))
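The error-rate query divides two per-second rates. As a simplified illustration of that arithmetic, the sketch below computes the same ratio from two samples of each counter. Unlike the real rate() function, it does not handle counter resets or extrapolate at the window boundaries, and the helper names are hypothetical.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over the window."""
    return (last - first) / window_seconds


def error_ratio(total_first: float, total_last: float,
                errors_first: float, errors_last: float,
                window_seconds: float = 300.0) -> float:
    """Share of requests that returned 5xx during the window (0.0 if there was no traffic)."""
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate == 0:
        return 0.0
    return per_second_rate(errors_first, errors_last, window_seconds) / total_rate


# 3000 requests and 150 errors in five minutes.
print(error_ratio(total_first=1000, total_last=4000, errors_first=10, errors_last=160))
```

In this example the ratio is 150 / 3000, or 5%.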
Upstream Metrics
# Upstream failure rate
sum(rate(zentinel_upstream_failures_total[5m])) by (upstream)
/ sum(rate(zentinel_upstream_attempts_total[5m])) by (upstream)
# Circuit breaker status (1 = open)
zentinel_circuit_breaker_state{component="upstream"}
# Connection pool utilization
(zentinel_connection_pool_size - zentinel_connection_pool_idle)
/ zentinel_connection_pool_size
System Metrics
# Memory usage
zentinel_memory_usage_bytes
# Active connections
zentinel_open_connections
# Active requests
zentinel_active_requests
Alerting
Critical Alerts
groups:
  - name: zentinel-critical
    rules:
      # High error rate
      - alert: ZentinelHighErrorRate
        expr: |
          sum(rate(zentinel_requests_total{status=~"5.."}[5m]))
            / sum(rate(zentinel_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Zentinel error rate above 5%"
      # All upstreams unhealthy
      - alert: ZentinelNoHealthyUpstreams
        expr: |
          sum(zentinel_circuit_breaker_state{component="upstream"})
            == count(zentinel_circuit_breaker_state{component="upstream"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No healthy upstream servers"
      # Zentinel down
      - alert: ZentinelDown
        expr: up{job="zentinel"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Zentinel instance down"
Warning Alerts
groups:
  - name: zentinel-warning
    rules:
      # High latency
      - alert: ZentinelHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(zentinel_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 1 second"
      # Circuit breaker open
      - alert: ZentinelCircuitBreakerOpen
        expr: zentinel_circuit_breaker_state == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker open for {{ $labels.component }}"
      # High memory usage
      - alert: ZentinelHighMemory
        expr: |
          zentinel_memory_usage_bytes
            / on() node_memory_MemTotal_bytes > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 80%"
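Each rule above uses a "for" duration, which means the alert fires only after its expression has stayed true across consecutive evaluations for at least that long. A single false evaluation resets the timer. The sketch below models that behavior in isolation; firing_at is a hypothetical helper, not part of Prometheus.

```python
def firing_at(evaluations, for_seconds):
    """Return the timestamp at which the alert first fires, or None.

    evaluations: iterable of (timestamp_seconds, condition_is_true) in time order.
    """
    pending_since = None
    for timestamp, active in evaluations:
        if not active:
            pending_since = None  # condition cleared: the alert leaves the pending state
            continue
        if pending_since is None:
            pending_since = timestamp
        if timestamp - pending_since >= for_seconds:
            return timestamp
    return None


# Evaluated every 60s with "for: 2m"; the dip at t=60 restarts the clock.
samples = [(0, True), (60, False), (120, True), (180, True), (240, True)]
print(firing_at(samples, for_seconds=120))
```

This is why a short "for" value makes alerts more responsive but also more sensitive to brief spikes.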
Dashboards
Key Panels
- Traffic Overview
  - Request rate (RPS)
  - Error rate (%)
  - Active requests
- Latency
  - P50, P95, P99 latency
  - Latency by route
- Upstream Health
  - Upstream status (healthy/unhealthy)
  - Connection pool utilization
  - Circuit breaker states
- System Resources
  - Memory usage
  - CPU usage
  - Open connections
Grafana Variables
# Datasource
datasource: prometheus
# Variables
- name: instance
  query: label_values(zentinel_requests_total, instance)
- name: route
  query: label_values(zentinel_requests_total, route)
- name: upstream
  query: label_values(zentinel_upstream_attempts_total, upstream)
External Health Monitoring
Synthetic Monitoring
Use external monitors to verify end-to-end health:
# Simple availability check
curl -sf https://api.example.com/health || alert
# Response time check
response_time=$(curl -sf -w "%{time_total}" -o /dev/null https://api.example.com/health)
if (( $(echo "$response_time > 1.0" | bc -l) )); then
alert "Slow response: ${response_time}s"
fi
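The same two checks can be combined in a small Python probe, which is easier to extend than a shell one-liner. This is a sketch: the function names are hypothetical, the 1.0-second threshold mirrors the shell example, and the fetch function is injectable so the classification logic can be tested without network access.

```python
import time
import urllib.request


def fetch_elapsed(url: str, timeout: float = 10.0) -> float:
    """Fetch the URL and return the elapsed seconds; raises OSError on HTTP or network errors."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def probe(url: str, fetch=fetch_elapsed, slow_after: float = 1.0) -> str:
    """Classify an endpoint as OK, SLOW, or DOWN."""
    try:
        elapsed = fetch(url)
    except OSError as exc:  # includes urllib's URLError and HTTPError (4xx/5xx)
        return f"DOWN: {exc}"
    if elapsed > slow_after:
        return f"SLOW: {elapsed:.2f}s"
    return "OK"
```

A scheduler or cron job could call probe("https://api.example.com/health") and forward any result other than "OK" to the alerting channel of your choice.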
Recommended Tools
- Uptime monitoring: Pingdom, UptimeRobot, Datadog Synthetics
- APM: Datadog, New Relic, Dynatrace
- Logs: Elasticsearch/Kibana, Loki/Grafana, Splunk
See Also
- Troubleshooting - Diagnosing issues
- Metrics Reference - All available metrics
- Deployment - Production deployment guides