Health Monitoring

Monitoring Zentinel health, readiness, and upstream status.

Health Endpoints

Liveness Check

The /health endpoint returns 200 OK if Zentinel is running:

curl http://localhost:9090/health

Response:

{"status": "healthy"}

Configure the health route:

routes {
    route "health" {
        priority 1000
        matches {
            path "/health"
        }
        service-type "builtin"
        builtin-handler "health"
    }
}

Status Endpoint

The /status endpoint returns detailed status:

curl http://localhost:9090/status

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 86400,
  "start_time": "2025-01-15T00:00:00Z",
  "config_reload_count": 3,
  "last_config_reload": "2025-01-15T12:00:00Z"
}

Upstream Health

Check upstream health status:

curl http://localhost:9090/admin/upstreams

Response:

{
  "upstreams": {
    "backend": {
      "healthy": true,
      "targets": [
        {
          "address": "10.0.1.1:8080",
          "healthy": true,
          "active_connections": 45,
          "total_requests": 150000,
          "failed_requests": 12
        },
        {
          "address": "10.0.1.2:8080",
          "healthy": true,
          "active_connections": 42,
          "total_requests": 148000,
          "failed_requests": 8
        },
        {
          "address": "10.0.1.3:8080",
          "healthy": false,
          "active_connections": 0,
          "total_requests": 50000,
          "failed_requests": 150,
          "last_error": "connection refused",
          "unhealthy_since": "2025-01-15T11:30:00Z"
        }
      ]
    }
  }
}
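
To act on this response in a script, the payload can be filtered down to the unhealthy targets. The following is a minimal sketch that assumes the response shape shown above; the function name `unhealthy_targets` and the trimmed `SAMPLE` payload are illustrative and not part of Zentinel.

```python
import json

# Trimmed copy of the sample response above, used only for illustration.
SAMPLE = """
{
  "upstreams": {
    "backend": {
      "healthy": true,
      "targets": [
        {"address": "10.0.1.1:8080", "healthy": true,
         "total_requests": 150000, "failed_requests": 12},
        {"address": "10.0.1.3:8080", "healthy": false,
         "total_requests": 50000, "failed_requests": 150,
         "last_error": "connection refused",
         "unhealthy_since": "2025-01-15T11:30:00Z"}
      ]
    }
  }
}
"""


def unhealthy_targets(payload: dict) -> list[dict]:
    """Return one record per target whose "healthy" flag is false."""
    found = []
    for name, upstream in payload.get("upstreams", {}).items():
        for target in upstream.get("targets", []):
            if target.get("healthy", False):
                continue
            total = target.get("total_requests", 0)
            failed = target.get("failed_requests", 0)
            found.append({
                "upstream": name,
                "address": target["address"],
                # Lifetime failure ratio; guard against division by zero.
                "failure_ratio": failed / total if total else 0.0,
                "last_error": target.get("last_error"),
                "unhealthy_since": target.get("unhealthy_since"),
            })
    return found


if __name__ == "__main__":
    for record in unhealthy_targets(json.loads(SAMPLE)):
        print(record)
```

In practice the payload would come from the /admin/upstreams endpoint rather than an embedded string.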

Kubernetes Probes

Liveness Probe

Detect whether Zentinel needs a restart:

livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

Readiness Probe

Detect whether Zentinel is ready to receive traffic:

readinessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2

Startup Probe

For slow-starting instances:

startupProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 30  # 150 seconds max startup
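
The startup budget in the comment above comes from simple arithmetic on the probe settings. As a rough sketch (the helper name `startup_budget` is hypothetical, and the estimate ignores the per-probe timeout and kubelet scheduling jitter):

```python
def startup_budget(initial_delay: int, period: int, failure_threshold: int) -> tuple[int, int]:
    """Return (probing_window, total_budget) in seconds for a startup probe.

    probing_window: period * failure_threshold, the time spent probing
                    before the container is restarted.
    total_budget:   the probing window plus the initial delay.
    """
    probing_window = period * failure_threshold
    return probing_window, initial_delay + probing_window


if __name__ == "__main__":
    window, total = startup_budget(initial_delay=5, period=5, failure_threshold=30)
    print(f"probing window: {window}s, including initial delay: {total}s")
```

Raising failureThreshold rather than periodSeconds extends the budget without slowing down detection of a successful start.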

Complete Kubernetes Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zentinel
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zentinel
  template:
    metadata:
      labels:
        app: zentinel
    spec:
      containers:
        - name: zentinel
          image: zentinel:latest
          ports:
            - name: http
              containerPort: 8080
            - name: admin
              containerPort: 9090
          livenessProbe:
            httpGet:
              path: /health
              port: admin
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: admin
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

Load Balancer Health Checks

AWS ALB/NLB

Target type: instance or ip
Health check path: /health
Health check port: 9090
Healthy threshold: 2
Unhealthy threshold: 3
Timeout: 5 seconds
Interval: 10 seconds
Success codes: 200

GCP Load Balancer

healthChecks:
  - name: zentinel-health
    type: HTTP
    httpHealthCheck:
      port: 9090
      requestPath: /health
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 3

HAProxy Backend Check

backend zentinel_backend
    option httpchk GET /health
    http-check expect status 200
    server zentinel1 10.0.1.1:8080 check port 9090
    server zentinel2 10.0.1.2:8080 check port 9090

Upstream Health Checks

HTTP Health Check

upstreams {
    upstream "backend" {
        health-check {
            type "http" {
                path "/health"
                expected-status 200
                host "backend.internal"
            }
            interval-secs 10
            timeout-secs 5
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}
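
The healthy-threshold and unhealthy-threshold settings are easiest to reason about as a small state machine. The sketch below assumes the thresholds count consecutive check results, which is the common convention for health checkers; it illustrates the idea and is not Zentinel's actual implementation. The class name `TargetHealth` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    healthy_threshold: int = 2      # consecutive successes needed to recover
    unhealthy_threshold: int = 3    # consecutive failures needed to eject
    healthy: bool = True
    # Number of consecutive results that contradict the current state.
    streak: int = field(default=0, init=False)

    def record(self, success: bool) -> bool:
        """Feed one check result and return the resulting health state."""
        if success == self.healthy:
            # Result agrees with the current state: any opposing streak is broken.
            self.streak = 0
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if success else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = success
            self.streak = 0
        return self.healthy


if __name__ == "__main__":
    target = TargetHealth()
    for result in (False, False, False, True, True):
        print(result, "->", "healthy" if target.record(result) else "unhealthy")
```

Under this model a single failed check never removes a target, and a single success never restores one, which is what keeps a flaky backend from flapping.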

TCP Health Check

For non-HTTP services:

upstreams {
    upstream "database" {
        health-check {
            type "tcp"
            interval-secs 5
            timeout-secs 2
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}

gRPC Health Check

upstreams {
    upstream "grpc-service" {
        health-check {
            type "grpc" {
                service "grpc.health.v1.Health"
            }
            interval-secs 10
            timeout-secs 5
        }
    }
}

Inference Health Check

For LLM/AI inference backends, use the inference health check to verify specific models are loaded and available. This goes beyond a simple HTTP 200 check by parsing the /v1/models endpoint response and confirming expected models are present:

upstreams {
    upstream "gpu-cluster" {
        health-check {
            type "inference" {
                endpoint "/v1/models"
                expected-models "llama-3-70b" "codellama-34b"
            }
            interval-secs 30
            timeout-secs 10
            healthy-threshold 2
            unhealthy-threshold 3
        }
    }
}

The inference health check:

- Sends a GET request to the models endpoint (OpenAI-compatible /v1/models or Ollama /api/tags)
- Parses the JSON response to extract the available model IDs
- Verifies that all expected models are present. Prefix matching is supported for versioned models, so an expected gpt-4 matches gpt-4-turbo
- Marks the backend unhealthy if any expected model is missing

This is particularly useful for GPU backends where models may need time to load after restart, or when running multiple model variants across a cluster.
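
The matching logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not Zentinel's code: it assumes the OpenAI-compatible response lists models under `data[].id` and the Ollama response lists them under `models[].name`, and the function names are hypothetical.

```python
import json


def extract_model_ids(body: str) -> set[str]:
    """Collect model identifiers from a models-endpoint JSON response."""
    doc = json.loads(body)
    ids = set()
    # OpenAI-compatible /v1/models: {"data": [{"id": "..."}]}
    for entry in doc.get("data", []):
        if "id" in entry:
            ids.add(entry["id"])
    # Ollama /api/tags: {"models": [{"name": "..."}]}
    for entry in doc.get("models", []):
        if "name" in entry:
            ids.add(entry["name"])
    return ids


def missing_models(expected: list[str], available: set[str]) -> list[str]:
    """Return expected models with no available model starting with that name."""
    return [
        want for want in expected
        if not any(have.startswith(want) for have in available)
    ]


if __name__ == "__main__":
    body = '{"object": "list", "data": [{"id": "llama-3-70b-instruct"}]}'
    missing = missing_models(["llama-3-70b", "codellama-34b"], extract_model_ids(body))
    print("unhealthy, missing:" if missing else "healthy", missing)
```

A backend is treated as healthy only when `missing_models` returns an empty list.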

Health Check Tuning

Scenario                  Interval  Timeout  Healthy threshold  Unhealthy threshold
Fast failover             5s        2s       2                  2
Default                   10s       5s       2                  3
Stable (reduce flapping)  30s       10s      3                  5
Slow backends             30s       15s      2                  3
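
To compare these profiles, it helps to estimate how long a failed backend keeps receiving traffic. The sketch below assumes checks run at a fixed interval and that thresholds count consecutive results; under those assumptions the worst case is roughly interval x threshold plus one timeout. The function names are illustrative.

```python
# (interval, timeout, healthy_threshold, unhealthy_threshold), in seconds/counts
SCENARIOS = {
    "Fast failover": (5, 2, 2, 2),
    "Default": (10, 5, 2, 3),
    "Stable (reduce flapping)": (30, 10, 3, 5),
    "Slow backends": (30, 15, 2, 3),
}


def detection_seconds(interval: int, timeout: int, unhealthy_threshold: int) -> int:
    """Approximate worst-case time from failure to being marked unhealthy.

    The backend fails just after a passing check, then the last of the
    required failing checks runs until its timeout expires.
    """
    return interval * unhealthy_threshold + timeout


def recovery_seconds(interval: int, healthy_threshold: int) -> int:
    """Approximate worst-case time from recovery to being marked healthy."""
    return interval * healthy_threshold


if __name__ == "__main__":
    for name, (interval, timeout, healthy, unhealthy) in SCENARIOS.items():
        detect = detection_seconds(interval, timeout, unhealthy)
        recover = recovery_seconds(interval, healthy)
        print(f"{name}: detect <= ~{detect}s, recover <= ~{recover}s")
```

The trade-off is visible in the numbers: the stable profile tolerates brief blips but can route to a dead backend for a few minutes.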

Monitoring Key Metrics

Request Metrics

# Request rate
rate(zentinel_requests_total[5m])

# Error rate
sum(rate(zentinel_requests_total{status=~"5.."}[5m]))
  / sum(rate(zentinel_requests_total[5m]))

# P99 latency
histogram_quantile(0.99,
  rate(zentinel_request_duration_seconds_bucket[5m]))
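
For readers new to PromQL, the error-rate query above boils down to a ratio of counter increases over the window. The following simplified sketch works on two hand-made counter snapshots keyed by status code; it ignores the extrapolation Prometheus applies at window edges, and the function name `error_ratio` is hypothetical.

```python
import re


def error_ratio(start: dict[str, float], end: dict[str, float]) -> float:
    """Share of requests in the window that returned a 5xx status.

    start/end map a status code to the counter value at the beginning and
    end of the window. The window length cancels out of the ratio.
    """
    def increase(code: str) -> float:
        delta = end.get(code, 0) - start.get(code, 0)
        # A counter that went backwards was reset (e.g. a restart): count from zero.
        return delta if delta >= 0 else end.get(code, 0)

    total = sum(increase(code) for code in end)
    # Same selector as status=~"5.." (three characters, starting with 5).
    errors = sum(increase(code) for code in end if re.fullmatch(r"5..", code))
    return errors / total if total else 0.0


if __name__ == "__main__":
    before = {"200": 1000, "500": 10}
    after = {"200": 1900, "500": 40, "503": 70}
    print(f"error ratio: {error_ratio(before, after):.2%}")
```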

Upstream Metrics

# Upstream failure rate
sum(rate(zentinel_upstream_failures_total[5m])) by (upstream)
  / sum(rate(zentinel_upstream_attempts_total[5m])) by (upstream)

# Circuit breaker status (1 = open)
zentinel_circuit_breaker_state{component="upstream"}

# Connection pool utilization
(zentinel_connection_pool_size - zentinel_connection_pool_idle)
  / zentinel_connection_pool_size

System Metrics

# Memory usage
zentinel_memory_usage_bytes

# Active connections
zentinel_open_connections

# Active requests
zentinel_active_requests

Alerting

Critical Alerts

groups:
  - name: zentinel-critical
    rules:
      # High error rate
      - alert: ZentinelHighErrorRate
        expr: |
          sum(rate(zentinel_requests_total{status=~"5.."}[5m]))
          / sum(rate(zentinel_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Zentinel error rate above 5%"

      # All upstreams unhealthy
      - alert: ZentinelNoHealthyUpstreams
        expr: |
          sum(zentinel_circuit_breaker_state{component="upstream"})
          == count(zentinel_circuit_breaker_state{component="upstream"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No healthy upstream servers"

      # Zentinel down
      - alert: ZentinelDown
        expr: up{job="zentinel"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Zentinel instance down"

Warning Alerts

groups:
  - name: zentinel-warning
    rules:
      # High latency
      - alert: ZentinelHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(zentinel_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 1 second"

      # Circuit breaker open
      - alert: ZentinelCircuitBreakerOpen
        expr: zentinel_circuit_breaker_state == 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker open for {{ $labels.component }}"

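      # High memory usage
      # Note: on() only matches when each side of the division is a single
      # series; with several instances, match on a shared label instead.
      - alert: ZentinelHighMemory
        expr: |
          zentinel_memory_usage_bytes
          / on() node_memory_MemTotal_bytes > 0.8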
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 80%"

Dashboards

Key Panels

Traffic Overview
- Request rate (RPS)
- Error rate (%)
- Active requests

Latency
- P50, P95, P99 latency
- Latency by route

Upstream Health
- Upstream status (healthy/unhealthy)
- Connection pool utilization
- Circuit breaker states

System Resources
- Memory usage
- CPU usage
- Open connections

Grafana Variables

# Datasource
datasource: prometheus

# Variables
variables:
  - name: instance
    query: label_values(zentinel_requests_total, instance)

  - name: route
    query: label_values(zentinel_requests_total, route)

  - name: upstream
    query: label_values(zentinel_upstream_attempts_total, upstream)

External Health Monitoring

Synthetic Monitoring

Use external monitors to verify end-to-end health:

# `alert` is a placeholder for your own notification command (pager, webhook, etc.)

# Simple availability check
curl -sf https://api.example.com/health || alert "Health check failed"

# Response time check
response_time=$(curl -sf -w "%{time_total}" -o /dev/null https://api.example.com/health)
if (( $(echo "$response_time > 1.0" | bc -l) )); then
  alert "Slow response: ${response_time}s"
fi
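
Where a shell with bc is not available, the same two checks can be combined in a small standard-library Python probe. This is a sketch only: the function name `probe` and the thresholds are illustrative, and the URL is the example one used above.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, max_seconds: float = 1.0, timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch url once and report (ok, message) for availability and latency."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection errors, timeouts, and HTTP 4xx/5xx responses.
        return False, f"unreachable: {exc}"
    elapsed = time.monotonic() - started
    if status != 200:
        return False, f"unexpected status {status}"
    if elapsed > max_seconds:
        return False, f"slow response: {elapsed:.3f}s"
    return True, f"ok in {elapsed:.3f}s"


if __name__ == "__main__":
    ok, message = probe("https://api.example.com/health")
    print("OK" if ok else "ALERT", message)
```

Running the probe from outside your own network, on a schedule, gives a view of health that the internal endpoints cannot provide.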

Recommended Tools

Uptime monitoring: Pingdom, UptimeRobot, Datadog Synthetics
APM: Datadog, New Relic, Dynatrace
Logs: Elasticsearch/Kibana, Loki/Grafana, Splunk
