Zentinel provides comprehensive observability through metrics, logging, and distributed tracing. All observability features are configured in the observability block.
Basic Configuration
observability {
logging {
level "info"
format "json"
access-log {
enabled #true
file "/var/log/zentinel/access.log"
format "json"
}
error-log {
enabled #true
file "/var/log/zentinel/error.log"
level "warn"
}
audit-log {
enabled #true
file "/var/log/zentinel/audit.log"
log-blocked #true
log-agent-decisions #true
log-waf-events #true
}
}
metrics {
enabled #true
address "0.0.0.0:9090"
path "/metrics"
}
tracing {
backend "otlp" {
endpoint "http://jaeger:4317"
}
sampling-rate 0.01
service-name "zentinel"
}
}
Logging
Application Logging
Configure the main application log output:
observability {
logging {
level "info" // Log level
format "json" // Log format
timestamps #true // Include timestamps
file "/var/log/zentinel/app.log" // Optional file path
}
}
Log Levels
| Level | Description |
|---|---|
trace | Very detailed debugging |
debug | Debugging information |
info | Informational messages (default) |
warn | Warnings |
error | Errors only |
Log Formats
| Format | Description |
|---|---|
json | Structured JSON (default, recommended for production) |
pretty | Human-readable format |
Access Log
HTTP request/response logging:
observability {
logging {
access-log {
enabled #true
file "/var/log/zentinel/access.log"
format "json"
buffer-size 8192
include-trace-id #true
}
}
}
Access Log Options
| Option | Default | Description |
|---|---|---|
enabled | true | Enable access logging |
file | /var/log/zentinel/access.log | Log file path |
format | json | Log format (json, combined, custom) |
buffer-size | 8192 | Write buffer size |
include-trace-id | true | Include trace ID in logs |
Access Log Fields (JSON format)
{
"timestamp": "2024-01-15T10:30:45.123Z",
"trace_id": "2Kj8mNpQ3xR",
"method": "GET",
"path": "/api/v1/users",
"status": 200,
"duration_ms": 45,
"bytes_sent": 1234,
"client_ip": "192.168.1.100",
"user_agent": "Mozilla/5.0...",
"upstream": "api-backend",
"upstream_addr": "10.0.1.5:8080",
"upstream_duration_ms": 42,
"cache_status": "HIT",
"route_id": "api-users"
}
Error Log
Error and warning logging. Error logging is enabled by default — even without explicit configuration, Zentinel writes errors and warnings to /var/log/zentinel/error.log. The directory is created automatically if it doesn’t exist.
To customize:
observability {
logging {
error-log {
enabled #true
file "/var/log/zentinel/error.log"
level "warn"
buffer-size 8192
}
}
}
To disable error file logging:
observability {
logging {
error-log {
enabled #false
}
}
}
Error Log Options
| Option | Default | Description |
|---|---|---|
enabled | true | Enable error logging |
file | /var/log/zentinel/error.log | Log file path |
level | warn | Minimum level to log (warn or error) |
buffer-size | 8192 | Write buffer size |
Audit Log
Security-focused logging for compliance and forensics:
observability {
logging {
audit-log {
enabled #true
file "/var/log/zentinel/audit.log"
buffer-size 8192
log-blocked #true
log-agent-decisions #true
log-waf-events #true
}
}
}
Audit Log Options
| Option | Default | Description |
|---|---|---|
enabled | true | Enable audit logging |
file | /var/log/zentinel/audit.log | Log file path |
buffer-size | 8192 | Write buffer size |
log-blocked | true | Log blocked requests |
log-agent-decisions | true | Log agent allow/deny decisions |
log-waf-events | true | Log WAF rule matches |
Audit Log Events
{
"timestamp": "2024-01-15T10:30:45.123Z",
"event_type": "request_blocked",
"trace_id": "2Kj8mNpQ3xR",
"client_ip": "192.168.1.100",
"method": "POST",
"path": "/api/v1/admin",
"reason": "rate_limit_exceeded",
"rule_id": "rate-limit-api",
"action": "block",
"metadata": {
"limit": 100,
"current": 101
}
}
Metrics
Prometheus-compatible metrics endpoint:
observability {
metrics {
enabled #true
address "0.0.0.0:9090"
path "/metrics"
high-cardinality #false
}
}
Metrics Options
| Option | Default | Description |
|---|---|---|
enabled | true | Enable metrics endpoint |
address | 0.0.0.0:9090 | Metrics server address |
path | /metrics | Metrics endpoint path |
high-cardinality | false | Include high-cardinality labels |
Available Metrics
Request Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_requests_total | Counter | Total requests by route, method, status |
zentinel_request_duration_seconds | Histogram | Request latency distribution |
zentinel_request_size_bytes | Histogram | Request body size |
zentinel_response_size_bytes | Histogram | Response body size |
zentinel_active_requests | Gauge | Currently active requests |
Upstream Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_upstream_requests_total | Counter | Requests to upstreams |
zentinel_upstream_duration_seconds | Histogram | Upstream latency |
zentinel_upstream_healthy_backends | Gauge | Healthy backends per upstream |
zentinel_upstream_connections | Gauge | Active upstream connections |
zentinel_upstream_retries_total | Counter | Retry attempts |
Cache Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_cache_hits_total | Counter | Cache hits |
zentinel_cache_misses_total | Counter | Cache misses |
zentinel_cache_size_bytes | Gauge | Current cache size |
zentinel_cache_entries | Gauge | Number of cached entries |
zentinel_cache_evictions_total | Counter | Cache evictions |
Rate Limiting Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_rate_limit_hits_total | Counter | Rate limit triggers |
zentinel_rate_limit_allowed_total | Counter | Allowed requests |
zentinel_rate_limit_delayed_total | Counter | Delayed requests |
Agent Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_agent_requests_total | Counter | Agent call count |
zentinel_agent_duration_seconds | Histogram | Agent call latency |
zentinel_agent_errors_total | Counter | Agent errors |
zentinel_agent_timeouts_total | Counter | Agent timeouts |
zentinel_agent_circuit_breaker_state | Gauge | Circuit breaker state (0=closed, 1=open, 2=half-open) |
Connection Metrics
| Metric | Type | Description |
|---|---|---|
zentinel_connections_total | Counter | Total connections |
zentinel_active_connections | Gauge | Current connections |
zentinel_connection_duration_seconds | Histogram | Connection lifetime |
zentinel_tls_handshake_duration_seconds | Histogram | TLS handshake time |
Prometheus Scrape Config
scrape_configs:
- job_name: 'zentinel'
static_configs:
- targets: ['zentinel:9090']
scrape_interval: 15s
metrics_path: /metrics
Distributed Tracing
Zentinel provides OpenTelemetry-compatible distributed tracing for end-to-end visibility across your services. Traces show the complete request journey through the proxy, including agent processing and upstream calls.
Note: Tracing requires building Zentinel with the
opentelemetryfeature flag:cargo build --release --features opentelemetry
Basic Configuration
observability {
tracing {
backend "otlp" {
endpoint "http://jaeger:4317"
}
sampling-rate 0.1 // 10% of requests
service-name "zentinel"
}
}
Tracing Backends
OTLP (OpenTelemetry Protocol) - Recommended
The OpenTelemetry Protocol (OTLP) is the standard for sending telemetry data. Use this with Jaeger, Tempo, or any OTLP-compatible backend:
tracing {
backend "otlp" {
endpoint "http://otel-collector:4317" // gRPC endpoint
}
sampling-rate 0.1
service-name "zentinel-prod"
}
Jaeger (Direct)
tracing {
backend "jaeger" {
endpoint "http://jaeger:14268/api/traces"
}
}
Zipkin
tracing {
backend "zipkin" {
endpoint "http://zipkin:9411/api/v2/spans"
}
}
Tracing Options
| Option | Default | Description |
|---|---|---|
sampling-rate | 0.01 | Fraction of requests to trace (0.0 to 1.0) |
service-name | zentinel | Service name shown in trace UI |
Sampling Rate Guidelines
| Environment | Recommended Rate | Notes |
|---|---|---|
| Development | 1.0 | Trace all requests |
| Staging | 0.1 - 0.5 | 10-50% sampling |
| Production (low traffic) | 0.05 - 0.1 | 5-10% sampling |
| Production (high traffic) | 0.01 - 0.05 | 1-5% sampling |
Trace Propagation
W3C Trace Context (Standard)
Zentinel implements W3C Trace Context for distributed tracing propagation:
| Header | Format | Description |
|---|---|---|
traceparent | {version}-{trace-id}-{parent-id}-{flags} | Trace context parent |
tracestate | Vendor-specific key-value pairs | Trace context state |
Example traceparent header:
00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
│ │ │ │
│ │ │ └─ Flags (01 = sampled)
│ │ └─ Parent Span ID (16 hex chars)
│ └─ Trace ID (32 hex chars)
└─ Version (00)
Upstream Propagation
When proxying to upstream services, Zentinel:
- Parses incoming
traceparentheader (if present) - Creates a child span for the request
- Propagates
traceparentwith the new span ID to upstream
This enables end-to-end tracing across your service mesh.
Agent Propagation
Agents receive the traceparent in the RequestMetadata:
{
"metadata": {
"correlation_id": "2Kj8mNpQ3xR",
"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
...
}
}
Agents can use this to create child spans for their processing, enabling visibility into agent latency in your traces.
Request Span Lifecycle
Each request creates a span with the following lifecycle:
- Start: Span created when request headers are received
- Upstream:
traceparentpropagated to upstream service - Response: Status code recorded on span
- End: Span completed when response is sent to client
Span Attributes
Each request span includes semantic convention attributes:
| Attribute | Description |
|---|---|
http.method | HTTP method (GET, POST, etc.) |
http.target | Request path |
http.status_code | Response status code |
service.name | Configured service name |
Testing with Jaeger
Quick start with Jaeger all-in-one:
# Start Jaeger
docker run -d --name jaeger \
-p 4317:4317 \
-p 16686:16686 \
jaegertracing/all-in-one:latest
# Run Zentinel with tracing
cargo run --features opentelemetry -- --config zentinel.kdl
# View traces
open http://localhost:16686
Testing with Grafana Tempo
For production-grade tracing with Grafana:
# docker-compose.yml
services:
tempo:
image: grafana/tempo:latest
command: ["-config.file=/etc/tempo.yaml"]
volumes:
- ./tempo.yaml:/etc/tempo.yaml
ports:
- "4317:4317" # OTLP gRPC
- "3200:3200" # Tempo API
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
Configure Zentinel:
tracing {
backend "otlp" {
endpoint "http://tempo:4317"
}
sampling-rate 0.1
service-name "zentinel"
}
Connecting Logs and Traces
Include trace IDs in access logs for log-to-trace correlation:
observability {
logging {
access-log {
enabled #true
format "json"
include-trace-id #true // Adds trace_id to log entries
}
}
tracing {
backend "otlp" { endpoint "http://tempo:4317" }
sampling-rate 0.1
service-name "zentinel"
}
}
Log entries will include the trace ID:
{
"timestamp": "2024-01-15T10:30:45.123Z",
"trace_id": "0af7651916cd43dd8448eb211c80319c",
"method": "GET",
"path": "/api/users",
"status": 200
}
In Grafana, you can then jump from logs to traces using the trace ID.
Trace ID Format
Configure the format for request trace IDs:
system {
trace-id-format "tinyflake" // or "uuid"
}
| Format | Example | Description |
|---|---|---|
tinyflake | 2Kj8mNpQ3xR | 11-char Base58, operator-friendly (default) |
uuid | 550e8400-e29b-41d4-a716-446655440000 | 36-char UUID v4 |
Complete Example
system {
worker-threads 0
trace-id-format "tinyflake"
}
observability {
logging {
level "info"
format "json"
access-log {
enabled #true
file "/var/log/zentinel/access.log"
format "json"
include-trace-id #true
}
error-log {
enabled #true
file "/var/log/zentinel/error.log"
level "warn"
}
audit-log {
enabled #true
file "/var/log/zentinel/audit.log"
log-blocked #true
log-agent-decisions #true
}
}
metrics {
enabled #true
address "0.0.0.0:9090"
path "/metrics"
}
tracing {
backend "otlp" {
endpoint "http://otel-collector:4317"
}
sampling-rate 0.05
service-name "zentinel-prod"
}
}
Log Rotation
Zentinel logs are designed for external rotation. Use logrotate or similar:
/var/log/zentinel/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
copytruncate
}
Best Practices
- Use JSON logging in production: Enables log aggregation and analysis
- Set appropriate log levels:
infofor production,debugfor troubleshooting - Enable audit logging: Required for security compliance
- Configure sampling for tracing: 1-5% is typical for production
- Use separate log files: Easier rotation and analysis
- Monitor metrics endpoints: Set up alerting on error rates and latencies
Next Steps
- Operations - Operational procedures
- Monitoring - Monitoring setup
- Troubleshooting - Debug procedures