Complete distributed tracing setup with Jaeger or Grafana Tempo for end-to-end request visibility.
Use Case
- Trace requests through Zentinel to upstream services
- Debug latency issues across service boundaries
- Correlate logs with traces for faster troubleshooting
- Monitor agent processing time in traces
Prerequisites
Build Zentinel with the OpenTelemetry feature:
cargo build --release --features opentelemetry
If you use Docker, make sure the image was built with this feature enabled.
Quick Start with Jaeger
1. Start Jaeger
docker run -d --name jaeger \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
2. Configure Zentinel
Create zentinel.kdl:
// Distributed Tracing Configuration
// Traces all requests to Jaeger
system {
    worker-threads 0
    trace-id-format "tinyflake"
}
listeners {
    listener "http" {
        address "0.0.0.0:8080"
        protocol "http"
    }
}
routes {
    route "api" {
        priority 100
        matches {
            path-prefix "/api/"
        }
        upstream "api-backend"
    }
    route "health" {
        priority 1000
        matches { path "/health" }
        service-type "builtin"
        builtin-handler "health"
    }
}
upstreams {
    upstream "api-backend" {
        target "127.0.0.1:3000"
    }
}
observability {
    tracing {
        backend "otlp" {
            endpoint "http://localhost:4317"
        }
        sampling-rate 1.0 // 100% for testing
        service-name "zentinel"
    }
    logging {
        level "info"
        format "json"
        access-log {
            enabled #true
            include-trace-id #true
        }
    }
    metrics {
        enabled #true
        address "0.0.0.0:9090"
    }
}
3. Start Zentinel
./target/release/zentinel --config zentinel.kdl
4. Generate Traffic
# Make some requests
curl http://localhost:8080/api/users
curl http://localhost:8080/api/products
curl -X POST http://localhost:8080/api/orders -d '{"item": "widget"}'
5. View Traces
Open Jaeger UI: http://localhost:16686
- Select “zentinel” from the Service dropdown
- Click “Find Traces”
- Click on a trace to see the full request timeline
Production Setup with Grafana Tempo
For production, use Grafana Tempo with Grafana for visualization:
docker-compose.yml
version: '3.8'
services:
  zentinel:
    image: ghcr.io/zentinelproxy/zentinel:latest-otel
    ports:
      - "8080:8080"
      - "9090:9090"
    volumes:
      - ./zentinel.kdl:/etc/zentinel/zentinel.kdl
    command: ["--config", "/etc/zentinel/zentinel.kdl"]
    depends_on:
      - tempo
  tempo:
    image: grafana/tempo:2.3.0
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - "4317:4317" # OTLP gRPC
      - "3200:3200" # Tempo API
  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    depends_on:
      - tempo
  # Example backend service (traces its own spans)
  api-backend:
    image: your-api:latest
    ports:
      - "3001:3000"
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317
      - OTEL_SERVICE_NAME=api-backend
volumes:
  tempo-data:
tempo.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
ingester:
  trace_idle_period: 10s
  max_block_bytes: 1_000_000
  max_block_duration: 5m
compactor:
  compaction:
    block_retention: 48h
storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
grafana-datasources.yaml
apiVersion: 1
datasources:
  - name: Tempo
    # Fixed UID so other datasources can reference Tempo via datasourceUid
    uid: tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    isDefault: true
zentinel.kdl (for Tempo)
system {
    worker-threads 0
    trace-id-format "tinyflake"
}
listeners {
    listener "http" {
        address "0.0.0.0:8080"
        protocol "http"
    }
}
routes {
    route "api" {
        priority 100
        matches {
            path-prefix "/api/"
        }
        upstream "api-backend"
        agents "auth" "ratelimit"
    }
    route "health" {
        priority 1000
        matches { path "/health" }
        service-type "builtin"
        builtin-handler "health"
    }
}
upstreams {
    upstream "api-backend" {
        target "api-backend:3000"
        health-check {
            type "http" { path "/health" }
            interval-secs 10
        }
    }
}
agents {
    agent "auth" {
        unix-socket path="/var/run/zentinel/auth.sock"
        events "request_headers"
        timeout-ms 50
    }
    agent "ratelimit" {
        unix-socket path="/var/run/zentinel/ratelimit.sock"
        events "request_headers"
        timeout-ms 20
    }
}
observability {
    tracing {
        backend "otlp" {
            endpoint "http://tempo:4317"
        }
        sampling-rate 0.1 // 10% in production
        service-name "zentinel"
    }
    logging {
        level "info"
        format "json"
        access-log {
            enabled #true
            include-trace-id #true
        }
    }
    metrics {
        enabled #true
        address "0.0.0.0:9090"
    }
}
Tracing with Agents
Agents receive the W3C traceparent header in the request metadata, so they can create child spans under the proxy's span.
Agent Trace Context
When an agent receives a request event, the metadata includes:
{
  "metadata": {
    "correlation_id": "2Kj8mNpQ3xR",
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    "client_ip": "192.168.1.100",
    "route_id": "api",
    ...
  }
}
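The traceparent value follows the W3C Trace Context layout: version, trace ID, parent span ID, and flags, separated by hyphens. The following is a minimal, standard-library-only sketch of splitting that value into its fields, for agents that only need the trace ID for logging and do not pull in an OpenTelemetry SDK. The `TraceParent` struct and `parse_traceparent` function are hypothetical names for illustration and are not part of Zentinel. The sketch handles only the four-field layout and rejects anything else.

```rust
/// Parsed fields of a W3C `traceparent` value (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct TraceParent {
    version: u8,
    trace_id: String,  // 32 lowercase hex characters
    parent_id: String, // 16 lowercase hex characters
    sampled: bool,     // lowest bit of the flags byte
}

/// Parse `version-traceid-parentid-flags`; returns None for malformed input.
fn parse_traceparent(value: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = value.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    // Each field has a fixed length and must be lowercase hex.
    let is_hex = |s: &str, len: usize| {
        s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
    };
    let (version, trace_id, parent_id, flags) = (parts[0], parts[1], parts[2], parts[3]);
    if !is_hex(version, 2) || !is_hex(trace_id, 32) || !is_hex(parent_id, 16) || !is_hex(flags, 2) {
        return None;
    }
    // All-zero IDs and version ff are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let version = u8::from_str_radix(version, 16).ok()?;
    if version == 0xff {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        version,
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}
```

Applied to the sample value above, this yields trace ID 0af7651916cd43dd8448eb211c80319c, parent span ID b7ad6b7169203331, and the sampled flag set.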
Creating Agent Child Spans (Rust Example)
use std::collections::HashMap;
use opentelemetry::global;
use opentelemetry::propagation::TextMapPropagator;
use opentelemetry::trace::{Span, Tracer};
use opentelemetry_sdk::propagation::TraceContextPropagator;
// RequestMetadata and AgentResponse are the types your agent already uses.
fn process_request(metadata: &RequestMetadata) -> AgentResponse {
    // Copy the incoming traceparent into a carrier the propagator can read
    let mut headers = HashMap::new();
    if let Some(tp) = &metadata.traceparent {
        headers.insert("traceparent".to_string(), tp.clone());
    }
    // Extract the parent trace context from the carrier
    let propagator = TraceContextPropagator::new();
    let parent_cx = propagator.extract(&headers);
    // Start a child span under the extracted parent
    let tracer = global::tracer("my-agent");
    let mut span = tracer.start_with_context("agent.process", &parent_cx);
    // Do processing...
    // Ending the span requires a mutable binding and the Span trait in scope
    span.end();
    AgentResponse::default_allow()
}
Sampling Strategies
Development
Trace everything for debugging:
tracing {
    backend "otlp" { endpoint "http://jaeger:4317" }
    sampling-rate 1.0
    service-name "zentinel-dev"
}
Production
Balance visibility with overhead:
tracing {
    backend "otlp" { endpoint "http://tempo:4317" }
    sampling-rate 0.05 // 5% of requests
    service-name "zentinel-prod"
}
Error-Focused
For high-volume services, consider tail-based sampling in your collector to capture all errors while sampling normal requests.
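To make the difference from a fixed sampling-rate concrete, here is a rough sketch of the decision a tail-based sampler makes once a trace is complete: keep every trace that contains an error or exceeds a latency threshold, and keep only a fraction of the rest. This is an illustration of the idea, not how Zentinel or any particular collector implements it. In practice you would configure this in the collector (for example, the OpenTelemetry Collector's tail sampling processor) rather than write it yourself. `TraceSummary`, `keep_trace`, and the threshold parameters are hypothetical names.

```rust
/// What a collector knows about a trace once all of its spans have arrived.
/// (Hypothetical type for illustration.)
struct TraceSummary {
    trace_id: u128,
    has_error: bool,
    duration_ms: u64,
}

/// Tail-sampling decision: always keep error and slow traces,
/// keep roughly `baseline_rate` of everything else.
fn keep_trace(t: &TraceSummary, slow_threshold_ms: u64, baseline_rate: f64) -> bool {
    if t.has_error || t.duration_ms >= slow_threshold_ms {
        return true;
    }
    if baseline_rate >= 1.0 {
        return true;
    }
    if baseline_rate <= 0.0 {
        return false;
    }
    // Derive the decision from the trace ID so that every collector
    // instance reaches the same verdict for the same trace.
    // Assumes trace IDs are uniformly random in their low 64 bits.
    let low_bits = t.trace_id as u64;
    let threshold = (baseline_rate * u64::MAX as f64) as u64;
    low_bits < threshold
}
```

The key contrast with head-based sampling is that the error and latency checks run after the request has finished, so failed requests are never lost to an unlucky sampling decision made at the start.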
Correlating Logs and Traces
Access Log with Trace ID
observability {
    logging {
        access-log {
            enabled #true
            format "json"
            include-trace-id #true
        }
    }
}
Log Output
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "method": "POST",
  "path": "/api/orders",
  "status": 201,
  "duration_ms": 145
}
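For ad-hoc correlation outside Grafana (for example, in a small log-processing tool), the trace ID can be pulled out of a log line with a few string operations. The sketch below uses only the standard library and assumes the field is named trace_id and holds a lowercase hex value, as in the sample output above. `extract_trace_id` is a hypothetical helper, and a real tool would more likely use a JSON parser.

```rust
/// Return the value of the "trace_id" field from one JSON access-log line,
/// or None if the field is missing or is not a lowercase hex string.
fn extract_trace_id(line: &str) -> Option<&str> {
    let key = "\"trace_id\"";
    // Skip past the key, then past the colon and any surrounding whitespace.
    let after_key = &line[line.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    // The value must be a quoted string; take everything up to the closing quote.
    let value = after_colon.strip_prefix('"')?;
    let id = &value[..value.find('"')?];
    let is_hex = !id.is_empty()
        && id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if is_hex {
        Some(id)
    } else {
        None
    }
}
```

Because whitespace around the colon is skipped, the function works on both compact and pretty-printed JSON.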
Grafana Log-to-Trace Link
In Grafana, configure Loki to link to Tempo traces:
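datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # datasourceUid must match the uid of the Tempo datasource
        - datasourceUid: tempo
          # \s* tolerates optional whitespace after the colon
          matcherRegex: '"trace_id":\s*"([a-f0-9]+)"'
          name: TraceID
          url: '$${__value.raw}'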
Metrics
Monitor tracing health:
# Spans exported per second
rate(otel_exporter_spans_exported_total[5m])
# Export errors
rate(otel_exporter_spans_failed_total[5m])
Next Steps
- Prometheus Example - Metrics setup
- Grafana Example - Dashboard creation
- Observability Config - Full configuration reference