Zentinel composes agents into per-route pipelines that inspect and modify traffic as it flows through the proxy. This page explains the pipeline model, chaining semantics, execution strategies, and performance trade-offs.
Pipeline Model
An agent pipeline is an ordered sequence of filters attached to a route. Each filter is an independent processing unit — it may be a built-in filter (rate limiting, CORS, compression) or an agent filter that delegates to an external process. The route’s filters block defines both the composition and the execution order.
         Incoming Request
                 │
                 ▼
         ┌────────────────┐
         │     Route      │
         │  "api-users"   │
         └───────┬────────┘
                 │
─────────────────┼───── Request Phase (top → bottom) ──────
                 │
      ┌──────────▼──────────┐
      │  rate-limit filter  │  Built-in
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  auth agent filter  │  → External agent (UDS/gRPC)
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  WAF agent filter   │  → External agent (UDS/gRPC)
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  compression filter │  Built-in
      └──────────┬──────────┘
                 │
─────────────────┼───── Forward to upstream ───────────────
                 │
                 ▼
          ┌────────────┐
          │  Upstream  │
          └──────┬─────┘
                 │
─────────────────┼───── Response Phase (bottom → top) ─────
                 │
      ┌──────────▼──────────┐
      │  compression filter │  Built-in
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  WAF agent filter   │  → External agent
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  auth agent filter  │  → External agent
      └──────────┬──────────┘
                 │
      ┌──────────▼──────────┐
      │  rate-limit filter  │  Built-in
      └──────────┬──────────┘
                 │
                 ▼
          Client Response
Key properties:
- Request phase executes filters in declaration order (top → bottom).
- Response phase executes filters in reverse declaration order (bottom → top).
- Each agent filter communicates with its external agent process over UDS or gRPC.
- A filter can short-circuit the pipeline at any point (e.g., a block decision stops further processing).
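The ordering rules above can be modelled in a few lines of Python. This is an illustrative sketch, not Zentinel's implementation: `run_pipeline` and its `blocked_by` parameter are hypothetical names, and the sketch assumes that a short-circuit simply stops processing (the text does not say whether earlier filters still see a generated response).

```python
def run_pipeline(filters, blocked_by=None):
    """Return the order in which filters see the request and the response.

    `filters` is the declaration order from the route's filters block.
    `blocked_by` (hypothetical) names a filter that short-circuits the
    request phase.
    """
    trace = []
    for name in filters:  # request phase: declaration order
        trace.append(("request", name))
        if name == blocked_by:
            return trace  # short-circuit: later filters and upstream are skipped
    trace.append(("upstream", "backend"))
    for name in reversed(filters):  # response phase: reverse declaration order
        trace.append(("response", name))
    return trace


FILTERS = ["rate-limit", "auth", "waf", "compression"]
full_trace = run_pipeline(FILTERS)
blocked_trace = run_pipeline(FILTERS, blocked_by="auth")
```

In `full_trace` the response entries appear in the opposite order to the request entries, while `blocked_trace` ends at the `auth` filter and never reaches the upstream.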
Chaining Semantics
When multiple agents participate in a pipeline, their decisions and mutations are aggregated according to deterministic rules.
Decision Aggregation
Agents return one of three decisions: allow, block, or redirect. The pipeline uses first-block-wins semantics:
┌──────────────────────────────────────────────────────────────┐
│                     Decision Aggregation                     │
├──────────┬───────────┬───────────┬───────────────────────────┤
│ Agent 1  │ Agent 2   │ Agent 3   │ Pipeline Result           │
├──────────┼───────────┼───────────┼───────────────────────────┤
│ allow    │ allow     │ allow     │ allow                     │
│ allow    │ block     │ (skipped) │ block (from Agent 2)      │
│ allow    │ redirect  │ (skipped) │ redirect (from Agent 2)   │
│ block    │ (skipped) │ (skipped) │ block (from Agent 1)      │
└──────────┴───────────┴───────────┴───────────────────────────┘
The first non-allow decision terminates the pipeline. Remaining agents are not called.
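A minimal sketch of this rule, assuming each agent can be modelled as a function that returns its decision. The `aggregate` helper and the agent names are hypothetical and exist only for illustration.

```python
def aggregate(agents):
    """Call agents in order and stop at the first non-allow decision.

    `agents` is a list of (name, decide) pairs, where decide() returns
    "allow", "block", or "redirect". Returns (decision, source, called).
    """
    called = []
    for name, decide in agents:
        called.append(name)
        decision = decide()
        if decision != "allow":
            return decision, name, called  # later agents are never called
    return "allow", None, called


# Second row of the table: Agent 2 blocks, so Agent 3 is skipped.
result = aggregate([
    ("agent-1", lambda: "allow"),
    ("agent-2", lambda: "block"),
    ("agent-3", lambda: "allow"),
])
```

Here `result` carries the decision, the agent that produced it, and the list of agents that were actually called, which makes the skipped agents easy to see.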
Mutation Accumulation
When agents mutate the request or response, mutations accumulate as the pipeline progresses:
| Mutation Type | Accumulation Rule |
|---|---|
| Header set | Merged across agents; last writer wins for the same header name |
| Header remove | Union of all removals |
| Body replacement | Last writer wins (only the final body mutation applies) |
| Audit metadata | Deep-merged across all agents |
| Response header set | Merged; last writer wins per header name |
Agent 1 sets:   X-User-Id: "user-123"
Agent 2 sets:   X-Threat-Score: "low", X-User-Id: "enriched-123"
Agent 3 sets:   X-Audit-Trail: "logged"
Final headers:  X-User-Id: "enriched-123"   ← Agent 2 overwrote Agent 1
                X-Threat-Score: "low"       ← Agent 2
                X-Audit-Trail: "logged"     ← Agent 3
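The accumulation rules from the table can be sketched as a fold over the agents' mutations in pipeline order. The dictionary shape used here (`set_headers`, `remove_headers`, `body`, `audit`) is an assumption made for the example, not Zentinel's wire format, and the sketch does not decide what happens when one agent sets a header that another removes.

```python
def deep_merge(base, extra):
    """Recursively merge `extra` into a copy of `base`."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def accumulate(mutations):
    """Fold per-agent mutations (in pipeline order) into one result."""
    headers, removed, body, audit = {}, set(), None, {}
    for m in mutations:
        headers.update(m.get("set_headers", {}))   # last writer wins per name
        removed |= set(m.get("remove_headers", []))  # union of removals
        if "body" in m:
            body = m["body"]                        # last writer wins
        audit = deep_merge(audit, m.get("audit", {}))
    return {"set_headers": headers, "remove_headers": removed,
            "body": body, "audit": audit}


# The three agents from the example above.
final = accumulate([
    {"set_headers": {"X-User-Id": "user-123"}},
    {"set_headers": {"X-Threat-Score": "low", "X-User-Id": "enriched-123"}},
    {"set_headers": {"X-Audit-Trail": "logged"}},
])
```

`final["set_headers"]` ends up with Agent 2's value for `X-User-Id`, matching the final headers listed above.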
Per-Phase Independence
The pipeline runs independently for each event phase. An agent subscribes to the phases it cares about, and unsubscribed phases skip that agent entirely.
| Phase | Description | Typical Subscribers |
|---|---|---|
| request_headers | Incoming request headers and metadata | Auth, WAF, rate limiting |
| request_body | Request body chunks | WAF, content scanning, transformation |
| response_headers | Upstream response headers | Security headers, audit logging |
| response_body | Response body chunks | Content scanning, transformation, PII detection |
An agent that only subscribes to request_headers is never called during body or response phases, reducing pipeline cost for that request.
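As a rough illustration, phase subscription amounts to filtering the pipeline by phase before any agent is called. The `agents_for_phase` helper and the subscription sets below are hypothetical.

```python
PHASES = ("request_headers", "request_body", "response_headers", "response_body")


def agents_for_phase(subscriptions, phase):
    """Return the agents subscribed to `phase`, in pipeline order.

    `subscriptions` maps agent name -> set of phases; dict insertion
    order stands in for the pipeline's declaration order.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    return [name for name, events in subscriptions.items() if phase in events]


# Hypothetical subscriptions for three agents.
SUBSCRIPTIONS = {
    "auth": {"request_headers"},
    "waf": {"request_headers", "request_body", "response_body"},
    "audit": {"request_headers", "response_headers"},
}
body_agents = agents_for_phase(SUBSCRIPTIONS, "request_body")
```

With these subscriptions only the `waf` agent is called for body chunks, so the `auth` and `audit` agents add no cost to that phase.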
Execution Strategies
Zentinel uses different execution strategies depending on the event phase to balance latency against correctness.
Parallel Execution (Request Headers)
For the request_headers phase, all agent filters in the pipeline execute in parallel. Each agent receives the original, unmodified request headers and returns its decision independently.
      Request Headers Arrive
                 │
     ┌───────────┼───────────┐
     │           │           │
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│  Auth   │ │   WAF   │ │  Rate   │
│  Agent  │ │  Agent  │ │  Limit  │
│  (8ms)  │ │ (12ms)  │ │  (3ms)  │
└────┬────┘ └────┬────┘ └────┬────┘
     │           │           │
     └───────────┼───────────┘
                 │
                 ▼
         Aggregate Results
       Total: 12ms (not 23ms)
This bounds the phase's latency by the slowest agent: O(L), where L is the latency of the slowest agent, rather than the O(N×L) that sequential execution of N agents would cost.
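A minimal sketch of the parallel fan-out using Python's `asyncio.gather`. The `call_agent` coroutine is a stand-in that sleeps instead of doing IPC, and the `X-Seen-By` header is invented for the example. It shows two properties from the text: every agent receives its own copy of the original headers, and the calls are awaited together.

```python
import asyncio


async def call_agent(name, latency_ms, headers):
    """Stand-in for one agent round trip; sleeps instead of doing IPC."""
    await asyncio.sleep(latency_ms / 1000)
    headers["X-Seen-By"] = name  # mutates only this agent's private copy
    return name, "allow", headers


async def run_request_headers(agents, headers):
    # Each agent gets a copy of the original headers, and all calls are
    # awaited together, so the phase takes roughly as long as the slowest
    # agent rather than the sum of all of them.
    calls = [call_agent(name, ms, dict(headers)) for name, ms in agents]
    return await asyncio.gather(*calls)


original = {"Host": "example.test"}
results = asyncio.run(run_request_headers(
    [("auth", 8), ("waf", 12), ("rate-limit", 3)], original))
```

`asyncio.gather` returns results in the order the calls were passed in, which keeps aggregation deterministic even though the agents finish at different times.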
Sequential Execution (Body and Response Phases)
For request_body, response_headers, and response_body phases, agents execute sequentially in pipeline order. This is necessary because:
- Body mutations from one agent must be visible to the next.
- Response header mutations accumulate in order.
- Flow control (pause/resume) requires sequential coordination.
Request Body Chunk Arrives
             │
             ▼
       ┌───────────┐
       │    WAF    │  Inspect body, may block
       │   Agent   │
       └─────┬─────┘
             │
             ▼
       ┌───────────┐
       │ Transform │  Modify body content
       │   Agent   │
       └─────┬─────┘
             │
             ▼
       ┌───────────┐
       │   Audit   │  Log body hash
       │   Agent   │
       └─────┬─────┘
             │
             ▼
    Forward to upstream
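The chain in the diagram can be sketched as a loop that hands each agent the chunk returned by the previous one. The three agent functions are toy stand-ins (a substring check, a redaction, and a SHA-256 hash), not real Zentinel agents.

```python
import hashlib


def run_body_phase(chunk, agents):
    """Pass one body chunk through agents in pipeline order.

    Each agent is a (name, handle) pair; handle(chunk) returns
    (decision, chunk). The chunk returned by one agent is what the
    next agent receives.
    """
    for name, handle in agents:
        decision, chunk = handle(chunk)
        if decision != "allow":
            return decision, name, chunk
    return "allow", None, chunk


audit_log = []


def waf(chunk):
    return ("block" if b"<script>" in chunk else "allow"), chunk


def transform(chunk):
    return "allow", chunk.replace(b"secret", b"[redacted]")


def audit(chunk):
    audit_log.append(hashlib.sha256(chunk).hexdigest())
    return "allow", chunk


AGENTS = [("waf", waf), ("transform", transform), ("audit", audit)]
decision, source, body = run_body_phase(b"token=secret", AGENTS)
```

Because the audit agent runs after the transform agent, the hash it records is that of the transformed body. That ordering dependency is what rules out parallel execution for this phase.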
Strategy Summary
| Event Phase | Strategy | Rationale |
|---|---|---|
| request_headers | Parallel | Agents inspect independently; no mutation dependencies |
| request_body | Sequential | Body mutations must chain; flow control |
| response_headers | Sequential | Header mutations accumulate in order |
| response_body | Sequential | Body mutations must chain; flow control |
Per-Agent Isolation
Each agent in the pipeline operates with its own isolation boundaries, preventing one agent’s failure from cascading through the system.
Semaphore-Based Queue Isolation
Every agent filter has a configurable concurrency semaphore that limits how many in-flight requests it processes simultaneously. When the semaphore is full, new requests queue (up to a configurable depth) or trigger the filter’s failure mode.
┌─────────────────────────────────────────────────────────────┐
│  Agent: WAF                                                 │
│                                                             │
│  Semaphore: 3/3 in-flight                                   │
│  ┌────────┐ ┌────────┐ ┌────────┐                           │
│  │ Req #1 │ │ Req #2 │ │ Req #3 │  ← Processing             │
│  └────────┘ └────────┘ └────────┘                           │
│                                                             │
│  Queue: 2 waiting (max 10)                                  │
│  ┌────────┐ ┌────────┐                                      │
│  │ Req #4 │ │ Req #5 │  ← Queued                            │
│  └────────┘ └────────┘                                      │
│                                                             │
│  Req #6 arrives → queued (position 3)                       │
│  Req #14 arrives → queue full → fail-mode triggered         │
└─────────────────────────────────────────────────────────────┘
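The admission logic in the diagram can be modelled with a counter and a bounded queue. The `AgentGate` class below is a hypothetical, single-threaded model meant only to show the three outcomes; a real proxy would use an actual semaphore and asynchronous waiters.

```python
from collections import deque


class AgentGate:
    """Toy model of per-agent concurrency limiting with a bounded queue."""

    def __init__(self, max_concurrent, max_queue):
        self.max_concurrent = max_concurrent
        self.max_queue = max_queue
        self.in_flight = 0
        self.queue = deque()

    def admit(self, request_id):
        if self.in_flight < self.max_concurrent:
            self.in_flight += 1
            return "processing"
        if len(self.queue) < self.max_queue:
            self.queue.append(request_id)
            return "queued"
        return "fail-mode"  # queue full: the filter's failure mode applies

    def complete(self):
        """One in-flight request finished; promote the next queued one."""
        if self.in_flight == 0:
            raise RuntimeError("no request in flight")
        if self.queue:
            return self.queue.popleft()  # freed slot is taken immediately
        self.in_flight -= 1
        return None


# The diagram's numbers: 3 concurrent slots, queue depth 10.
gate = AgentGate(max_concurrent=3, max_queue=10)
outcomes = [gate.admit(i) for i in range(1, 15)]
```

With these limits, requests 1 to 3 are processed, 4 to 13 are queued, and request 14 triggers the failure mode, as in the diagram.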
Circuit Breakers
Each agent connection tracks health using lock-free atomics and implements the circuit breaker pattern:
┌──────────┐    error rate > threshold    ┌──────────┐
│  Closed  │ ────────────────────────────▶│   Open   │
│ (normal) │                              │ (reject) │
└──────────┘                              └────┬─────┘
     ▲                                         │
     │                                 cooldown expires
     │ success                                 │
     │    ┌────────────┐                       │
     └────│ Half-Open  │◀──────────────────────┘
          │  (probe)   │
          └────────────┘
- Closed — Normal operation. Errors are counted.
- Open — Agent is considered unhealthy. Requests skip it and apply the filter’s failure mode.
- Half-Open — A single probe request is sent. Success returns to Closed; failure returns to Open.
Health state is checked via atomic loads (~10ns), adding negligible overhead to the hot path.
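The state machine can be sketched as follows. This is a plain, single-threaded Python model, not the lock-free atomic implementation described above. The parameter names (`error_threshold`, `min_requests`, `cooldown_s`) and their defaults are assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Closed -> Open -> Half-Open -> Closed/Open, as in the diagram."""

    def __init__(self, error_threshold=0.5, min_requests=10,
                 cooldown_s=30.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.min_requests = min_requests
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow_request(self):
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "half-open"  # cooldown expired: allow one probe
            self.probe_in_flight = False
        if self.state == "closed":
            return True
        if self.state == "half-open" and not self.probe_in_flight:
            self.probe_in_flight = True
            return True
        return False  # open, or a probe is already in flight

    def record(self, ok):
        if self.state == "open":
            return  # late results while open are ignored
        if self.state == "half-open":
            self.probe_in_flight = False
            if ok:
                self._close()
            else:
                self._open()
            return
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_requests and self.failures / total > self.error_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()

    def _close(self):
        self.state = "closed"
        self.successes = 0
        self.failures = 0
```

A caller would check `allow_request()` before contacting the agent, apply the filter's failure mode when it returns `False`, and report each outcome through `record()`. Injecting `clock` keeps the cooldown testable without real waiting.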
Per-Filter Failure Modes
Each agent filter configures its own failure behavior independently:
| Failure Mode | Behavior | Use Case |
|---|---|---|
| fail-open | Allow request to continue | Non-critical agents (analytics, logging) |
| fail-closed | Block request with 503 | Critical security agents (auth, WAF) |
filters {
    filter "auth" {
        agent "auth-agent"
        fail-mode "fail-closed"   // Auth failure = block
        timeout-ms 5000
    }
    filter "analytics" {
        agent "analytics-agent"
        fail-mode "fail-open"     // Analytics failure = continue
        timeout-ms 2000
    }
}
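To make the two failure modes concrete, the sketch below wraps an agent call in a timeout and maps a timeout or connection error onto the configured mode. `call_with_fail_mode` and the response dictionaries are hypothetical; only the fail-open/fail-closed behavior and the 503 status come from the table above.

```python
import asyncio


async def call_with_fail_mode(agent_call, fail_mode, timeout_ms):
    """Await agent_call(); on timeout or connection error apply fail_mode."""
    if fail_mode not in ("fail-open", "fail-closed"):
        raise ValueError(f"unknown fail-mode: {fail_mode}")
    try:
        return await asyncio.wait_for(agent_call(), timeout=timeout_ms / 1000)
    except (asyncio.TimeoutError, ConnectionError):
        if fail_mode == "fail-open":
            return {"decision": "allow"}               # continue without the agent
        return {"decision": "block", "status": 503}    # reject the request


async def slow_agent():
    await asyncio.sleep(0.2)  # slower than the 10 ms timeout used below
    return {"decision": "allow"}


async def down_agent():
    raise ConnectionError("agent socket unavailable")


closed_result = asyncio.run(call_with_fail_mode(slow_agent, "fail-closed", 10))
open_result = asyncio.run(call_with_fail_mode(down_agent, "fail-open", 10))
```

The same failure yields opposite outcomes depending on the mode, which is why the mode is chosen per filter rather than per pipeline.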
Graceful Degradation
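When an agent enters the Open circuit breaker state, its filter applies its configured failure mode. If the filter is fail-open, the pipeline continues with the remaining healthy agents, so losing one layer does not disable the entire pipeline (defense in depth). If the filter is fail-closed, the request is rejected with 503.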
Pipeline with 3 agents:
Normal: [Auth ✓] → [WAF ✓] → [Rate Limit ✓] → upstream
Degraded: [Auth ✓] → [WAF ✗ fail-open] → [Rate Limit ✓] → upstream
Critical: [Auth ✓] → [WAF ✗ fail-open] → [Rate Limit ✗ fail-closed] → 503
Performance Characteristics
Agent pipelines add latency that depends on the number of agents, the execution strategy of each phase, and the transport used. Understanding these costs helps you design pipelines that meet your latency budget.
IPC Cost Per Agent
| Transport | Typical Latency | Best For |
|---|---|---|
| UDS (Unix Domain Socket) | ~50–200µs | Same-host agents, lowest latency |
| gRPC | ~200–500µs | Cross-host agents, language flexibility |
These numbers represent the round-trip IPC overhead, excluding agent processing time. See Performance for detailed benchmarks including serialization costs and throughput numbers.
Pipeline Depth vs Latency
For the request_headers phase (parallel execution):
| Pipeline | Agents | Parallel Latency | Notes |
|---|---|---|---|
| Minimal | 1 agent | ~100µs | Single agent overhead |
| Standard | 3 agents | ~200µs | Bounded by slowest agent |
| Deep | 5 agents | ~300µs | Marginal cost per agent is low |
For sequential phases (body/response), latency scales linearly:
| Pipeline | Agents | Sequential Latency | Notes |
|---|---|---|---|
| Minimal | 1 agent | ~100µs | Single agent overhead |
| Standard | 3 agents | ~400µs | Sum of all agent latencies |
| Deep | 5 agents | ~700µs | Each agent adds its full cost |
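The difference between the two tables is a maximum versus a sum. The helper below is a hypothetical estimator for budgeting. The per-agent latencies are illustrative values chosen to reproduce the "Standard" and "Deep" rows, not measurements.

```python
def phase_latency_us(agent_latencies_us, phase):
    """Estimate the agent overhead of one phase, in microseconds."""
    if not agent_latencies_us:
        return 0
    if phase == "request_headers":
        return max(agent_latencies_us)  # parallel: bounded by the slowest agent
    return sum(agent_latencies_us)      # sequential: every agent adds its cost


# Illustrative per-agent round-trip times in microseconds (assumed values).
standard = [100, 100, 200]
deep = [100, 100, 100, 100, 300]

standard_parallel = phase_latency_us(standard, "request_headers")
standard_sequential = phase_latency_us(standard, "request_body")
deep_parallel = phase_latency_us(deep, "request_headers")
deep_sequential = phase_latency_us(deep, "response_body")
```

An estimate like this shows why an extra agent that subscribes only to request_headers is much cheaper than one that also subscribes to a body phase.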
The Out-of-Process Trade-off
Zentinel’s agent model runs security logic in separate processes. This adds IPC cost but provides significant benefits:
| | In-Process (e.g., Wasm, Lua) | Out-of-Process (Agents) |
|---|---|---|
| Latency | ~1–10µs | ~50–500µs |
| Isolation | Crash can affect proxy | Crash is contained |
| Deployment | Requires proxy restart | Independent updates |
| Language | Limited (Wasm, Lua) | Any language |
| Resource limits | Shared with proxy | Separate memory/CPU |
| Debugging | Harder (embedded) | Standard tooling |
| Scaling | Scales with proxy | Scales independently |
For most security workloads, the 50–500µs overhead is negligible compared to the network latency of the upstream request (typically 5–50ms). The isolation and operational benefits outweigh the cost.
Note: Zentinel also supports Wasm agents for cases where in-process latency is critical. Wasm agents run inside the proxy process with Wasmtime sandboxing, offering a middle ground between pure in-process and out-of-process execution.
See Comparison for how Zentinel’s agent overhead compares to Envoy ext_proc, HAProxy SPOE, and NGINX njs.
Pipeline Patterns
These patterns illustrate common pipeline compositions for real-world use cases.
Security Gateway
A standard security gateway that authenticates, inspects, and rate-limits traffic:
route "api" {
    matches { path-prefix "/api/" }
    upstream "backend"
    filters {
        filter "rate-limit" {
            type "rate-limit"
            requests-per-second 100
            burst 20
        }
        filter "auth" {
            agent "auth-agent"
            fail-mode "fail-closed"
            timeout-ms 5000
        }
        filter "waf" {
            agent "waf-agent"
            fail-mode "fail-closed"
            timeout-ms 3000
        }
    }
}
Pipeline behavior: Rate limiting is built-in and runs first (cheapest check, no IPC cost). Auth validates credentials. WAF inspects request content. The two agent filters, auth and WAF, run in parallel during request_headers. If auth or WAF blocks, the request never reaches the upstream.
API Gateway
An API gateway that authenticates, transforms requests, and logs for audit:
route "partner-api" {
    matches {
        path-prefix "/partner/v2/"
        header "X-Partner-Key"
    }
    upstream "partner-service"
    filters {
        filter "auth" {
            agent "auth-agent"
            fail-mode "fail-closed"
            timeout-ms 5000
        }
        filter "transform" {
            agent "transform-agent"
            fail-mode "fail-closed"
            timeout-ms 2000
        }
        filter "audit" {
            agent "audit-logger-agent"
            fail-mode "fail-open"
            timeout-ms 1000
        }
    }
}
Pipeline behavior: Auth validates the partner key. Transform rewrites headers or body for the backend. Audit logs the request metadata. The audit agent is fail-open — a logging failure should never block a partner request.
Observability Pipeline
A lightweight pipeline focused on traffic visibility:
route "all-traffic" {
    matches { path-prefix "/" }
    upstream "backend"
    filters {
        filter "access-log" {
            agent "audit-logger-agent"
            fail-mode "fail-open"
            timeout-ms 1000
        }
        filter "analytics" {
            agent "analytics-agent"
            fail-mode "fail-open"
            timeout-ms 500
        }
    }
}
Pipeline behavior: Both agents are fail-open. The pipeline never blocks traffic — it only observes. If either agent is slow or down, requests continue unaffected.
Defense in Depth
A multi-layered security pipeline for high-value endpoints:
route "admin" {
    matches {
        path-prefix "/admin/"
        method "GET" "POST" "PUT" "DELETE"
    }
    upstream "admin-service"
    filters {
        filter "rate-limit" {
            type "rate-limit"
            requests-per-second 10
            burst 5
        }
        filter "ip-reputation" {
            agent "ip-reputation-agent"
            fail-mode "fail-closed"
            timeout-ms 3000
        }
        filter "auth" {
            agent "auth-agent"
            fail-mode "fail-closed"
            timeout-ms 5000
        }
        filter "waf" {
            agent "waf-agent"
            fail-mode "fail-closed"
            timeout-ms 3000
        }
        filter "content-scanner" {
            agent "content-scanner-agent"
            fail-mode "fail-closed"
            timeout-ms 5000
        }
    }
}
Pipeline behavior: Five layers of defense, all fail-closed. Rate limiting and IP reputation filter obvious abuse cheaply. Auth validates identity. WAF inspects headers. Content scanner inspects request bodies for malicious payloads. During request_headers, the agent filters run in parallel — the total overhead is bounded by the slowest agent (~5ms), not the sum of all agents.
Configuration Reference
A complete configuration example showing a pipeline with listeners, routes, agents, upstreams, and filters:
system {
    worker-threads 4
}
listeners {
    listener "https" {
        address "0.0.0.0:443"
        tls {
            cert-path "/etc/zentinel/certs/api.crt"
            key-path "/etc/zentinel/certs/api.key"
        }
    }
}
agents {
    agent "auth-agent" {
        socket "/var/run/zentinel/auth.sock"
        pool-size 4
        events "request_headers"
    }
    agent "waf-agent" {
        socket "/var/run/zentinel/waf.sock"
        pool-size 8
        events "request_headers" "request_body" "response_body"
    }
    agent "audit-agent" {
        socket "/var/run/zentinel/audit.sock"
        pool-size 2
        events "request_headers" "response_headers"
    }
}
routes {
    route "api" {
        priority 100
        matches {
            path-prefix "/api/"
            method "GET" "POST" "PUT" "DELETE"
        }
        upstream "api-backend"
        filters {
            filter "rate-limit" {
                type "rate-limit"
                requests-per-second 100
                burst 20
            }
            filter "auth" {
                agent "auth-agent"
                fail-mode "fail-closed"
                timeout-ms 5000
                max-concurrent 100
            }
            filter "waf" {
                agent "waf-agent"
                fail-mode "fail-closed"
                timeout-ms 3000
                max-concurrent 50
            }
            filter "headers" {
                type "headers"
                response {
                    set "X-Content-Type-Options" "nosniff"
                    set "X-Frame-Options" "DENY"
                }
            }
            filter "audit" {
                agent "audit-agent"
                fail-mode "fail-open"
                timeout-ms 1000
                max-concurrent 200
            }
        }
    }
}
upstreams {
    upstream "api-backend" {
        targets {
            target { address "10.0.1.1:8080" weight 5 }
            target { address "10.0.1.2:8080" weight 5 }
        }
        load-balancing "round_robin"
        health-check {
            path "/health"
            interval-secs 10
        }
    }
}
This configuration creates a pipeline where:
- Rate limiting rejects excessive traffic (built-in, no IPC cost).
- Auth agent validates credentials via UDS (fail-closed).
- WAF agent inspects headers and body via UDS (fail-closed).
- Headers filter adds security response headers (built-in).
- Audit agent logs request/response metadata via UDS (fail-open).
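During the request_headers phase, the three agent filters (auth, WAF, and audit, the second, third, and fifth filters in the list above) execute in parallel; the rate-limit and headers filters are built-in. During request_body, only the WAF agent runs (it’s the only one subscribed). During response_headers, the audit agent captures the response metadata.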
Next Steps
- Request Lifecycle — How requests traverse the full proxy lifecycle
- Routing System — Route matching and priority rules
- Filters Configuration — Complete filter type reference
- Connection Pooling — Agent connection management and load balancing
- Performance — Detailed benchmarks and optimization profiles
- Comparison — How Zentinel’s agent model compares to alternatives