feat: per-request performance tracking + /metrics/performance endpoint
router/router.py (+158 lines): - store_perf_record(): captures queue_ms, inference_ms, prompt_tokens, completion_tokens, tokens_per_sec per request in Redis - Per-model, per-reason, per-agent rolling windows (last 200-500) - /metrics/performance?window=N endpoint with percentiles (p50/p95/p99) for latency, throughput, and queue time per model/reason/agent - Queue time now surfaced in routing metadata and routes:recent - Streaming requests tracked with estimated prompt tokens nginx/nginx.conf: - Added /metrics/ proxy pass to router_api Enables model performance comparison and routing tier validation.
This commit is contained in:
@@ -78,6 +78,13 @@ http {
|
||||
proxy_buffering off;
|
||||
}
|
||||
|
||||
# Performance analytics
|
||||
location /metrics/ {
|
||||
proxy_pass http://router_api;
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Host $host;
|
||||
}
|
||||
|
||||
location /health {
|
||||
proxy_pass http://router_api/health;
|
||||
proxy_http_version 1.1;
|
||||
|
||||
Reference in New Issue
Block a user