feat: per-request performance tracking + /metrics/performance endpoint

router/router.py (+158 lines):
- store_perf_record(): captures queue_ms, inference_ms, prompt_tokens,
  completion_tokens, tokens_per_sec per request in Redis
- Per-model, per-reason, per-agent rolling windows (last 200-500)
- /metrics/performance?window=N endpoint with percentiles (p50/p95/p99)
  for latency, throughput, and queue time per model/reason/agent
- Queue time now surfaced in routing metadata and routes:recent
- Streaming requests tracked with estimated prompt tokens

nginx/nginx.conf:
- Added /metrics/ proxy pass to router_api

Enables model performance comparison and routing tier validation.
This commit is contained in:
Abiba
2026-05-25 16:50:45 +00:00
parent b7882b2434
commit b849cd3395
2 changed files with 171 additions and 6 deletions
+7
View File
@@ -78,6 +78,13 @@ http {
proxy_buffering off;
}
# Performance analytics
location /metrics/ {
proxy_pass http://router_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
location /health {
proxy_pass http://router_api/health;
proxy_http_version 1.1;