6 Commits

Author SHA1 Message Date
Abiba 80362fa528 fix: default performance window to 24h so all models appear immediately 2026-05-26 12:37:52 +00:00
Abiba 7ef9e58f61 fix: restore /api/performance route in dashboard (it had been overwritten with /api/timeseries) 2026-05-26 12:31:53 +00:00
Abiba f47c3f3304 feat: latency vs prompt size scatter plot on dashboard
Router: new /metrics/scatter endpoint that returns individual data points
(prompt_tokens, inference_ms, model, agent, reason, stream)
for the scatter visualization.

Dashboard: new panel showing latency vs prompt size by model.
- Log-scale X axis (prompt tokens) with model color coding
- Dropdown to filter by individual model or view all
- Hover tooltips with details per point
- Auto-refresh every 30s

Makes the relationship between context length and latency directly
observable, which helps validate routing tier decisions.
2026-05-26 12:18:31 +00:00
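A minimal Python sketch of how the scatter points listed above might be modeled and filtered on the dashboard side. Only the six field names come from the commit message; the `ScatterPoint` class, the helper functions, and the tooltip format are hypothetical, and the 1-token floor in the log10 mapping is an assumption about how a log-scale X axis would handle zero-token prompts.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ScatterPoint:
    # Field names as listed for /metrics/scatter; the class itself is hypothetical.
    prompt_tokens: int
    inference_ms: float
    model: str
    agent: str
    reason: str
    stream: bool


def filter_points(points: Iterable[ScatterPoint], model: Optional[str] = None) -> List[ScatterPoint]:
    """Mimic the model dropdown: model=None means "all models"."""
    return [p for p in points if model is None or p.model == model]


def log_x(prompt_tokens: int) -> float:
    """Position on a log10 prompt-token axis; floor at 1 token so log10 is defined."""
    return math.log10(max(prompt_tokens, 1))


def tooltip(p: ScatterPoint) -> str:
    """Hover text with the per-point details (hypothetical format)."""
    mode = "stream" if p.stream else "non-stream"
    return (f"{p.model} | agent={p.agent} | reason={p.reason} | "
            f"{p.prompt_tokens} tok | {p.inference_ms:.0f} ms | {mode}")
```

Keeping each record as an individual point (rather than pre-aggregated buckets) is what lets the dropdown and tooltips work client-side without another round trip.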
Abiba b2ec4b0572 fix: throughput panel handles streaming-only models gracefully
- Dashboard: when a model has zero non-streaming records, shows
  "streaming only" instead of misleading 0 tok/s
- Dashboard: minimum bar width enforced (6% avg, 4% p50) so
  low-tps models are always visible
- Router: removed inflated streaming tps estimate (prompt tokens
  skewed results for long conversations)

Fixes the Dense model appearing to "register nothing" when Mumuni
sends mostly streaming requests.
2026-05-25 19:45:21 +00:00
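The throughput fallback described above can be sketched as follows. This is a hypothetical illustration, not the dashboard's code: the record shape (`stream` and `tps` keys), the `throughput_bars` name, and scaling each bar against the fastest model's throughput are assumptions; only the "streaming only" label and the 6%/4% minimum widths come from the commit.

```python
from statistics import mean, median

# Minimum bar widths from the commit message, in percent of the panel width.
MIN_AVG_PCT = 6.0
MIN_P50_PCT = 4.0


def _bar_pct(value: float, max_tps: float, floor: float) -> float:
    """Scale a tok/s value against the fastest model, then clamp to [floor, 100]."""
    raw = 100.0 * value / max_tps if max_tps > 0 else 0.0
    return min(100.0, max(raw, floor))


def throughput_bars(records: list, max_tps: float) -> dict:
    """records: dicts with hypothetical keys 'stream' (bool) and 'tps' (tok/s)."""
    # Only non-streaming records are used, since the streaming tps estimate was removed.
    non_stream = [r["tps"] for r in records if not r["stream"]]
    if not non_stream:
        # No non-streaming samples: label it instead of drawing a misleading 0 tok/s bar.
        return {"label": "streaming only", "avg_pct": None, "p50_pct": None}
    avg_tps = mean(non_stream)
    p50_tps = median(non_stream)
    return {
        "label": f"{avg_tps:.1f} tok/s",
        "avg_pct": _bar_pct(avg_tps, max_tps, MIN_AVG_PCT),
        "p50_pct": _bar_pct(p50_tps, max_tps, MIN_P50_PCT),
    }
```

The floor only affects bar width; the label still reports the measured value, so a slow model stays visible without its number being distorted.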
Abiba f42747d721 feat: performance analytics panel on dashboard
dashboard/dashboard.py (+61 lines):
- New /api/performance endpoint proxying to the router's metrics/performance endpoint
- Performance Analytics row with 4 panels:
  - Latency distribution (p50/p95/p99 per model) with stacked bars
  - Throughput comparison (avg + p50 tokens/sec per model)
  - Routing effectiveness table by reason
  - Agent performance bars with latency
- 1h/24h window toggle, auto-refresh every 15s
- Color-coded per model (purple=MoE, amber=Dense, green=VLM)
2026-05-25 16:58:15 +00:00
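A sketch of the per-model latency percentiles behind the distribution panel, restricted to a 1h or 24h window. The record tuple shape, function names, and the nearest-rank percentile method are assumptions; the commit does not say how p50/p95/p99 are computed. In this sketch, a model with no requests in the last hour simply has no entry in the 1h result, which is presumably why a 24h default lets all models appear immediately.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Tuple

WINDOWS = {"1h": timedelta(hours=1), "24h": timedelta(hours=24)}


def nearest_rank(sorted_vals: List[float], q: int) -> float:
    """Nearest-rank percentile: the k-th smallest value, k = ceil(q * n / 100)."""
    n = len(sorted_vals)
    k = max(1, (q * n + 99) // 100)  # integer ceiling, avoids float rounding
    return sorted_vals[k - 1]


def latency_percentiles(
    records: Iterable[Tuple[datetime, str, float]],
    window: str = "24h",
    now: Optional[datetime] = None,
) -> Dict[str, Dict[str, float]]:
    """records: (timestamp, model, latency_ms) tuples; the shape is hypothetical."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - WINDOWS[window]
    by_model: Dict[str, List[float]] = defaultdict(list)
    for ts, model, latency_ms in records:
        if ts >= cutoff:  # drop records older than the selected window
            by_model[model].append(latency_ms)
    result = {}
    for model, vals in by_model.items():
        vals.sort()
        result[model] = {f"p{q}": nearest_rank(vals, q) for q in (50, 95, 99)}
    return result
```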
Abiba 28fc57c5c7 May 19, 2026: Full harness update
- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Automated maintenance cron job
- LiteLLM config update
2026-05-19 15:03:47 +00:00
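The counter leak fix can be illustrated with a small in-flight counter. Only the name `gpu_decr` comes from the commit; `GpuCounter`, `gpu_incr`, and `route` are hypothetical. The point is the failure path: if the backend call raises and nothing decrements, the counter stays permanently inflated and load-based routing sees a GPU as busier than it is.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class GpuCounter:
    """Hypothetical per-GPU in-flight request counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.in_flight = 0

    def gpu_incr(self) -> None:
        with self._lock:
            self.in_flight += 1

    def gpu_decr(self) -> None:
        with self._lock:
            self.in_flight -= 1


def route(counter: GpuCounter, call_backend: Callable[[], T]) -> T:
    counter.gpu_incr()
    try:
        result = call_backend()
    except Exception:
        # The fix: decrement on the error path too, then re-raise for the caller.
        counter.gpu_decr()
        raise
    counter.gpu_decr()
    return result
```

A `try`/`finally` would cover both paths with a single decrement; the version above mirrors the commit's description of adding `gpu_decr` to the except handler.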