Router: new /metrics/scatter endpoint returns individual data points
(prompt_tokens, inference_ms, model, agent, reason, stream)
for scatter visualization.
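A minimal sketch of what "individual data points" means here, assuming the router keeps one log record per request as a dict with the listed keys. The names `ScatterPoint` and `scatter_points` are hypothetical and not taken from the router code. Nothing is aggregated: each usable record becomes one point, and records without a prompt size or latency are skipped because they cannot be plotted.

```python
from dataclasses import dataclass, asdict
from typing import Iterable


@dataclass
class ScatterPoint:
    # One request: prompt size, latency, and the labels listed in the commit.
    prompt_tokens: int
    inference_ms: float
    model: str
    agent: str
    reason: str
    stream: bool


def scatter_points(records: Iterable[dict]) -> list:
    """Hypothetical helper: turn per-request log records into JSON-ready points."""
    points = []
    for r in records:
        # A point needs both coordinates; skip records missing either one.
        if r.get("prompt_tokens") is None or r.get("inference_ms") is None:
            continue
        p = ScatterPoint(
            prompt_tokens=int(r["prompt_tokens"]),
            inference_ms=float(r["inference_ms"]),
            model=str(r.get("model", "")),
            agent=str(r.get("agent", "")),
            reason=str(r.get("reason", "")),
            stream=bool(r.get("stream", False)),
        )
        points.append(asdict(p))
    return points
```

Returning a flat list of dicts keeps the payload trivially serializable with `json.dumps`.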
Dashboard: new panel showing latency vs prompt size by model.
- Log-scale X axis (prompt tokens), points color-coded by model
- Dropdown to filter to a single model or show all models
- Hover tooltips with per-point details
- Auto-refresh every 30s
Makes the relationship between context length and latency directly
observable, which helps validate the routing tier decisions.
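A sketch of the two dashboard mechanics above, assuming points are dicts with a `model` key and the X axis spans a fixed pixel width. `log_x` and `filter_by_model` are hypothetical names, not the dashboard's actual functions. Prompt sizes can span several orders of magnitude, so a log axis keeps short prompts from collapsing into the left edge.

```python
import math
from typing import Optional


def log_x(prompt_tokens: int, min_tokens: int, max_tokens: int, width_px: float) -> float:
    """Map a prompt token count to an x pixel position on a log10 axis."""
    # Clamp to >= 1 so log10 is defined even for zero-token prompts.
    lo = math.log10(max(min_tokens, 1))
    hi = math.log10(max(max_tokens, 1))
    v = math.log10(max(prompt_tokens, 1))
    if hi == lo:
        return width_px / 2  # every point has the same x; center them
    return (v - lo) / (hi - lo) * width_px


def filter_by_model(points: list, model: Optional[str]) -> list:
    """Dropdown filter: None stands for 'all models'."""
    if model is None:
        return list(points)
    return [p for p in points if p["model"] == model]
```

For example, on a 200 px axis spanning 10 to 1000 tokens, a 100-token prompt lands in the middle at x = 100.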
- Dashboard: when a model has zero non-streaming records, show
  "streaming only" instead of a misleading 0 tok/s
- Dashboard: enforce a minimum bar width (6% for avg, 4% for p50)
  so low-tps models stay visible
- Router: remove the streaming tps estimate, which was inflated
  because prompt tokens skewed the results for long conversations
Fixes the Dense model appearing to "register nothing" when Mumuni
sends mostly streaming requests.
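A sketch of the two dashboard fixes, assuming tok/s is now computed only from non-streaming records (since the streaming estimate was removed) and bar widths are fractions of the full track. `tps_label`, `bar_width`, and `MIN_WIDTH` are hypothetical names; only the 6% and 4% floors come from this commit.

```python
# Minimum bar widths from this commit, as fractions of the full track.
MIN_WIDTH = {"avg": 0.06, "p50": 0.04}


def tps_label(non_streaming_tps: list) -> str:
    # No non-streaming samples: say so instead of printing "0 tok/s".
    if not non_streaming_tps:
        return "streaming only"
    avg = sum(non_streaming_tps) / len(non_streaming_tps)
    return f"{avg:.1f} tok/s"


def bar_width(value: float, max_value: float, kind: str) -> float:
    """Bar width as a fraction of the track, never below the per-kind floor."""
    frac = value / max_value if max_value > 0 else 0.0
    return min(1.0, max(frac, MIN_WIDTH[kind]))
```

With these floors, a model at 1 tok/s next to one at 100 tok/s still gets a 6% avg bar and a 4% p50 bar rather than a sliver.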