inference-harness

Author	SHA1	Message	Date
Abiba	80362fa528	fix: default performance window to 24h so all models appear immediately	2026-05-26 12:37:52 +00:00
Abiba	7ef9e58f61	fix: restore /api/performance route in dashboard (was overwritten to /api/timeseries)	2026-05-26 12:31:53 +00:00
Abiba	f47c3f3304	feat: latency vs prompt size scatter plot on dashboard. Router: new /metrics/scatter endpoint returns individual data points (prompt_tokens, inference_ms, model, agent, reason, stream) for scatter visualization. Dashboard: new panel showing latency vs prompt size by model: log-scale X axis (prompt tokens) with model color coding; dropdown to filter by individual model or view all; hover tooltips with details per point; auto-refresh every 30s. Enables direct observation of the context-length vs latency relationship, which validates routing tier decisions.	2026-05-26 12:18:31 +00:00
Abiba	b2ec4b0572	fix: throughput panel handles streaming-only models gracefully. Dashboard: when a model has zero non-streaming records, shows "streaming only" instead of a misleading 0 tok/s. Dashboard: minimum bar width enforced (6% avg, 4% p50) so low-tps models are always visible. Router: removed inflated streaming tps estimate (prompt tokens skewed results for long conversations). Fixes Dense model appearing to "register nothing" when Mumuni sends mostly streaming requests.	2026-05-25 19:45:21 +00:00
Abiba	f42747d721	feat: performance analytics panel on dashboard. dashboard/dashboard.py (+61 lines): new /api/performance endpoint proxying to router metrics/performance; Performance Analytics row with 4 panels: latency distribution (p50/p95/p99 per model) with stacked bars, throughput comparison (avg + p50 tokens/sec per model), routing effectiveness table by reason, agent performance bars with latency; 1h/24h window toggle, auto-refresh every 15s; color-coded per model (purple=MoE, amber=Dense, green=VLM).	2026-05-25 16:58:15 +00:00
Abiba	28fc57c5c7	May 19, 2026: Full harness update. Model migration: gemma-4-E4B → qwen3.5-9b-vlm; dashboard reorder: Usage Over Time + GPU Metrics to top; router counter leak fix (gpu_decr in except handler); VLM slot upgrade 1→2; automated maintenance cron job; LiteLLM config update.	2026-05-19 15:03:47 +00:00

6 Commits
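
The throughput-panel behaviour described in commit b2ec4b0572 can be sketched roughly as follows. This is a minimal illustration, not the code in dashboard/dashboard.py: the function names, the row dictionary shape, and the choice to scale bars against a caller-supplied `max_tps` are all assumptions. Only the "streaming only" label and the 6% / 4% minimum widths come from the commit message.

```python
from statistics import mean, median

# Floors taken from the commit message: 6% for the avg bar, 4% for the p50 bar.
AVG_MIN_WIDTH_PCT = 6.0
P50_MIN_WIDTH_PCT = 4.0


def bar_width_pct(value, max_value, floor_pct):
    """Scale value to a 0-100% bar width, never narrower than floor_pct."""
    if max_value <= 0:
        return floor_pct
    return max(floor_pct, min(100.0, 100.0 * value / max_value))


def throughput_row(model, non_streaming_tps, max_tps):
    """Build one (hypothetical) panel row from non-streaming tok/s samples."""
    if not non_streaming_tps:
        # No non-streaming records: show a label rather than a misleading 0 tok/s.
        return {"model": model, "label": "streaming only",
                "avg_width": None, "p50_width": None}
    avg = mean(non_streaming_tps)
    p50 = median(non_streaming_tps)
    return {
        "model": model,
        "label": f"{avg:.1f} tok/s",
        "avg_width": bar_width_pct(avg, max_tps, AVG_MIN_WIDTH_PCT),
        "p50_width": bar_width_pct(p50, max_tps, P50_MIN_WIDTH_PCT),
    }
```

Under these assumptions, a model with an empty sample list yields the "streaming only" label, and a model averaging 1 tok/s against a 100 tok/s maximum still gets a 6% avg bar and a 4% p50 bar.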