inference-harness

T

Abiba 34fb7516e1 fix: cross-agent GPU spreading prevents hotspot hammering

OLD: checked only if CURRENT agent was on a GPU
  Tanko→MoE, Mumuni also→MoE (didnt see Tanko)

NEW: checks if ANY agent is on a GPU (cross-agent awareness)
  Pass 1: prefer GPUs with 0 agents
  Pass 2: prefer GPU this agent is not already on
  Pass 3: any non-busy GPU

Prevents Tanko+Mumuni piling onto same GPU simultaneously
even when both slots are free. Combined with MoE=1 slot,
guarantees overflow goes to idle Dense.

2026-05-30 12:55:29 +00:00

dashboard

fix: default performance window to 24h so all models appear immediately

2026-05-26 12:37:52 +00:00

nginx

feat: per-request performance tracking + /metrics/performance endpoint

2026-05-25 16:50:45 +00:00

router

fix: cross-agent GPU spreading prevents hotspot hammering

2026-05-30 12:55:29 +00:00

.gitignore

May 19, 2026: Full harness update