inference-harness

T

Abiba ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)

Strix Halo running qwen3.6-35B-A3B was hitting 94°C with 2 concurrent
slots, causing 300s request timeouts. Mumuni + Koby accumulated 15
timeouts in the last hour. Reduced to 1 slot for thermal headroom.

Medium and Default tiers already route VLM before MoE as fallback,
minimizing overflow traffic to the hot GPU.

2026-05-26 23:47:08 +00:00

dashboard

fix: default performance window to 24h so all models appear immediately

2026-05-26 12:37:52 +00:00

nginx

feat: per-request performance tracking + /metrics/performance endpoint

2026-05-25 16:50:45 +00:00

router

fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)

2026-05-26 23:47:08 +00:00

.gitignore

May 19, 2026: Full harness update