Files
inference-harness/router
Abiba ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)
Strix Halo running qwen3.6-35B-A3B was hitting 94°C with 2 concurrent
slots, causing 300s request timeouts. Mumuni + Koby accumulated 15
timeouts in the last hour. Reduced to 1 slot for thermal headroom.

Medium and Default tiers already route VLM before MoE as fallback,
minimizing overflow traffic to the hot GPU.
2026-05-26 23:47:08 +00:00
..
2026-05-19 15:03:47 +00:00
2026-05-19 15:03:47 +00:00