ebe8f9ced431e0b149816a5fcb2e341f4645ab26
Strix Halo running qwen3.6-35B-A3B was hitting 94°C with 2 concurrent slots, causing 300s request timeouts. Mumuni + Koby accumulated 15 timeouts in the last hour. Reduced to 1 slot for thermal headroom. Medium and Default tiers already route VLM before MoE as fallback, minimizing overflow traffic to the hot GPU.
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python
97.6%
Shell
1.9%
Dockerfile
0.5%