acbcb208372ccee8cf9d84b23de71633a2ed413d
MoE at 95C with p50=13s latency — thermal throttling causing death spiral. Both slots stuck processing for 113s p95. Dense idle at 38C with 2 free slots. Reducing MoE to 1 slot forces heavy overflow to Dense, giving MoE thermal headroom. Heavy tier: MoE → Dense → VLM still valid — first heavy goes to MoE, second overflows to Dense.
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python
97.6%
Shell
1.9%
Dockerfile
0.5%