Files
inference-harness/router
Abiba acbcb20837 fix: MoE concurrency 2→1 (95C thermal emergency)
MoE at 95C with p50=13s latency — thermal throttling causing
death spiral. Both slots stuck processing for 113s p95.
Dense idle at 38C with 2 free slots. Reducing MoE to 1 slot
forces heavy overflow to Dense, giving MoE thermal headroom.

Heavy tier: MoE → Dense → VLM still valid — first heavy goes
to MoE, second overflows to Dense.
2026-05-30 12:52:23 +00:00
..
2026-05-19 15:03:47 +00:00
2026-05-19 15:03:47 +00:00