c4ea5e3a9858b2109149b871c0374b29cbf9b001
Change the fallback order to Dense → MoE → VLM instead of MoE → Dense → VLM. Combined with MoE limited to 1 concurrent slot, Dense absorbs all primary traffic, and MoE activates only when Dense is saturated. Prevents Strix Halo from hitting the 94 °C thermal limit.
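The ordering described in this commit can be sketched as a priority list with per-backend slot limits. This is a minimal illustration, not the harness's actual router: the `Backend` and `FallbackRouter` names, the `build_router` helper, and the slot counts for Dense and VLM are assumptions made for the example. Only the Dense → MoE → VLM order and the single MoE slot come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical stand-in for one model server behind the router."""
    name: str
    max_slots: int      # concurrent requests the backend accepts
    in_flight: int = 0  # requests currently being served

    def has_capacity(self) -> bool:
        return self.in_flight < self.max_slots


class FallbackRouter:
    """Send each request to the first backend, in priority order, with a free slot."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends  # list order is the fallback order

    def acquire(self) -> Optional[Backend]:
        for backend in self.backends:
            if backend.has_capacity():
                backend.in_flight += 1
                return backend
        return None  # every backend is saturated

    def release(self, backend: Backend) -> None:
        backend.in_flight = max(0, backend.in_flight - 1)


def build_router() -> FallbackRouter:
    return FallbackRouter([
        Backend("dense", max_slots=4),  # assumed slot count, for illustration
        Backend("moe", max_slots=1),    # 1 concurrent slot, per the commit message
        Backend("vlm", max_slots=2),    # assumed slot count, for illustration
    ])
```

With this arrangement, MoE receives a request only while every Dense slot is busy, and it never holds more than one request at a time, which is the load-shaping effect the commit relies on.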
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python 97.6%
Shell 1.9%
Dockerfile 0.5%