fb1d51b93b82a8c7ee7a855cebd13cd8b024eb42
Tier 1 (Lightweight): VLM → Dense → MoE   ≤500 tok, 1 turn
Tier 2 (Simple):      VLM → Dense → MoE   ≤15K tok, ≤12 turns (was 10K/10)
Tier 3 (Medium):      Dense → VLM → MoE   ≤25K tok
Tier 4 (Heavy):       MoE → Dense → VLM   >25K tok (MoE PRIMARY workhorse)
Tier 5 (Default):     MoE → Dense → VLM   MoE primary fallback
Target: MoE ~50% (heavy primary), VLM ~25% (raised simple + fallback),
Dense ~25% (medium primary + heavy fallback)
Removed turn limit from Medium tier — Simple tier handles conversational
requests up to 12 turns now.
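
The tier rules above can be sketched as a small selection function. This is only an illustration under stated assumptions, not the harness's actual router: the names `Tier` and `pick_tier` are hypothetical, tiers are assumed to be checked in order from 1 to 4, and Tier 5 (Default) is assumed to apply when no token count is available, which the commit message does not specify.

```python
from typing import NamedTuple, Optional, Tuple


class Tier(NamedTuple):
    name: str
    chain: Tuple[str, ...]  # backends in order: primary first, then fallbacks


# Backend chains copied from the tier table in the commit message.
LIGHTWEIGHT = Tier("Lightweight", ("VLM", "Dense", "MoE"))
SIMPLE = Tier("Simple", ("VLM", "Dense", "MoE"))
MEDIUM = Tier("Medium", ("Dense", "VLM", "MoE"))
HEAVY = Tier("Heavy", ("MoE", "Dense", "VLM"))
DEFAULT = Tier("Default", ("MoE", "Dense", "VLM"))


def pick_tier(tokens: Optional[int], turns: int = 1) -> Tier:
    """Pick a tier from the token count and turn count (hypothetical helper)."""
    if tokens is None:
        # Assumption: Default covers requests whose size is unknown.
        return DEFAULT
    if tokens <= 500 and turns <= 1:
        return LIGHTWEIGHT
    if tokens <= 15_000 and turns <= 12:
        return SIMPLE
    if tokens <= 25_000:
        # Medium has no turn limit after this change.
        return MEDIUM
    return HEAVY


if __name__ == "__main__":
    for tokens, turns in [(400, 1), (400, 3), (10_000, 20), (30_000, 1), (None, 1)]:
        tier = pick_tier(tokens, turns)
        print(tokens, turns, tier.name, " -> ".join(tier.chain))
```

Under these assumptions, a 10K-token conversation with 20 turns exceeds the Simple turn limit and lands in Medium, which is what removing the Medium turn limit makes possible.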
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python 97.6%
Shell 1.9%
Dockerfile 0.5%