inference-harness/router
Abiba b3db0841ef feat: redesigned routing tiers for even GPU distribution + speed priority
OLD: Dense was the last choice in every tier and got only 4% of auto-routed traffic
NEW: 5-tier routing with speed-first prioritization

Tier 1 (Lightweight): VLM → Dense → MoE    (≤500 tok, ≤100 words)
Tier 2 (Simple):      VLM → Dense → MoE    (≤4000 tok, ≤6 turns)
Tier 3 (Medium):      Dense → MoE → VLM    (≤25000 tok, ≤15 turns)
Tier 4 (Heavy):       MoE → Dense → VLM    (>25000 tok or >15 turns)
Tier 5 (Default):     Dense → MoE → VLM    (balanced fallback)
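
A minimal sketch of how the tier table above could be evaluated, not the router's actual code. `RequestStats` and `pick_tier` are hypothetical names. The sketch assumes that the limits within a tier are combined with AND (Tier 4 explicitly uses "or"), that Tier 4 is checked first, and that Tier 5 applies when token or turn counts are unavailable; the commit message does not state any of these.

```python
from dataclasses import dataclass
from typing import Optional

# Backend preference order per tier, copied from the tier table.
TIER_ORDER = {
    1: ("VLM", "Dense", "MoE"),
    2: ("VLM", "Dense", "MoE"),
    3: ("Dense", "MoE", "VLM"),
    4: ("MoE", "Dense", "VLM"),
    5: ("Dense", "MoE", "VLM"),
}


@dataclass
class RequestStats:
    """Hypothetical per-request metrics; None means 'not measured'."""
    tokens: Optional[int] = None
    words: Optional[int] = None
    turns: Optional[int] = None


def pick_tier(stats: RequestStats) -> int:
    # Assumption: fall back to Tier 5 when core metrics are missing.
    if stats.tokens is None or stats.turns is None:
        return 5
    # Tier 4 (Heavy): either limit exceeded.
    if stats.tokens > 25000 or stats.turns > 15:
        return 4
    # Tier 1 (Lightweight): both the token and the word limit must hold.
    if stats.words is not None and stats.tokens <= 500 and stats.words <= 100:
        return 1
    # Tier 2 (Simple).
    if stats.tokens <= 4000 and stats.turns <= 6:
        return 2
    # Tier 3 (Medium): everything left within 25000 tok and 15 turns.
    return 3


if __name__ == "__main__":
    stats = RequestStats(tokens=3000, words=2200, turns=10)
    tier = pick_tier(stats)
    print(tier, TIER_ORDER[tier])
```

With these assumptions, a 3000-token conversation with 10 turns misses the Tier 2 turn limit and lands in Tier 3, where Dense is tried first.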

Also: quality hint now routes to MoE (better reasoning)
Bugfix: Tier 1 now also checks token count, so a giant single-word
input (few words, many tokens) is no longer routed as lightweight
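
The two behaviour changes above (quality hint and the Tier 1 token check) can be illustrated with a short sketch. All names here are hypothetical, `estimate_tokens` is a crude stand-in for whatever tokenizer the router really uses, and moving MoE to the front while keeping the other backends as fallbacks is an assumption about what "routes to MoE" means.

```python
from typing import Optional, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; a real tokenizer would differ.
    return max(1, len(text) // 4)


def is_lightweight(text: str, check_tokens: bool = True) -> bool:
    """Tier 1 test: <=100 words and, after the bugfix, <=500 tokens."""
    if len(text.split()) > 100:
        return False
    if check_tokens and estimate_tokens(text) > 500:
        return False
    return True


def backend_order(tier_order: Sequence[str], hint: Optional[str] = None) -> Tuple[str, ...]:
    """Apply the quality hint: MoE first, remaining backends as fallbacks."""
    if hint == "quality":
        return ("MoE",) + tuple(b for b in tier_order if b != "MoE")
    return tuple(tier_order)


if __name__ == "__main__":
    giant = "x" * 40000  # one "word", about 10000 estimated tokens
    print(is_lightweight(giant, check_tokens=False))  # word-only check (old behaviour)
    print(is_lightweight(giant))                      # word + token check (new behaviour)
    print(backend_order(("VLM", "Dense", "MoE"), hint="quality"))
```

The word-only check accepts the 40000-character input because it counts as a single word; adding the token check rejects it.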
2026-05-26 22:00:20 +00:00