revert: MoE back to 2 slots (cross-agent spread now prevents hotspot)

Cross-agent GPU awareness ensures Tanko and Mumuni never hit MoE at the same time: the second agent always overflows to Dense/VLM. Since distinct agents never pile onto MoE, it can safely use its extra VRAM for 2 slots.
2026-05-30 13:15:19 +00:00
parent 34fb7516e1
commit 060a47fce9
1 changed file with 1 addition and 1 deletion
@@ -19,7 +19,7 @@ GPU_URLS = {
 }
 # Max concurrent requests per GPU (based on llama.cpp --parallel)
 GPU_MAX_CONCURRENT = {
-    "qwen3.6-35B-A3B": 1,   # 1 slot (95C thermal emergency)
+    "qwen3.6-35B-A3B": 2,   # 2 slots (cross-agent spread prevents overheating)
     "qwen3.6-27B-code": 2,  # 2 slots (128K context frees VRAM)
     "qwen3.5-9b-vlm": 2,       # 2 slots (12GB VRAM, 4GB headroom)
 }
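
The routing rule described in the commit message could be sketched roughly as follows. This is an illustration only, not the code in this repository: `pick_gpu`, `PREFERENCE`, `MOE`, and the `active` bookkeeping structure are hypothetical names, and the sketch assumes that `qwen3.6-35B-A3B` is the MoE model while the other two keys are the Dense and VLM targets.

```python
from __future__ import annotations

from typing import Optional

# Slot limits as they stand after this commit.
GPU_MAX_CONCURRENT = {
    "qwen3.6-35B-A3B": 2,
    "qwen3.6-27B-code": 2,
    "qwen3.5-9b-vlm": 2,
}

# Assumption: MoE is tried first, then Dense, then VLM.
MOE = "qwen3.6-35B-A3B"
PREFERENCE = [MOE, "qwen3.6-27B-code", "qwen3.5-9b-vlm"]


def pick_gpu(agent: str, active: dict[str, list[str]]) -> Optional[str]:
    """Pick a GPU key for the agent's next request, or None if nothing is free.

    `active` maps each GPU key to the agents that own its in-flight
    requests, with one list entry per request.
    """
    for gpu in PREFERENCE:
        holders = active.get(gpu, [])
        if len(holders) >= GPU_MAX_CONCURRENT[gpu]:
            continue  # every slot on this GPU is busy
        if gpu == MOE and any(holder != agent for holder in holders):
            continue  # another agent is already on MoE, so overflow
        return gpu
    return None


if __name__ == "__main__":
    # Tanko already holds one MoE slot, so Mumuni is sent elsewhere.
    print(pick_gpu("Mumuni", {MOE: ["Tanko"]}))
```

Under this rule, both MoE slots can only ever be filled by the same agent, which is the condition the commit message relies on when raising the limit back to 2.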