9a0d69ce8d
- Dense GPU_CONTEXT: 192K→128K (131072) to free VRAM
- Dense max_concurrent: 1→2 (VRAM now sufficient)
- Heavy tier: Dense → VLM → MoE (VLM handles 262K context)
- Total slots: 6 (2 Dense + 2 MoE + 2 VLM)

Distribution target: Dense 50%, VLM 30%, MoE 20%

NOTE: Requires llama.cpp restart on 192.168.68.8 with --ctx-size 131072
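
For illustration only, the slot counts and heavy-tier fallback order listed above could be modelled roughly as follows. This is a minimal sketch, not the code touched by this commit: the `Tier` class, `TIERS`, `HEAVY_ORDER`, `total_slots`, and `pick_heavy_tier` names are all hypothetical, and the fallback rule (take the first tier in order that has a free slot) is an assumption about how the ordering is applied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tier:
    """Hypothetical per-tier settings; field names are illustrative."""
    name: str
    max_concurrent: int   # slots available on this tier
    target_share: float   # intended fraction of traffic
    in_flight: int = 0    # requests currently running

    def has_free_slot(self) -> bool:
        return self.in_flight < self.max_concurrent


# Values taken from the commit message above.
DENSE_GPU_CONTEXT = 131072  # 128K, down from 192K

TIERS: Dict[str, Tier] = {
    "dense": Tier("dense", max_concurrent=2, target_share=0.50),
    "vlm": Tier("vlm", max_concurrent=2, target_share=0.30),
    "moe": Tier("moe", max_concurrent=2, target_share=0.20),
}

# Heavy tier preference: Dense, then VLM, then MoE.
HEAVY_ORDER: List[str] = ["dense", "vlm", "moe"]


def total_slots(tiers: Dict[str, Tier]) -> int:
    """Sum of max_concurrent across all tiers."""
    return sum(t.max_concurrent for t in tiers.values())


def pick_heavy_tier(tiers: Dict[str, Tier],
                    order: List[str] = HEAVY_ORDER) -> Optional[str]:
    """Return the first tier in `order` with a free slot, or None if all are busy.

    Assumed fallback rule; the real router may also weigh context length
    or the distribution target.
    """
    for name in order:
        if tiers[name].has_free_slot():
            return name
    return None
```

With these values, `total_slots(TIERS)` gives the 6 slots stated above, and a heavy request falls through to VLM only once both Dense slots are occupied.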