Abiba 9a0d69ce8d feat: Dense 128K context + 2 slots, VLM second in Heavy tier
- Dense GPU_CONTEXT: 192K→128K (131072) to free VRAM
- Dense max_concurrent: 1→2 (VRAM now sufficient)
- Heavy tier order: Dense → VLM → MoE (VLM is now second; it handles 262K context)
- Total slots: 6 (2 Dense + 2 MoE + 2 VLM)
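
The slot and context numbers above can be sanity-checked with a small sketch. This is not the harness's actual router code: the `Backend` class, `BACKENDS`, `HEAVY_TIER_ORDER`, and `pick_heavy_backend` are hypothetical names, the VLM's "262K" is assumed to mean 262144 tokens, and the MoE context size is left unset because the commit does not state it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Backend:
    name: str
    max_concurrent: int
    context_tokens: Optional[int]  # None = not stated in the commit


# Values taken from the commit message; names are illustrative only.
BACKENDS = {
    "dense": Backend("dense", 2, 128 * 1024),  # 128K = 131072 tokens
    "vlm": Backend("vlm", 2, 262144),          # assumption: "262K" = 262144
    "moe": Backend("moe", 2, None),            # context size unknown here
}

# Heavy tier preference order after this change: Dense, then VLM, then MoE.
HEAVY_TIER_ORDER = ("dense", "vlm", "moe")


def total_slots() -> int:
    """Sum of concurrent slots across all backends."""
    return sum(b.max_concurrent for b in BACKENDS.values())


def pick_heavy_backend(prompt_tokens: int, in_flight: Dict[str, int]) -> Optional[str]:
    """Return the first backend in tier order with a free slot and,
    where the context size is known, enough room for the prompt."""
    for name in HEAVY_TIER_ORDER:
        backend = BACKENDS[name]
        if in_flight.get(name, 0) >= backend.max_concurrent:
            continue  # all slots busy
        if backend.context_tokens is not None and prompt_tokens > backend.context_tokens:
            continue  # prompt does not fit
        return name
    return None  # nothing available; caller would queue or reject
```

Under these assumptions, a 200,000-token prompt skips Dense (131072 tokens) and lands on the VLM, which is one plausible reading of why the VLM was moved ahead of MoE in the Heavy tier.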

Distribution target: Dense 50%, VLM 30%, MoE 20%
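
One way a 50/30/20 target could be approached is a deficit-based picker that always sends the next request to the backend furthest below its target share. This is a sketch of that general technique, not necessarily how the harness distributes load; `TARGET_SHARE`, `next_backend`, and `simulate` are hypothetical names.

```python
from typing import Dict

# Target shares from the commit message (they sum to 1.0).
TARGET_SHARE = {"dense": 0.5, "vlm": 0.3, "moe": 0.2}


def next_backend(counts: Dict[str, int]) -> str:
    """Pick the backend whose observed share is furthest below its target."""
    total = sum(counts.values())

    def deficit(name: str) -> float:
        observed = counts.get(name, 0) / total if total else 0.0
        return TARGET_SHARE[name] - observed

    return max(TARGET_SHARE, key=deficit)


def simulate(n: int) -> Dict[str, int]:
    """Route n requests and return how many each backend received."""
    counts = {name: 0 for name in TARGET_SHARE}
    for _ in range(n):
        counts[next_backend(counts)] += 1
    return counts


if __name__ == "__main__":
    print(simulate(10))
```

A real router would also have to respect slot availability and context limits, so observed shares would drift from the target whenever a preferred backend is busy.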

NOTE: Requires llama.cpp restart on 192.168.68.8 with --ctx-size 131072
2026-05-27 07:15:58 +00:00
2026-05-19 15:03:47 +00:00
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
371 KiB
Languages
Python 97.6%
Shell 1.9%
Dockerfile 0.5%