May 19, 2026: Full harness update

- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Redis stale key cleanup
- Automated maintenance cron job
- LiteLLM config update
- GPU router config update
- README update
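
The router counter leak fix can be illustrated with a minimal sketch. It is not the router's actual code: only the name `gpu_decr` comes from the commit message, while `GpuCounter`, `gpu_incr`, `forward`, and the in-memory dict (standing in for whatever store the router really uses) are assumptions made for illustration. The idea is that a slot taken before forwarding a request must also be released when the backend call raises, otherwise failed requests hold the slot indefinitely.

```python
class GpuCounter:
    """Hypothetical in-memory stand-in for the router's per-GPU in-flight counter."""

    def __init__(self):
        self.in_flight = {}

    def gpu_incr(self, gpu):
        # Take one slot on the given backend.
        self.in_flight[gpu] = self.in_flight.get(gpu, 0) + 1

    def gpu_decr(self, gpu):
        # Release one slot; clamp at zero so a double release cannot go negative.
        self.in_flight[gpu] = max(0, self.in_flight.get(gpu, 0) - 1)


def forward(counter, gpu, send):
    """Forward one request, releasing the slot on success and on failure."""
    counter.gpu_incr(gpu)
    try:
        result = send()
    except Exception:
        # Without this decrement, every failed request leaks one slot.
        counter.gpu_decr(gpu)
        raise
    counter.gpu_decr(gpu)
    return result
```

A `try`/`finally` with a single decrement would behave the same way here; the explicit `except` branch is shown only because that is where the commit message places the fix.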
2026-05-19 15:03:34 +00:00
parent 4f032b035c
commit 9c31b5d622
7 changed files with 43 additions and 46 deletions
@@ -8,7 +8,7 @@ CT 116 Docker stack for routing local GPU models through a unified OpenAI-compat
 nginx :80 → router :9000 → GPU backends
                ├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
                ├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
-                └─ gemma-4-E4B (Light) @ 192.168.68.110:8080
+                └─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080

 LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
 ```