May 19, 2026: Full harness update

- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Redis stale key cleanup
- Automated maintenance cron job
- LiteLLM config update
- GPU router config update
- README update
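
The router counter leak fix listed above (gpu_decr in the except handler) can be illustrated with a minimal sketch. Only the name `gpu_decr` comes from the commit message; the in-memory counter, `gpu_incr`, `forward`, and the `send` callable are hypothetical stand-ins, and the real router may keep its counters elsewhere (the stack includes Redis).

```python
from collections import Counter

# Hypothetical in-flight request counters, keyed by backend name.
in_flight = Counter()


def gpu_incr(backend: str) -> None:
    in_flight[backend] += 1


def gpu_decr(backend: str) -> None:
    # Clamp at zero so a stray double decrement cannot go negative.
    in_flight[backend] = max(0, in_flight[backend] - 1)


def forward(backend: str, send):
    """Call send() while tracking the request in the backend's counter."""
    gpu_incr(backend)
    try:
        result = send()
    except Exception:
        # Without this decrement, every failed request would leave the
        # counter one too high and the backend would look permanently busy.
        gpu_decr(backend)
        raise
    gpu_decr(backend)
    return result
```

A `try`/`finally` with a single `gpu_decr` call would give the same guarantee; the explicit `except` branch is shown only because that is where the commit message says the fix landed.
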
Abiba
2026-05-19 15:03:34 +00:00
parent 4f032b035c
commit 9c31b5d622
7 changed files with 43 additions and 46 deletions
+1 -1
@@ -8,7 +8,7 @@ CT 116 Docker stack for routing local GPU models through a unified OpenAI-compat
nginx :80 → router :9000 → GPU backends
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
-└─ gemma-4-E4B (Light) @ 192.168.68.110:8080
+└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
```
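
The topology in this hunk amounts to a model-to-backend lookup. A minimal sketch, assuming the router picks a backend by the request's model name: the `BACKENDS` dict and `resolve_backend` helper are hypothetical, the model names and addresses are copied from the diagram, and the `http://` scheme, the LiteLLM host, and the choice to send unknown models to LiteLLM are all assumptions about what "(fallback)" means here.

```python
# Model names and addresses copied from the diagram (post-commit state).
# The http:// scheme is an assumption.
BACKENDS = {
    "qwen3.6-35B-A3B": "http://192.168.68.15:8080",
    "qwen3.6-27B-code": "http://192.168.68.8:8080",
    "qwen3.5-9b-vlm": "http://192.168.68.110:8080",
}

# Assumption: LiteLLM runs on the same host as the router.
LITELLM_FALLBACK = "http://localhost:8081"


def resolve_backend(model: str) -> str:
    """Return the base URL for a model, or the fallback for unknown names."""
    return BACKENDS.get(model, LITELLM_FALLBACK)
```

Under this sketch, a request that still names the removed `gemma-4-E4B` model would go to the fallback rather than to 192.168.68.110.
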