May 19, 2026: Full harness update
- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Redis stale key cleanup
- Automated maintenance cron job
- LiteLLM config update
- GPU router config update
- README update
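
The router counter leak fix listed above (`gpu_decr` in the except handler) can be illustrated with a minimal sketch. It assumes the router increments a per-GPU in-flight counter before forwarding a request, so a request that fails without a matching decrement leaves the counter raised. Only `gpu_decr` is named in the commit message; `gpu_incr`, `route`, `forward`, and the in-memory `inflight` dict are hypothetical stand-ins, and the real router may keep its counters elsewhere (for example in Redis).

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the router's per-GPU in-flight counters.
inflight = defaultdict(int)


def gpu_incr(gpu: str) -> None:
    """Record that a request has been dispatched to `gpu` (hypothetical helper)."""
    inflight[gpu] += 1


def gpu_decr(gpu: str) -> None:
    """Release one slot on `gpu`, never dropping below zero."""
    inflight[gpu] = max(0, inflight[gpu] - 1)


def route(gpu: str, forward):
    """Forward one request and release the slot on both success and failure."""
    gpu_incr(gpu)
    try:
        result = forward()
    except Exception:
        # Without this decrement a failed request would leave the counter
        # raised for good, which is the kind of leak the commit describes.
        gpu_decr(gpu)
        raise
    gpu_decr(gpu)
    return result
```

A `try`/`finally` block would give the same behavior with a single `gpu_decr` call; the explicit except handler is shown here only because that is how the commit message describes the fix.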
@@ -8,7 +8,7 @@ CT 116 Docker stack for routing local GPU models through a unified OpenAI-compat
 nginx :80 → router :9000 → GPU backends
 ├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
 ├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
-└─ gemma-4-E4B (Light) @ 192.168.68.110:8080
+└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080

 LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
 ```
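
For reference, the backend layout after this change can be written as a lookup table. This is a sketch only: the model names and addresses come from the diagram in the diff, while the `BACKENDS` dict, the `backend_url` function, and the `http://` scheme are illustrative assumptions, not the router's actual configuration format.

```python
# Post-migration backends as shown in the README diagram. Names and addresses
# are from the diff; the dict layout and the http:// scheme are assumptions.
BACKENDS = {
    "qwen3.6-35B-A3B": "http://192.168.68.15:8080",   # MoE
    "qwen3.6-27B-code": "http://192.168.68.8:8080",   # Dense
    "qwen3.5-9b-vlm": "http://192.168.68.110:8080",   # VLM, replaces gemma-4-E4B
}


def backend_url(model: str) -> str:
    """Return the base URL for `model`; raise KeyError for unknown models."""
    try:
        return BACKENDS[model]
    except KeyError:
        raise KeyError(f"no GPU backend configured for {model!r}") from None
```

Under these assumptions, a request for the retired `gemma-4-E4B` name raises `KeyError` instead of silently reaching the host that now serves the VLM.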