fix: non-blocking GPU health checks + 256K turboquant context upgrade

router/router.py:
- check_gpu_health() now accepts configurable timeouts (sidecar_timeout, gpu_timeout)
- /health and /v1/models endpoints use fast 1.5s/1s timeouts (non-blocking)
- /v1/models now calls check_gpu_health once per model instead of twice
- GPU_CONTEXT updated to 262144 across all models (turboquant upgrade)
- 27B max_concurrent reduced 2→1 (24GB VRAM saturated at 256K context)

docker-compose.yml:
- Router healthcheck timeout 5s→15s, interval 15s→30s
- Nginx healthcheck timeout 5s→15s, interval 15s→30s

Fixes dashboard hang when any GPU is unreachable.
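
The router/router.py change is not shown in this diff, so the following is only a minimal sketch of how a health check with per-target timeouts might look. Only the name check_gpu_health and the parameter names sidecar_timeout and gpu_timeout come from the commit message. The probe() helper, the URL arguments, the default values, the returned dict shape, and the mapping of 1.5s to the sidecar and 1s to the GPU are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float) -> bool:
    """Hypothetical helper: True if url answers with HTTP 2xx within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, HTTP errors and malformed URLs
        # all count as "unhealthy" instead of propagating to the caller.
        return False


def check_gpu_health(
    sidecar_url: str,
    gpu_url: str,
    sidecar_timeout: float = 5.0,  # assumed default, not taken from the commit
    gpu_timeout: float = 5.0,      # assumed default, not taken from the commit
) -> dict:
    """Probe the sidecar and the GPU server, each bounded by its own timeout."""
    return {
        "sidecar": probe(sidecar_url, sidecar_timeout),
        "gpu": probe(gpu_url, gpu_timeout),
    }


if __name__ == "__main__":
    # Fast path as an endpoint such as /health might use it. The sidecar URL
    # is a placeholder; the GPU base URL is the GPU_LIGHT_URL from the diff.
    status = check_gpu_health(
        "http://192.168.68.110:9100/health",
        "http://192.168.68.110:8080/v1/models",
        sidecar_timeout=1.5,
        gpu_timeout=1.0,
    )
    print(status)
```

With short timeouts, an unreachable GPU costs the caller a bounded wait per probe instead of a hang. Calling the function once per model and reusing the returned dict avoids paying that wait twice.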
docker-compose.yml: 4 additions, 4 deletions
@@ -29,8 +29,8 @@ services:
       - GPU_LIGHT_URL=http://192.168.68.110:8080/v1
     healthcheck:
       test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9000/health')"]
-      interval: 15s
-      timeout: 5s
+      interval: 30s
+      timeout: 15s
       retries: 3
     depends_on:
       redis:
@@ -68,8 +68,8 @@ services:
       - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
     healthcheck:
       test: ["CMD", "curl", "-f", "http://127.0.0.1/health"]
-      interval: 15s
-      timeout: 5s
+      interval: 30s
+      timeout: 15s
       retries: 3
     depends_on:
       - litellm
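
As a rough guide to what the new healthcheck values mean for failure detection, the arithmetic can be sketched as below. It assumes Docker's documented behaviour that each check starts `interval` after the previous one completes and that a container is marked unhealthy after `retries` consecutive failures. It also assumes every failing check runs for the full `timeout`, and it ignores any start period. The function name is hypothetical.

```python
def worst_case_unhealthy_seconds(interval: float, timeout: float, retries: int) -> float:
    """Upper bound on the time from a service starting to hang until Docker
    marks the container unhealthy, assuming each failing check waits the
    full interval and then runs for the full timeout."""
    return retries * (interval + timeout)


old = worst_case_unhealthy_seconds(interval=15, timeout=5, retries=3)
new = worst_case_unhealthy_seconds(interval=30, timeout=15, retries=3)
print(f"old settings: {old:.0f}s, new settings: {new:.0f}s")
```

Under these assumptions the longer interval and timeout trade slower detection of a hung container for fewer false failures when /health is slow to answer.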