41939104c73840a8d9e524f7951b68d84900f771
router/router.py:
- check_gpu_health() now accepts configurable timeouts (sidecar_timeout, gpu_timeout)
- /health and /v1/models endpoints use fast 1.5s/1s timeouts (non-blocking)
- /v1/models now calls check_gpu_health once per model instead of twice
- GPU_CONTEXT updated to 262144 across all models (turboquant upgrade)
- 27B max_concurrent reduced 2→1 (24GB VRAM saturated at 256K context)

docker-compose.yml:
- Router healthcheck timeout 5s→15s, interval 15s→30s
- Nginx healthcheck timeout 5s→15s, interval 15s→30s

Fixes dashboard hang when any GPU is unreachable.
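
The router changes described in this commit can be illustrated with a minimal sketch. This is not the actual code in router/router.py. Only the names check_gpu_health, sidecar_timeout, and gpu_timeout come from the commit message; the helper `_probe`, the function `list_models`, the `sidecar_url`/`gpu_url` config keys, the 5-second defaults, and the mapping of 1.5s to the sidecar and 1s to the GPU are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical probe signature: (url, timeout_seconds) -> reachable?
Probe = Callable[[str, float], bool]


def _probe(url: str, timeout: float) -> bool:
    """Return True if url answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Unreachable host, timeout, bad response, or malformed URL.
        return False


def check_gpu_health(
    sidecar_url: str,
    gpu_url: str,
    sidecar_timeout: float = 5.0,  # assumed default, not taken from the commit
    gpu_timeout: float = 5.0,      # assumed default, not taken from the commit
    probe: Probe = _probe,
) -> Dict[str, bool]:
    """Probe the sidecar and the GPU server, each with its own timeout."""
    sidecar_ok = probe(sidecar_url, sidecar_timeout)
    gpu_ok = probe(gpu_url, gpu_timeout)
    return {"sidecar": sidecar_ok, "gpu": gpu_ok, "healthy": sidecar_ok and gpu_ok}


def list_models(models: Dict[str, Dict[str, str]], probe: Probe = _probe) -> List[dict]:
    """Build a /v1/models-style listing with one health check per model."""
    listing = []
    for name, cfg in models.items():
        # Call check_gpu_health once and reuse the result for every field,
        # using the short timeouts so an unreachable GPU fails quickly.
        health = check_gpu_health(
            cfg["sidecar_url"],
            cfg["gpu_url"],
            sidecar_timeout=1.5,
            gpu_timeout=1.0,
            probe=probe,
        )
        listing.append({"id": name, "healthy": health["healthy"], "gpu": health["gpu"]})
    return listing
```

In this sketch the probes run sequentially, so an unreachable model costs at most about 2.5 seconds per listing instead of two full default-timeout checks. The `probe` parameter exists only so the logic can be exercised without a network.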
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python 97.6%
Shell 1.9%
Dockerfile 0.5%