d53685d87429193e334aaec11fb867d4947029e8
select_best_gpu() now spreads different agents across GPUs:
- If an agent already has a request on a GPU, prefer other GPUs first
- Tracked in Redis under agent_gpu:{agent}:{model} with a 120s TTL
- The same agent can still use multiple slots on the same GPU if needed
- Falls back to normal priority when only one option is available
Prevents Tanko+Mumuni from piling onto MoE simultaneously
while Dense sits idle. Each agent naturally spreads across
available GPUs.
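
The selection rule described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's actual code: it assumes one model per GPU (so the `{model}` part of the key identifies the GPU), it assumes `candidates` arrives already sorted in normal priority order and contains only GPUs with a free slot, and it uses a hypothetical in-memory `TTLStore` in place of a real Redis client so the example runs on its own. Only the function name, the key pattern, and the 120s TTL come from the commit message.

```python
import time

AGENT_GPU_TTL = 120  # seconds, as stated in the commit message


class TTLStore:
    """Hypothetical in-memory stand-in for Redis SETEX/GET (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def setex(self, key, ttl_seconds, value):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries behave as if they were never set.
            del self._data[key]
            return None
        return value


def select_best_gpu(agent, candidates, store):
    """Pick a model/GPU for `agent`.

    `candidates` is assumed to be a list of model names in normal
    priority order, one model per GPU, each with at least one free slot.
    """
    if not candidates:
        return None

    def agent_already_there(model):
        return store.get(f"agent_gpu:{agent}:{model}") is not None

    # sorted() is stable: GPUs without a recent request from this agent
    # come first, and normal priority order is kept within each group.
    # With one candidate, or when every GPU is marked, this reduces to
    # the normal priority order.
    ranked = sorted(candidates, key=agent_already_there)
    chosen = ranked[0]

    # Record the placement so the agent's next request prefers another GPU.
    store.setex(f"agent_gpu:{agent}:{chosen}", AGENT_GPU_TTL, "1")
    return chosen
```

With candidates `["MoE", "Dense"]`, the first call for a given agent returns `"MoE"` and the second returns `"Dense"`. Once both keys are set, the agent goes back to `"MoE"`, which matches the note that the same agent can still take multiple slots on one GPU.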
Description
SyslogAI Inference Harness — 3-GPU router, dashboard, LiteLLM proxy
Languages
Python 97.6%, Shell 1.9%, Dockerfile 0.5%