inference-harness/router
Abiba ddde6646de fix: decouple VRAM usage from saturation status
VRAM percentage no longer marks a GPU as saturated.
Saturation is about slot availability (handled by is_gpu_busy()),
not memory usage. Added a vram_warning boolean flag (≥95% threshold)
for informational monitoring; it does not affect routing decisions.

27B Dense now correctly shows healthy at 91% VRAM.
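
A minimal sketch of the decoupling described above, not the router's actual code. Only is_gpu_busy(), vram_warning, the 95% threshold, and the healthy/saturated statuses come from the commit message. The GpuState fields, the gpu_status() helper, the slot counts, and the argument passed to is_gpu_busy() are assumptions made for illustration.

```python
from dataclasses import dataclass

# Threshold taken from the commit message (>= 95% VRAM raises the warning flag).
VRAM_WARNING_THRESHOLD = 95.0


@dataclass
class GpuState:
    # Hypothetical shape of a GPU record; the real router's fields are not shown.
    name: str
    vram_used_pct: float
    active_slots: int
    total_slots: int


def is_gpu_busy(gpu: GpuState) -> bool:
    """Assumed definition: a GPU is busy when it has no free slot."""
    return gpu.active_slots >= gpu.total_slots


def gpu_status(gpu: GpuState) -> dict:
    """Report saturation from slot availability only; VRAM is informational."""
    saturated = is_gpu_busy(gpu)
    return {
        "name": gpu.name,
        # Routing-relevant status depends on slots, never on VRAM percentage.
        "status": "saturated" if saturated else "healthy",
        # Monitoring-only flag; it does not feed back into the status above.
        "vram_warning": gpu.vram_used_pct >= VRAM_WARNING_THRESHOLD,
    }


if __name__ == "__main__":
    # Slot counts here are made up; only the 91% VRAM figure is from the commit.
    print(gpu_status(GpuState("27B Dense", 91.0, active_slots=1, total_slots=4)))
```

Under these assumptions, a GPU at 91% VRAM with free slots reports healthy with vram_warning unset, and a GPU at 97% VRAM with free slots still reports healthy but with vram_warning set.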
2026-05-23 06:00:37 +00:00