inference-harness

Files

Abiba ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)

Strix Halo running qwen3.6-35B-A3B was reaching 94°C with 2 concurrent
slots, causing 300 s request timeouts. Mumuni and Koby accumulated 15
timeouts in the last hour. MoE concurrency is reduced to 1 slot for
thermal headroom.

The Medium and Default tiers already route to VLM first and use MoE only
as a fallback, which minimizes overflow traffic to the hot GPU.
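
The routing behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the contents of router.py: the names BACKEND_SLOTS, TIER_ROUTES, SlotPool and pick_backend are hypothetical, and the VLM slot count of 2 is an assumption. Only the MoE limit of 1 and the VLM-before-MoE ordering come from the commit message.

```python
# Hypothetical sketch: none of these names are taken from router.py.

# Maximum concurrent requests per backend.
# "moe": 1 reflects the 2 -> 1 change; "vlm": 2 is an assumed value.
BACKEND_SLOTS = {"vlm": 2, "moe": 1}

# Preferred backend order per tier: VLM first, MoE only as fallback.
TIER_ROUTES = {
    "medium": ["vlm", "moe"],
    "default": ["vlm", "moe"],
}


class SlotPool:
    """Tracks how many request slots are still free on each backend."""

    def __init__(self, slots):
        self.free = dict(slots)

    def try_acquire(self, backend):
        # Take a slot if one is free; never block or queue.
        if self.free[backend] > 0:
            self.free[backend] -= 1
            return True
        return False

    def release(self, backend):
        # Return a slot once the request has finished.
        self.free[backend] += 1


def pick_backend(pool, tier):
    """Return the first backend in the tier's order with a free slot, else None."""
    for backend in TIER_ROUTES[tier]:
        if pool.try_acquire(backend):
            return backend
    return None
```

With these assumed limits, the first two requests land on VLM, the third overflows to the single MoE slot, and a fourth finds no free slot until one is released. Capping MoE at one slot means overflow traffic can put at most one request on the hot GPU at a time.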

2026-05-26 23:47:08 +00:00

Dockerfile

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

http_patch.py

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

requirements.txt

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

router.py

fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)

2026-05-26 23:47:08 +00:00

router.py.bak.20260518074236

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

ts_patch.py

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00