inference-harness

T

Abiba 060a47fce9 revert: MoE back to 2 slots (cross-agent spread now prevents hotspot)

Cross-agent GPU awareness ensures Tanko+Mumuni never
simultaneously hit MoE. Second agent always overflows
to Dense/VLM. MoE can safely use its extra VRAM with
2 slots since distinct agents never pile on.

2026-05-30 13:15:19 +00:00

dashboard

fix: default performance window to 24h so all models appear immediately

2026-05-26 12:37:52 +00:00

nginx

feat: per-request performance tracking + /metrics/performance endpoint

2026-05-25 16:50:45 +00:00

router

revert: MoE back to 2 slots (cross-agent spread now prevents hotspot)

2026-05-30 13:15:19 +00:00

.gitignore

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

docker-compose.yml

fix: non-blocking GPU health checks + 256K turboquant context upgrade

2026-05-23 05:57:13 +00:00

docker-compose.yml.bak

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

litellm_config.yaml

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

maintenance.sh

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00