inference-harness

Abiba 621a897bec tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM

More conversations now route to the VLM as primary. The 9B VLM has a
262K context window and averages 88 tok/s, which suits moderate-length
conversations. Dense absorbs overflow and heavy reasoning.

2026-05-27 00:29:25 +00:00
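
The threshold change in the commit above can be pictured with a small routing sketch. This is not the router's actual code: `Conversation`, `pick_backend`, the `needs_heavy_reasoning` flag, and the backend labels `"vlm"` and `"dense"` are hypothetical names, and the rule that both limits must hold (tokens and turns) is an assumption. Only the numbers (4K→10K tokens, 6→10 turns) come from the commit subject.

```python
from dataclasses import dataclass

# Limits from the commit subject; the names are illustrative only.
TIER2_MAX_TOKENS = 10_000  # previously 4_000
TIER2_MAX_TURNS = 10       # previously 6


@dataclass(frozen=True)
class Conversation:
    """Hypothetical summary of a request as a router might see it."""
    prompt_tokens: int
    turns: int
    needs_heavy_reasoning: bool = False  # assumed flag, not from the source


def pick_backend(
    conv: Conversation,
    max_tokens: int = TIER2_MAX_TOKENS,
    max_turns: int = TIER2_MAX_TURNS,
) -> str:
    """Return "vlm" for conversations within the Tier 2 limits, else "dense".

    Assumption: a conversation must satisfy both limits to stay on the VLM;
    anything larger, or flagged as heavy reasoning, goes to the dense model.
    """
    if conv.needs_heavy_reasoning:
        return "dense"
    if conv.prompt_tokens <= max_tokens and conv.turns <= max_turns:
        return "vlm"
    return "dense"


if __name__ == "__main__":
    conv = Conversation(prompt_tokens=6_000, turns=8)
    # Same conversation under the old and new limits.
    print(pick_backend(conv, max_tokens=4_000, max_turns=6))
    print(pick_backend(conv))
```

Under these assumptions, a 6,000-token, 8-turn conversation would have overflowed to the dense model under the old limits and stays on the VLM under the new ones, which matches the commit's statement that more conversations now route to the VLM as primary.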

dashboard

fix: default performance window to 24h so all models appear immediately

2026-05-26 12:37:52 +00:00

nginx

feat: per-request performance tracking + /metrics/performance endpoint

2026-05-25 16:50:45 +00:00

router

tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM

2026-05-27 00:29:25 +00:00

.gitignore

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

docker-compose.yml

fix: non-blocking GPU health checks + 256K turboquant context upgrade

2026-05-23 05:57:13 +00:00

docker-compose.yml.bak

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

litellm_config.yaml

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00

maintenance.sh

May 19, 2026: Full harness update

2026-05-19 15:03:47 +00:00