5116e4b1a7
router: heavy tier Dense→MoE→Light + X-Context-Warning headers (compact_soon/compact_recommended/compact_urgent)
main
root
2026-05-22 09:48:00 +00:00
e55bcef21a
router: 4 optimizations — saturated flag fix, heavy tier MoE-first, better token est, session tracking
Abiba
2026-05-21 20:47:48 +00:00
32bd817e97
fix: heavy tier back to Dense→MoE→VLM (Dense now 98K)
Abiba
2026-05-19 21:24:36 +00:00
79965450bb
fix: Dense context 65K→98K, parallel restored to 2
Abiba
2026-05-19 21:20:29 +00:00
6c829abef5
fix: variable collision (r = Redis vs Response) in stream handler
Abiba
2026-05-19 21:15:23 +00:00
6efd5ff51c
feat: context-aware routing + compaction signals
Abiba
2026-05-19 21:13:56 +00:00
350a90b524
fix: sync tier 4 default threshold to 50000 tokens (was stale at 4000)
Abiba
2026-05-19 21:11:34 +00:00
3156c093d5
fix: heavy threshold → 50000 tokens, 25 turns (agent contexts are huge)
Abiba
2026-05-19 21:08:18 +00:00
3cbf38e3e2
fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns
Abiba
2026-05-19 20:09:59 +00:00
b67021ac69
docs: complete design documentation — auth, routing tiers, queue, models, maintenance
Abiba
2026-05-19 19:17:52 +00:00
46dda918de
security: reject requests without valid API key (401 instead of defaulting to starter)
Abiba
2026-05-19 19:13:52 +00:00
7a78c0f98d
fix: heavy tier — Dense first (best for reasoning), then MoE, then VLM
Abiba
2026-05-19 18:20:20 +00:00
15c474aea0
fix: select_best_gpu respects candidate order — first non-busy wins
Abiba
2026-05-19 18:18:00 +00:00
bfc38f5436
fix: routing priority — MoE first, VLM second, Dense last (slow)
Abiba
2026-05-19 17:38:21 +00:00
f519a3fa60
fix: routing — system prompts no longer force heavy tier
Abiba
2026-05-19 17:19:29 +00:00
941e8db65e
feat: redesigned routing tiers — VLM handles more traffic
Abiba
2026-05-19 17:01:55 +00:00
241de4f38c
revert: remove Ollama endpoints (llama.cpp uses OpenAI format, not Ollama)
Abiba
2026-05-19 16:57:04 +00:00
beb2d1790a
fix: add /v1/props and /v1/models/<id> Ollama-compatible endpoints
Abiba
2026-05-19 16:08:24 +00:00
f2f8e8c921
feat: add request queuing to router (replaces hard 503 on saturation)
Abiba
2026-05-19 15:55:05 +00:00
76ade81fda
docs: add Koonimo to agent API keys table
Abiba
2026-05-19 15:48:39 +00:00
9c31b5d622
May 19, 2026: Full harness update
Abiba
2026-05-19 15:03:34 +00:00
4f032b035c
Mumuni review action items: health checks for all containers, version pinning, 503+Retry-After on all-GPU saturation
Abiba (pi)
2026-05-17 09:05:27 +00:00
b09a93f45c
feat: Smart Queue Consumer implementation draft + architecture review
master
SyslogBot
2026-05-17 03:55:20 +00:00
8f3b0c6647
Router: health check verifies actual llama.cpp endpoint, gpu_decr negative guard, AMD sidecar fixed (sysfs fallback)
Abiba (pi)
2026-05-17 01:52:28 +00:00
808c9d3d13
Router: 300s timeout, gpu_decr bugfix. Dashboard: Bootstrap 5 modern redesign with KPI stats, equal-height cards, queue ring. Nginx: 600s timeout.
Abiba (pi)
2026-05-16 22:12:21 +00:00
9817fe2ef2
Dashboard: clean rebuild with Queue Status ring chart, GPU slot indicators, organized layout (GPU/Queue+Model+Agent/Usage/Live)
Abiba (pi)
2026-05-16 21:05:19 +00:00
654cdff718
Dashboard: GPU slot indicators show active/max concurrent requests. Koonimo API key added. Real-time queuing visibility.
Abiba (pi)
2026-05-16 20:43:22 +00:00
bf90e57c5f
Load-aware routing: tracks active GPU requests in Redis, distributes overflow when MoE saturated. 6 concurrent requests now spread across all 3 GPUs instead of queuing on one.
Abiba (pi)
2026-05-16 20:23:32 +00:00
2db2796e53
Dashboard: rename to SyslogAI Harness, GPU bar now shows utilization instead of VRAM
Abiba (pi)
2026-05-16 19:26:46 +00:00
ec0f9fac63
Fix: clean_unicode now uses chr()-based replacements + ASCII strip to prevent bash heredoc corruption. Emoji and all non-ASCII now fully stripped.
Abiba (pi)
2026-05-16 19:12:58 +00:00
3d42ea4767
Merge: add Abiba harness code — nginx, LiteLLM, router, dashboard, Redis
Abiba (pi)
2026-05-16 18:53:31 +00:00
7b6c6aabe1
Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis
Abiba (pi)
2026-05-16 18:51:50 +00:00
e95475f431
Add GPU dashboard container + Nginx routing
SyslogBot
2026-05-15 22:25:56 +00:00
b65ea22765
Update Nginx Docker config
mumuni-bot
2026-05-15 21:35:13 +00:00
cf7f61650f
Add Dockerfile.dashboard
mumuni-bot
2026-05-15 21:34:52 +00:00
7d00bbec0e
Add Dockerfile.queue
mumuni-bot
2026-05-15 21:34:49 +00:00
37f7c95b05
Add env example
mumuni-bot
2026-05-15 21:07:34 +00:00
a28b3a557d
Add Nginx router config
mumuni-bot
2026-05-15 21:07:33 +00:00
c42f3a9979
Add migration plan
mumuni-bot
2026-05-15 21:07:32 +00:00
e1f12c3462
Add dashboard
mumuni-bot
2026-05-15 21:07:07 +00:00
b55b954967
Add queue service
mumuni-bot
2026-05-15 21:07:05 +00:00
c85aaa570b
Add docker-compose
mumuni-bot
2026-05-15 21:07:05 +00:00
43382dac5b
Initial commit: README
mumuni-bot
2026-05-15 21:07:03 +00:00
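
Note on 15c474aea0 ("select_best_gpu respects candidate order, first non-busy wins"): the behavior that message describes could be sketched roughly as below. This is a minimal illustration, not the router's actual code. Only the function name `select_best_gpu` comes from the log. The parameters `candidates`, `active`, and `max_slots`, the GPU labels, and the `None` return on saturation are all assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence


def select_best_gpu(
    candidates: Sequence[str],
    active: Mapping[str, int],
    max_slots: Mapping[str, int],
) -> Optional[str]:
    """Return the first candidate with a free slot, or None if all are busy.

    Hypothetical signature: `active` maps a GPU label to its in-flight
    request count, `max_slots` maps it to its concurrency limit.
    """
    for gpu in candidates:
        # Treat a negative counter as zero (assumed handling, in the spirit
        # of the "gpu_decr negative guard" mentioned in 8f3b0c6647).
        in_flight = max(active.get(gpu, 0), 0)
        # Candidate order is the priority order: stop at the first free one.
        if in_flight < max_slots.get(gpu, 1):
            return gpu
    # Every candidate is saturated; the caller would queue the request.
    return None
```

With the heavy-tier order from 7a78c0f98d (Dense, then MoE, then VLM), a call such as `select_best_gpu(["dense", "moe", "vlm"], {"dense": 2}, {"dense": 2, "moe": 2, "vlm": 2})` would skip the saturated Dense GPU and pick `"moe"`.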