Commit Graph

  • 5116e4b1a7 router: heavy tier Dense→MoE→Light + X-Context-Warning headers (compact_soon/compact_recommended/compact_urgent) main root 2026-05-22 09:48:00 +00:00
  • e55bcef21a router: 4 optimizations — saturated flag fix, heavy tier MoE-first, better token est, session tracking Abiba 2026-05-21 20:47:48 +00:00
  • 32bd817e97 fix: heavy tier back to Dense→MoE→VLM (Dense now 98K) Abiba 2026-05-19 21:24:36 +00:00
  • 79965450bb fix: Dense context 65K→98K, parallel restored to 2 Abiba 2026-05-19 21:20:29 +00:00
  • 6c829abef5 fix: variable collision (r = Redis vs Response) in stream handler Abiba 2026-05-19 21:15:23 +00:00
  • 6efd5ff51c feat: context-aware routing + compaction signals Abiba 2026-05-19 21:13:56 +00:00
  • 350a90b524 fix: sync tier 4 default threshold to 50000 tokens (was stale at 4000) Abiba 2026-05-19 21:11:34 +00:00
  • 3156c093d5 fix: heavy threshold → 50000 tokens, 25 turns (agent contexts are huge) Abiba 2026-05-19 21:08:18 +00:00
  • 3cbf38e3e2 fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns Abiba 2026-05-19 20:09:59 +00:00
  • b67021ac69 docs: complete design documentation — auth, routing tiers, queue, models, maintenance Abiba 2026-05-19 19:17:52 +00:00
  • 46dda918de security: reject requests without valid API key (401 instead of defaulting to starter) Abiba 2026-05-19 19:13:52 +00:00
  • 7a78c0f98d fix: heavy tier — Dense first (best for reasoning), then MoE, then VLM Abiba 2026-05-19 18:20:20 +00:00
  • 15c474aea0 fix: select_best_gpu respects candidate order — first non-busy wins Abiba 2026-05-19 18:18:00 +00:00
  • bfc38f5436 fix: routing priority — MoE first, VLM second, Dense last (slow) Abiba 2026-05-19 17:38:21 +00:00
  • f519a3fa60 fix: routing — system prompts no longer force heavy tier Abiba 2026-05-19 17:19:29 +00:00
  • 941e8db65e feat: redesigned routing tiers — VLM handles more traffic Abiba 2026-05-19 17:01:55 +00:00
  • 241de4f38c revert: remove Ollama endpoints (llama.cpp uses OpenAI format, not Ollama) Abiba 2026-05-19 16:57:04 +00:00
  • beb2d1790a fix: add /v1/props and /v1/models/<id> Ollama-compatible endpoints Abiba 2026-05-19 16:08:24 +00:00
  • f2f8e8c921 feat: add request queuing to router (replaces hard 503 on saturation) Abiba 2026-05-19 15:55:05 +00:00
  • 76ade81fda docs: add Koonimo to agent API keys table Abiba 2026-05-19 15:48:39 +00:00
  • 9c31b5d622 May 19, 2026: Full harness update Abiba 2026-05-19 15:03:34 +00:00
  • 4f032b035c Mumuni review action items: health checks for all containers, version pinning, 503+Retry-After on all-GPU saturation Abiba (pi) 2026-05-17 09:05:27 +00:00
  • b09a93f45c feat: Smart Queue Consumer implementation draft + architecture review master SyslogBot 2026-05-17 03:55:20 +00:00
  • 8f3b0c6647 Router: health check verifies actual llama.cpp endpoint, gpu_decr negative guard, AMD sidecar fixed (sysfs fallback) Abiba (pi) 2026-05-17 01:52:28 +00:00
  • 808c9d3d13 Router: 300s timeout, gpu_decr bugfix. Dashboard: Bootstrap 5 modern redesign with KPI stats, equal-height cards, queue ring. Nginx: 600s timeout. Abiba (pi) 2026-05-16 22:12:21 +00:00
  • 9817fe2ef2 Dashboard: clean rebuild with Queue Status ring chart, GPU slot indicators, organized layout (GPU/Queue+Model+Agent/Usage/Live) Abiba (pi) 2026-05-16 21:05:19 +00:00
  • 654cdff718 Dashboard: GPU slot indicators show active/max concurrent requests. Koonimo API key added. Real-time queuing visibility. Abiba (pi) 2026-05-16 20:43:22 +00:00
  • bf90e57c5f Load-aware routing: tracks active GPU requests in Redis, distributes overflow when MoE saturated. 6 concurrent requests now spread across all 3 GPUs instead of queuing on one. Abiba (pi) 2026-05-16 20:23:32 +00:00
  • 2db2796e53 Dashboard: rename to SyslogAI Harness, GPU bar now shows utilization instead of VRAM Abiba (pi) 2026-05-16 19:26:46 +00:00
  • ec0f9fac63 Fix: clean_unicode now uses chr()-based replacements + ASCII strip to prevent bash heredoc corruption. Emoji and all non-ASCII now fully stripped. Abiba (pi) 2026-05-16 19:12:58 +00:00
  • 3d42ea4767 Merge: add Abiba harness code — nginx, LiteLLM, router, dashboard, Redis Abiba (pi) 2026-05-16 18:53:31 +00:00
  • 7b6c6aabe1 Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis Abiba (pi) 2026-05-16 18:51:50 +00:00
  • e95475f431 Add GPU dashboard container + Nginx routing SyslogBot 2026-05-15 22:25:56 +00:00
  • b65ea22765 Update Nginx Docker config mumuni-bot 2026-05-15 21:35:13 +00:00
  • cf7f61650f Add Dockerfile.dashboard mumuni-bot 2026-05-15 21:34:52 +00:00
  • 7d00bbec0e Add Dockerfile.queue mumuni-bot 2026-05-15 21:34:49 +00:00
  • 37f7c95b05 Add env example mumuni-bot 2026-05-15 21:07:34 +00:00
  • a28b3a557d Add Nginx router config mumuni-bot 2026-05-15 21:07:33 +00:00
  • c42f3a9979 Add migration plan mumuni-bot 2026-05-15 21:07:32 +00:00
  • e1f12c3462 Add dashboard mumuni-bot 2026-05-15 21:07:07 +00:00
  • b55b954967 Add queue service mumuni-bot 2026-05-15 21:07:05 +00:00
  • c85aaa570b Add docker-compose mumuni-bot 2026-05-15 21:07:05 +00:00
  • 43382dac5b Initial commit: README mumuni-bot 2026-05-15 21:07:03 +00:00