abiba-bot/inference-harness

ac13ecaaf7 auto-fix: harness-dashboard restarted — container was down, now healthy main Abiba 2026-06-28 01:36:00 +00:00
0ca3b65ad4 chore: add .gitignore for .bak and .backup files Abiba 2026-06-25 20:33:50 +00:00
13eb8cb75b Merge branch 'main' of http://192.168.68.17:3000/SyslogSolution/syslog-harness Abiba 2026-06-25 20:33:46 +00:00
08680b0f9e fix: LiteLLM OIDC + Admin UI fixes - Authentik integration restored Abiba 2026-06-25 20:33:22 +00:00
621fb3540a fix: triple litellm_settings merged into one + add /ui/, /sso/, /litellm-asset-prefix/ to port 80 nginx kagentz-bot 2026-06-24 13:50:09 -04:00
3d6b8173b0 fix: add dedicated location /openapi.json block to return spec JSON instead of SPA HTML kagentz-bot 2026-06-24 12:37:11 -04:00
aa5ac4a280 fix: restore root / location to proxy to litellm_backend after nginx.conf recovery from sabotage kagentz-bot 2026-06-24 12:05:55 -04:00
ce703b8328 fix(nginx): Docker DNS resolver + variable proxy_pass for reliable container DNS kagentz-bot 2026-06-24 11:41:52 -04:00
fd3c2a575a feat: 2-layer architecture foundation + agent migration Abiba 2026-06-17 22:40:33 +00:00
776343f2ab feat(plan): add fallback chains and resolve model identity gap Abiba 2026-06-14 22:48:53 +00:00
492a4fe68b feat(plan): resolve all migration gaps and update router logic for LiteLLM integration jerome 2026-06-14 17:04:40 -04:00
84e0d163ee feat(plan): update LiteLLM migration plan for CT 116 deployment with Authentik OIDC + zero-downtime strategy kagentz-bot and Abiba 2026-06-14 08:22:32 -04:00
d901235c03 docs: LiteLLM migration plan — two-layer architecture with model identity gap analysis Abiba 2026-06-14 00:39:59 +00:00
4c7ac3350d fix(dashboard): latest visual fixes (navbar, layout, status labels) jerome 2026-06-12 22:13:30 -04:00
316f2f5f45 fix(router): handle None temp_c/vram_pct in gpu_health_score Abiba 2026-06-12 18:11:23 +00:00
574076119c merge: accept our deployed Phase 1-3 + dashboard as authoritative Abiba 2026-06-12 17:58:22 +00:00
ad9881f141 feat(dashboard): live GPU health scoring + real KPIs Abiba 2026-06-12 17:57:46 +00:00
3625fdc860 feat(harness): sync production-ready fixes (context, circuit-breaker, ports) jerome 2026-06-12 13:10:38 -04:00
a860a8fd0f feat(router): Phase 3 - Dynamic GPU Weighting via Health Scoring Abiba 2026-06-11 01:06:13 +00:00
fabbe340d6 feat(router): Phase 2 - Atomic session token tracking via Redis Lua script Abiba 2026-06-11 00:57:29 +00:00
2e24ee5598 feat(router): Phase 1 - Circuit Breaker + /metrics/circuit-breaker endpoint Abiba 2026-06-11 00:47:15 +00:00
19f7d90cc1 feat(nginx): Phase 1 - add /metrics/circuit-breaker proxy route Abiba 2026-06-11 00:39:21 +00:00
b79af634d7 feat(router): Phase 1 - Actual Circuit Breaker for GPU hosts Abiba 2026-06-11 00:29:53 +00:00
a992d4b88f feat(nginx): Phase 1 - Add /metrics/circuit-breaker proxy route jerome 2026-06-10 19:27:50 -04:00
c3dfe62cec feat(router): Phase 1 - Circuit Breaker for GPU hosts jerome 2026-06-10 19:16:28 -04:00
f1d095e411 Phase 0.5 deployed: admin endpoints + nginx /admin/ routing + ADMIN_KEY Abiba 2026-06-08 11:31:07 +00:00
ae3f581e3e Phase 0.5: Re-add ADMIN_KEY, /admin/keys endpoints, dual-key logging to current HEAD jerome 2026-06-08 05:18:20 -04:00
c7b22f4d09 Fix counter_audit_loop: use get_redis() instead of stale r/rd refs Abiba 2026-06-07 23:33:43 +00:00
815ed7991f Merge SyslogSolution/syslog-harness: accept current state (Phase 0 + Redis lazy reconnect + dashboard fix) Abiba 2026-06-07 23:14:28 +00:00
633afc5e29 Router: lazy Redis reconnect (survives Redis restarts/reboots) Abiba 2026-06-07 23:01:20 +00:00
85608d7c60 Dashboard + LiteLLM config updates from maintenance Abiba 2026-06-07 22:49:50 +00:00
24f0928ea1 Phase 0: model migration (qwen3.5-9b-vlm → gemma-4-12b), context alignment (all 262K), routing tiers + MoE spillover, fix dashboard window=1h parsing Abiba 2026-06-07 22:49:46 +00:00
867a6189df Dashboard: fix undefined labels for retired models + safe fallback helpers Abiba Bot 2026-06-06 01:46:53 +00:00
39cf297a8f Phase 0: Dual-key router — no hardcoded keys, gemma sync, 262K context fix Abiba Bot 2026-06-06 00:51:29 +00:00
1e24bc3b9b docs: update model table with context windows, capabilities, GPU labels Abiba Bot 2026-06-05 23:49:03 +00:00
e6a6c30211 router: v3 — 5-tier routing with vision guard, MoE spillover, code hint Abiba Bot 2026-06-05 23:48:45 +00:00
c0be4c1699 router: reference - new 5-tier routing with vision guard (deployed CT116) Abiba Bot 2026-06-05 23:32:27 +00:00
424d943b12 router: fix Dense context window 98K→262K (matches actual n_ctx) Abiba Bot 2026-06-05 23:15:18 +00:00
b6abd84062 dashboard: rename VLM GPU label 'New Backend' → 'RTX 5070' for consistency Abiba Bot 2026-06-05 21:53:07 +00:00
dace488f93 VLM migration: qwen3.5-9b-vlm → gemma-4-12b across entire harness Abiba Bot 2026-06-05 21:37:18 +00:00
370f5546dd dashboard: migrate VLM model qwen3.5-9b-vlm → gemma-4-12b (new backend 192.168.68.110:8080/v1) Abiba Bot 2026-06-05 21:26:51 +00:00
0cb4597b0e security: move API keys to env var, strip from source code fallback Abiba and Abiba via Kwame 2026-06-03 12:40:04 +00:00
9a633583ab fix: Security hardening from CT116 deep-dive review Abiba 2026-06-02 10:37:10 +00:00
060a47fce9 revert: MoE back to 2 slots (cross-agent spread now prevents hotspot) Abiba 2026-05-30 13:15:19 +00:00
34fb7516e1 fix: cross-agent GPU spreading prevents hotspot hammering Abiba 2026-05-30 12:55:29 +00:00
acbcb20837 fix: MoE concurrency 2→1 (95C thermal emergency) Abiba 2026-05-30 12:52:23 +00:00
a3bca93d9b fix: buffer SSE chunks for large streaming responses Abiba 2026-05-29 09:45:41 +00:00
d53685d874 feat: agent-aware GPU load balancing Abiba 2026-05-28 21:45:23 +00:00
54a4f26db7 fix: Default tier back to Dense-first (MoE overheating at 91°C) Abiba 2026-05-28 21:40:18 +00:00
fb1d51b93b restructure: routing prioritized by reasoning requirements Abiba 2026-05-27 07:22:30 +00:00
9a0d69ce8d feat: Dense 128K context + 2 slots, VLM second in Heavy tier Abiba 2026-05-27 07:15:58 +00:00
621a897bec tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM Abiba 2026-05-27 00:29:25 +00:00
93d0d3cc4b revert: MoE concurrency back to 2 (Dense-first routing handles thermal) Abiba 2026-05-27 00:04:42 +00:00
c4ea5e3a98 fix: flip Tier 4 (Heavy) to Dense-first for thermal safety Abiba 2026-05-27 00:01:33 +00:00
ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C) Abiba 2026-05-26 23:47:08 +00:00
b3db0841ef feat: redesigned routing tiers for even GPU distribution + speed priority Abiba 2026-05-26 22:00:20 +00:00
80362fa528 fix: default performance window to 24h so all models appear immediately Abiba 2026-05-26 12:37:52 +00:00
7ef9e58f61 fix: restore /api/performance route in dashboard (was overwritten to /api/timeseries) Abiba 2026-05-26 12:31:53 +00:00
f47c3f3304 feat: latency vs prompt size scatter plot on dashboard Abiba 2026-05-26 12:18:31 +00:00
cfb05fa501 feat: capture streaming token counts from SSE final chunk Abiba 2026-05-25 19:58:51 +00:00
b2ec4b0572 fix: throughput panel handles streaming-only models gracefully Abiba 2026-05-25 19:45:21 +00:00
8c5c922a4e fix: handle single data point in performance percentiles Abiba 2026-05-25 17:00:40 +00:00
f42747d721 feat: performance analytics panel on dashboard Abiba 2026-05-25 16:58:15 +00:00
b849cd3395 feat: per-request performance tracking + /metrics/performance endpoint Abiba 2026-05-25 16:50:45 +00:00
b7882b2434 fix: reduce 27B Dense context to 192K to free VRAM Abiba 2026-05-25 00:31:40 +00:00
ddde6646de fix: decouple VRAM usage from saturation status Abiba 2026-05-23 06:00:37 +00:00
41939104c7 fix: non-blocking GPU health checks + 256K turboquant context upgrade Abiba 2026-05-23 05:57:13 +00:00
5116e4b1a7 router: heavy tier Dense→MoE→Light + X-Context-Warning headers (compact_soon/compact_recommended/compact_urgent) root 2026-05-22 09:48:00 +00:00
e55bcef21a router: 4 optimizations — saturated flag fix, heavy tier MoE-first, better token est, session tracking Abiba 2026-05-21 20:47:48 +00:00
0983337fdb fix: heavy tier Dense→MoE→VLM Abiba 2026-05-19 21:24:36 +00:00
32bd817e97 fix: heavy tier back to Dense→MoE→VLM (Dense now 98K) Abiba 2026-05-19 21:24:36 +00:00
79965450bb fix: Dense context 65K→98K, parallel restored to 2 Abiba 2026-05-19 21:20:29 +00:00
6c829abef5 fix: variable collision (r = Redis vs Response) in stream handler Abiba 2026-05-19 21:15:23 +00:00
28d62e27ba feat: context-aware routing + compaction signals Abiba 2026-05-19 21:13:57 +00:00
6efd5ff51c feat: context-aware routing + compaction signals Abiba 2026-05-19 21:13:56 +00:00
350a90b524 fix: sync tier 4 default threshold to 50000 tokens (was stale at 4000) Abiba 2026-05-19 21:11:34 +00:00
714ebb003e fix: heavy threshold → 50000 tokens, 25 turns Abiba 2026-05-19 21:08:18 +00:00
3156c093d5 fix: heavy threshold → 50000 tokens, 25 turns (agent contexts are huge) Abiba 2026-05-19 21:08:18 +00:00
e90bf0216d fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns Abiba 2026-05-19 20:10:07 +00:00
3cbf38e3e2 fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns Abiba 2026-05-19 20:09:59 +00:00
b67021ac69 docs: complete design documentation — auth, routing tiers, queue, models, maintenance Abiba 2026-05-19 19:17:52 +00:00
5971ceee4e security: reject requests without valid API key (401) Abiba 2026-05-19 19:15:13 +00:00
46dda918de security: reject requests without valid API key (401 instead of defaulting to starter) Abiba 2026-05-19 19:13:52 +00:00
5f05f46c7c fix: heavy tier — Dense first for reasoning, MoE workhorse, VLM overflow Abiba 2026-05-19 18:27:24 +00:00
7a78c0f98d fix: heavy tier — Dense first (best for reasoning), then MoE, then VLM Abiba 2026-05-19 18:20:20 +00:00
15c474aea0 fix: select_best_gpu respects candidate order — first non-busy wins Abiba 2026-05-19 18:18:00 +00:00
911fdc9f3f fix: routing priority — MoE first, VLM second, Dense last Abiba 2026-05-19 17:38:29 +00:00
bfc38f5436 fix: routing priority — MoE first, VLM second, Dense last (slow) Abiba 2026-05-19 17:38:21 +00:00
d9d2c213f6 fix: routing — remove turn limit from default tier, no gaps Abiba 2026-05-19 17:24:41 +00:00
f519a3fa60 fix: routing — system prompts no longer force heavy tier Abiba 2026-05-19 17:19:29 +00:00
6625892908 feat: redesigned routing tiers — VLM handles more traffic Abiba 2026-05-19 17:01:58 +00:00
941e8db65e feat: redesigned routing tiers — VLM handles more traffic Abiba 2026-05-19 17:01:55 +00:00
fcb99a26c8 revert: remove Ollama endpoints Abiba 2026-05-19 16:57:05 +00:00
241de4f38c revert: remove Ollama endpoints (llama.cpp uses OpenAI format, not Ollama) Abiba 2026-05-19 16:57:04 +00:00
2234d03079 fix: add /v1/props and /v1/models/<id> endpoints Abiba 2026-05-19 16:08:58 +00:00
beb2d1790a fix: add /v1/props and /v1/models/<id> Ollama-compatible endpoints Abiba 2026-05-19 16:08:24 +00:00
5b99b16712 feat: add request queuing to router (replaces hard 503) Abiba 2026-05-19 15:55:13 +00:00
f2f8e8c921 feat: add request queuing to router (replaces hard 503 on saturation) Abiba 2026-05-19 15:55:05 +00:00
76ade81fda docs: add Koonimo to agent API keys table Abiba 2026-05-19 15:48:39 +00:00
28fc57c5c7 May 19, 2026: Full harness update Abiba 2026-05-19 15:03:47 +00:00
