Commit Graph

  • 060a47fce9 revert: MoE back to 2 slots (cross-agent spread now prevents hotspot) [main] Abiba 2026-05-30 13:15:19 +00:00
  • 34fb7516e1 fix: cross-agent GPU spreading prevents hotspot hammering Abiba 2026-05-30 12:55:29 +00:00
  • acbcb20837 fix: MoE concurrency 2→1 (95C thermal emergency) Abiba 2026-05-30 12:52:23 +00:00
  • a3bca93d9b fix: buffer SSE chunks for large streaming responses Abiba 2026-05-29 09:45:41 +00:00
  • d53685d874 feat: agent-aware GPU load balancing Abiba 2026-05-28 21:45:23 +00:00
  • 54a4f26db7 fix: Default tier back to Dense-first (MoE overheating at 91°C) Abiba 2026-05-28 21:40:18 +00:00
  • fb1d51b93b restructure: routing prioritized by reasoning requirements Abiba 2026-05-27 07:22:30 +00:00
  • 9a0d69ce8d feat: Dense 128K context + 2 slots, VLM second in Heavy tier Abiba 2026-05-27 07:15:58 +00:00
  • 621a897bec tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM Abiba 2026-05-27 00:29:25 +00:00
  • 93d0d3cc4b revert: MoE concurrency back to 2 (Dense-first routing handles thermal) Abiba 2026-05-27 00:04:42 +00:00
  • c4ea5e3a98 fix: flip Tier 4 (Heavy) to Dense-first for thermal safety Abiba 2026-05-27 00:01:33 +00:00
  • ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C) Abiba 2026-05-26 23:47:08 +00:00
  • b3db0841ef feat: redesigned routing tiers for even GPU distribution + speed priority Abiba 2026-05-26 22:00:20 +00:00
  • 80362fa528 fix: default performance window to 24h so all models appear immediately Abiba 2026-05-26 12:37:52 +00:00
  • 7ef9e58f61 fix: restore /api/performance route in dashboard (was overwritten to /api/timeseries) Abiba 2026-05-26 12:31:53 +00:00
  • f47c3f3304 feat: latency vs prompt size scatter plot on dashboard Abiba 2026-05-26 12:18:31 +00:00
  • cfb05fa501 feat: capture streaming token counts from SSE final chunk Abiba 2026-05-25 19:58:51 +00:00
  • b2ec4b0572 fix: throughput panel handles streaming-only models gracefully Abiba 2026-05-25 19:45:21 +00:00
  • 8c5c922a4e fix: handle single data point in performance percentiles Abiba 2026-05-25 17:00:40 +00:00
  • f42747d721 feat: performance analytics panel on dashboard Abiba 2026-05-25 16:58:15 +00:00
  • b849cd3395 feat: per-request performance tracking + /metrics/performance endpoint Abiba 2026-05-25 16:50:45 +00:00
  • b7882b2434 fix: reduce 27B Dense context to 192K to free VRAM Abiba 2026-05-25 00:31:40 +00:00
  • ddde6646de fix: decouple VRAM usage from saturation status Abiba 2026-05-23 06:00:37 +00:00
  • 41939104c7 fix: non-blocking GPU health checks + 256K turboquant context upgrade Abiba 2026-05-23 05:57:13 +00:00
  • 0983337fdb fix: heavy tier Dense→MoE→VLM Abiba 2026-05-19 21:24:36 +00:00
  • 28d62e27ba feat: context-aware routing + compaction signals Abiba 2026-05-19 21:13:57 +00:00
  • 714ebb003e fix: heavy threshold → 50000 tokens, 25 turns Abiba 2026-05-19 21:08:18 +00:00
  • e90bf0216d fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns Abiba 2026-05-19 20:10:07 +00:00
  • 5971ceee4e security: reject requests without valid API key (401) Abiba 2026-05-19 19:15:13 +00:00
  • 5f05f46c7c fix: heavy tier — Dense first for reasoning, MoE workhorse, VLM overflow Abiba 2026-05-19 18:27:24 +00:00
  • 911fdc9f3f fix: routing priority — MoE first, VLM second, Dense last Abiba 2026-05-19 17:38:29 +00:00
  • d9d2c213f6 fix: routing — remove turn limit from default tier, no gaps Abiba 2026-05-19 17:24:41 +00:00
  • 6625892908 feat: redesigned routing tiers — VLM handles more traffic Abiba 2026-05-19 17:01:58 +00:00
  • fcb99a26c8 revert: remove Ollama endpoints Abiba 2026-05-19 16:57:05 +00:00
  • 2234d03079 fix: add /v1/props and /v1/models/<id> endpoints Abiba 2026-05-19 16:08:58 +00:00
  • 5b99b16712 feat: add request queuing to router (replaces hard 503) Abiba 2026-05-19 15:55:13 +00:00
  • 28fc57c5c7 May 19, 2026: Full harness update Abiba 2026-05-19 15:03:47 +00:00