-
060a47fce9
revert: MoE back to 2 slots (cross-agent spread now prevents hotspot)
main
Abiba
2026-05-30 13:15:19 +00:00
-
34fb7516e1
fix: cross-agent GPU spreading prevents hotspot hammering
Abiba
2026-05-30 12:55:29 +00:00
-
acbcb20837
fix: MoE concurrency 2→1 (95°C thermal emergency)
Abiba
2026-05-30 12:52:23 +00:00
-
a3bca93d9b
fix: buffer SSE chunks for large streaming responses
Abiba
2026-05-29 09:45:41 +00:00
-
d53685d874
feat: agent-aware GPU load balancing
Abiba
2026-05-28 21:45:23 +00:00
-
54a4f26db7
fix: Default tier back to Dense-first (MoE overheating at 91°C)
Abiba
2026-05-28 21:40:18 +00:00
-
fb1d51b93b
restructure: routing prioritized by reasoning requirements
Abiba
2026-05-27 07:22:30 +00:00
-
9a0d69ce8d
feat: Dense 128K context + 2 slots, VLM second in Heavy tier
Abiba
2026-05-27 07:15:58 +00:00
-
621a897bec
tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM
Abiba
2026-05-27 00:29:25 +00:00
-
93d0d3cc4b
revert: MoE concurrency back to 2 (Dense-first routing handles thermal)
Abiba
2026-05-27 00:04:42 +00:00
-
c4ea5e3a98
fix: flip Tier 4 (Heavy) to Dense-first for thermal safety
Abiba
2026-05-27 00:01:33 +00:00
-
ebe8f9ced4
fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)
Abiba
2026-05-26 23:47:08 +00:00
-
b3db0841ef
feat: redesigned routing tiers for even GPU distribution + speed priority
Abiba
2026-05-26 22:00:20 +00:00
-
80362fa528
fix: default performance window to 24h so all models appear immediately
Abiba
2026-05-26 12:37:52 +00:00
-
7ef9e58f61
fix: restore /api/performance route in dashboard (was overwritten to /api/timeseries)
Abiba
2026-05-26 12:31:53 +00:00
-
f47c3f3304
feat: latency vs prompt size scatter plot on dashboard
Abiba
2026-05-26 12:18:31 +00:00
-
cfb05fa501
feat: capture streaming token counts from SSE final chunk
Abiba
2026-05-25 19:58:51 +00:00
-
b2ec4b0572
fix: throughput panel handles streaming-only models gracefully
Abiba
2026-05-25 19:45:21 +00:00
-
8c5c922a4e
fix: handle single data point in performance percentiles
Abiba
2026-05-25 17:00:40 +00:00
-
f42747d721
feat: performance analytics panel on dashboard
Abiba
2026-05-25 16:58:15 +00:00
-
b849cd3395
feat: per-request performance tracking + /metrics/performance endpoint
Abiba
2026-05-25 16:50:45 +00:00
-
b7882b2434
fix: reduce 27B Dense context to 192K to free VRAM
Abiba
2026-05-25 00:31:40 +00:00
-
ddde6646de
fix: decouple VRAM usage from saturation status
Abiba
2026-05-23 06:00:37 +00:00
-
41939104c7
fix: non-blocking GPU health checks + 256K turboquant context upgrade
Abiba
2026-05-23 05:57:13 +00:00
-
0983337fdb
fix: heavy tier Dense→MoE→VLM
Abiba
2026-05-19 21:24:36 +00:00
-
28d62e27ba
feat: context-aware routing + compaction signals
Abiba
2026-05-19 21:13:57 +00:00
-
714ebb003e
fix: heavy threshold → 50000 tokens, 25 turns
Abiba
2026-05-19 21:08:18 +00:00
-
e90bf0216d
fix: raise heavy threshold — 4000→12000 tokens, 8→15 turns
Abiba
2026-05-19 20:10:07 +00:00
-
5971ceee4e
security: reject requests without valid API key (401)
Abiba
2026-05-19 19:15:13 +00:00
-
5f05f46c7c
fix: heavy tier — Dense first for reasoning, MoE workhorse, VLM overflow
Abiba
2026-05-19 18:27:24 +00:00
-
911fdc9f3f
fix: routing priority — MoE first, VLM second, Dense last
Abiba
2026-05-19 17:38:29 +00:00
-
d9d2c213f6
fix: routing — remove turn limit from default tier, no gaps
Abiba
2026-05-19 17:24:41 +00:00
-
6625892908
feat: redesigned routing tiers — VLM handles more traffic
Abiba
2026-05-19 17:01:58 +00:00
-
fcb99a26c8
revert: remove Ollama endpoints
Abiba
2026-05-19 16:57:05 +00:00
-
2234d03079
fix: add /v1/props and /v1/models/<id> endpoints
Abiba
2026-05-19 16:08:58 +00:00
-
5b99b16712
feat: add request queuing to router (replaces hard 503)
Abiba
2026-05-19 15:55:13 +00:00
-
28fc57c5c7
May 19, 2026: Full harness update
Abiba
2026-05-19 15:03:47 +00:00