abiba-bot
  • Joined on 2026-05-16
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-30 13:15:20 +00:00
060a47fce9 revert: MoE back to 2 slots (cross-agent spread now prevents hotspot)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-30 12:55:31 +00:00
34fb7516e1 fix: cross-agent GPU spreading prevents hotspot hammering
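The commit messages name the idea (spread each agent's requests across GPUs so one card is not hammered) but not the implementation. Below is a minimal, hypothetical sketch of agent-aware GPU selection. The names `Gpu`, `pick_gpu`, `acquire` and `release`, the per-GPU `slots` cap, and the choice to spread by the agent's own in-flight count before total load are all assumptions for illustration. None of them come from the repository.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gpu:
    """Hypothetical GPU backend with a concurrency cap (e.g. 2 slots for MoE)."""
    name: str
    slots: int
    # In-flight request count per agent on this GPU.
    in_flight: dict = field(default_factory=lambda: defaultdict(int))

    def total(self) -> int:
        return sum(self.in_flight.values())


def pick_gpu(gpus: list, agent: str) -> Optional[Gpu]:
    """Choose a GPU with a free slot, preferring the one holding the fewest
    requests from this agent, then the one with the lowest total load."""
    free = [g for g in gpus if g.total() < g.slots]
    if not free:
        return None  # caller is assumed to queue the request
    return min(free, key=lambda g: (g.in_flight[agent], g.total()))


def acquire(gpus: list, agent: str) -> Optional[Gpu]:
    gpu = pick_gpu(gpus, agent)
    if gpu is not None:
        gpu.in_flight[agent] += 1
    return gpu


def release(gpu: Gpu, agent: str) -> None:
    gpu.in_flight[agent] -= 1
```

Ordering the key by the agent's own count first means two back-to-back requests from one agent land on different GPUs even if total load is equal. That is one plausible reading of "cross-agent spread", not necessarily the harness's exact rule.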
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-30 12:52:25 +00:00
acbcb20837 fix: MoE concurrency 2→1 (95C thermal emergency)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-29 09:45:45 +00:00
a3bca93d9b fix: buffer SSE chunks for large streaming responses
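Large streaming responses can arrive with a single Server-Sent Event split across several network reads, so a proxy has to buffer until the event's terminating blank line arrives. The following is a minimal sketch of that buffering, assuming decoded text chunks, `\n` line endings, and payloads carried on `data:` lines as in the SSE format. The class name `SSEBuffer` is hypothetical and is not taken from the commit.

```python
class SSEBuffer:
    """Hypothetical helper: reassembles SSE events split across chunks."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list:
        """Add a chunk and return the data payloads of all completed events."""
        self._pending += chunk
        events = []
        # An event is complete only once its blank-line terminator has arrived.
        while "\n\n" in self._pending:
            raw, self._pending = self._pending.split("\n\n", 1)
            data_lines = []
            for line in raw.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # The SSE format strips exactly one leading space.
                    if value.startswith(" "):
                        value = value[1:]
                    data_lines.append(value)
            if data_lines:
                # Multiple data: lines in one event are joined with newlines.
                events.append("\n".join(data_lines))
        return events
```

If the proxy reads raw bytes instead, decoding should also be incremental (for example with `codecs.getincrementaldecoder("utf-8")`), since a multi-byte UTF-8 character can also be split across reads.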
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-28 21:45:24 +00:00
d53685d874 feat: agent-aware GPU load balancing
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-28 21:40:20 +00:00
54a4f26db7 fix: Default tier back to Dense-first (MoE overheating at 91°C)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-27 07:22:32 +00:00
fb1d51b93b restructure: routing prioritized by reasoning requirements
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-27 07:16:01 +00:00
9a0d69ce8d feat: Dense 128K context + 2 slots, VLM second in Heavy tier
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-27 00:29:27 +00:00
621a897bec tune: raise Tier 2 threshold 4K→10K tok, 6→10 turns for VLM
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-27 00:04:43 +00:00
93d0d3cc4b revert: MoE concurrency back to 2 (Dense-first routing handles thermal)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-27 00:01:34 +00:00
c4ea5e3a98 fix: flip Tier 4 (Heavy) to Dense-first for thermal safety
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-26 23:47:09 +00:00
ebe8f9ced4 fix: reduce MoE concurrency 2→1 to prevent thermal timeout (94°C)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-26 22:00:22 +00:00
b3db0841ef feat: redesigned routing tiers for even GPU distribution + speed priority
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-26 12:37:54 +00:00
80362fa528 fix: default performance window to 24h so all models appear immediately
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-26 12:31:55 +00:00
7ef9e58f61 fix: restore /api/performance route in dashboard (was overwritten to /api/timeseries)
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-26 12:18:33 +00:00
f47c3f3304 feat: latency vs prompt size scatter plot on dashboard
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-25 19:58:54 +00:00
cfb05fa501 feat: capture streaming token counts from SSE final chunk
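Streaming responses report token counts only once, near the end of the stream, so the counts have to be picked out of the final chunk(s) rather than read from a single response body. Here is a minimal sketch, assuming an OpenAI-compatible chat stream in which intermediate chunks carry `"usage": null` (or omit it), one final chunk carries a `usage` object, and the stream ends with `[DONE]`. The function name is hypothetical.

```python
import json
from typing import Optional


def usage_from_sse_events(events: list) -> Optional[dict]:
    """Return the last non-null "usage" object seen before [DONE], if any."""
    usage = None
    for payload in events:
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Content chunks usually have no usage; keep the last reported value.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Returning `None` when no usage chunk appears lets the dashboard treat such models as having unknown counts rather than zero.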
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-25 19:45:24 +00:00
b2ec4b0572 fix: throughput panel handles streaming-only models gracefully
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-25 17:00:42 +00:00
8c5c922a4e fix: handle single data point in performance percentiles
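A percentile panel for a model that has served only one request must still produce a value. The sketch below shows a linear-interpolation percentile that handles the empty and single-point cases explicitly instead of relying on the interpolation arithmetic. The function name and the interpolation method are assumptions, since the commit does not say which percentile definition the dashboard uses.

```python
import math


def percentile(values: list, p: float) -> float:
    """Hypothetical linear-interpolation percentile for 0 <= p <= 100."""
    if not values:
        raise ValueError("percentile() of empty data")
    ordered = sorted(values)
    if len(ordered) == 1:
        # One sample: it is the p50, p95 and p99 alike.
        return ordered[0]
    # Fractional rank on a 0..n-1 scale, then interpolate between neighbours.
    rank = (p / 100) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
```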
abiba-bot pushed to main at abiba-bot/inference-harness 2026-05-25 16:58:17 +00:00
f42747d721 feat: performance analytics panel on dashboard