syslog-harness

Author	SHA1	Message	Date
Abiba	bfc38f5436	fix: routing priority — MoE first, VLM second, Dense last (slow). All tiers now follow MoE → VLM → Dense priority order since Dense (RTX 3090) can be slow. VLM acts as overflow absorber.	2026-05-19 17:38:21 +00:00
Abiba	f519a3fa60	fix: routing — system prompts no longer force heavy tier. System messages are common in agent conversations but don't indicate heavy workload. Now only token count (>4000) and turn count (>8) trigger heavy routing. Simple conversations with system prompts can now route to VLM.	2026-05-19 17:19:29 +00:00
Abiba	941e8db65e	feat: redesigned routing tiers — VLM handles more traffic. New 4-tier routing: - TIER 1 (Lightweight): ≤100 words, single-turn → VLM first, fallback Dense - TIER 2 (Simple Conv): ≤1000 tokens, ≤4 turns → VLM preferred, fallback Dense - TIER 3 (Heavy): >4000 tokens, system prompts, >8 turns → Dense→MoE→VLM cascade - TIER 4 (Default): Medium tasks → Dense preferred, MoE default, VLM overflow. VLM gets more utilization for simple conversations instead of defaulting everything to MoE.	2026-05-19 17:01:55 +00:00
Abiba	241de4f38c	revert: remove Ollama endpoints (llama.cpp uses OpenAI format, not Ollama)	2026-05-19 16:57:04 +00:00
Abiba	beb2d1790a	fix: add /v1/props and /v1/models/<id> Ollama-compatible endpoints. Mumuni's Ollama client probes /v1/props for model discovery and /v1/models/<id> for per-model details. Previously both returned 404, causing client retries. Now returns proper model properties and details.	2026-05-19 16:08:24 +00:00
Abiba	f2f8e8c921	feat: add request queuing to router (replaces hard 503 on saturation). When all GPUs are saturated, requests now enter a queue loop (poll every 500ms) instead of immediately returning 503. Configurable via QUEUE_TIMEOUT env var (default 30s) or X-Queue-Timeout header per-request. This prevents agent failures from cluster saturation — agents wait for a slot instead of crashing on fallback.	2026-05-19 15:55:05 +00:00
Abiba	76ade81fda	docs: add Koonimo to agent API keys table	2026-05-19 15:48:39 +00:00
Abiba	9c31b5d622	May 19, 2026: Full harness update - Model migration: gemma-4-E4B → qwen3.5-9b-vlm - Dashboard reorder: Usage Over Time + GPU Metrics to top - Router counter leak fix (gpu_decr in except handler) - VLM slot upgrade 1→2 - Redis stale key cleanup - Automated maintenance cron job - LiteLLM config update - GPU router config update - README update	2026-05-19 15:03:34 +00:00
Abiba (pi)	4f032b035c	Mumuni review action items: health checks for all containers, version pinning, 503+Retry-After on all-GPU saturation	2026-05-17 09:05:27 +00:00
Abiba (pi)	8f3b0c6647	Router: health check verifies actual llama.cpp endpoint, gpu_decr negative guard, AMD sidecar fixed (sysfs fallback)	2026-05-17 01:52:28 +00:00
Abiba (pi)	808c9d3d13	Router: 300s timeout, gpu_decr bugfix. Dashboard: Bootstrap 5 modern redesign with KPI stats, equal-height cards, queue ring. Nginx: 600s timeout.	2026-05-16 22:12:21 +00:00
Abiba (pi)	9817fe2ef2	Dashboard: clean rebuild with Queue Status ring chart, GPU slot indicators, organized layout (GPU/Queue+Model+Agent/Usage/Live)	2026-05-16 21:05:19 +00:00
Abiba (pi)	654cdff718	Dashboard: GPU slot indicators show active/max concurrent requests. Koonimo API key added. Real-time queuing visibility.	2026-05-16 20:43:22 +00:00
Abiba (pi)	bf90e57c5f	Load-aware routing: tracks active GPU requests in Redis, distributes overflow when MoE saturated. 6 concurrent requests now spread across all 3 GPUs instead of queuing on one.	2026-05-16 20:23:32 +00:00
Abiba (pi)	2db2796e53	Dashboard: rename to SyslogAI Harness, GPU bar now shows utilization instead of VRAM	2026-05-16 19:26:46 +00:00
Abiba (pi)	ec0f9fac63	Fix: clean_unicode now uses chr()-based replacements + ASCII strip to prevent bash heredoc corruption. Emoji and all non-ASCII now fully stripped.	2026-05-16 19:12:58 +00:00
Abiba (pi)	3d42ea4767	Merge: add Abiba harness code — nginx, LiteLLM, router, dashboard, Redis	2026-05-16 18:53:31 +00:00
Abiba (pi)	7b6c6aabe1	Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis - Complexity-based routing (MoE default, Dense heavy, Gemma light) - Per-agent API keys with metrics tracking - Time-series usage graphs (24h/7d/30d) - Streaming support (SSE passthrough) - Unicode cleanup (ASCII-only output) - Vision support (gemma-4-E4B) - Tier enforcement (starter/professional/enterprise) - GPU health monitoring via sidecar polling - Unified dashboard with line graph	2026-05-16 18:51:50 +00:00
mumuni-bot	b65ea22765	Update Nginx Docker config	2026-05-15 21:35:13 +00:00
mumuni-bot	cf7f61650f	Add Dockerfile.dashboard	2026-05-15 21:34:52 +00:00
mumuni-bot	7d00bbec0e	Add Dockerfile.queue	2026-05-15 21:34:49 +00:00
mumuni-bot	37f7c95b05	Add env example	2026-05-15 21:07:34 +00:00
mumuni-bot	a28b3a557d	Add Nginx router config	2026-05-15 21:07:33 +00:00
mumuni-bot	c42f3a9979	Add migration plan	2026-05-15 21:07:32 +00:00
mumuni-bot	e1f12c3462	Add dashboard	2026-05-15 21:07:07 +00:00
mumuni-bot	b55b954967	Add queue service	2026-05-15 21:07:05 +00:00
mumuni-bot	c85aaa570b	Add docker-compose	2026-05-15 21:07:05 +00:00
mumuni-bot	43382dac5b	Initial commit: README	2026-05-15 21:07:03 +00:00
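The routing and queuing behavior described in the May 16 to 19 commits could be sketched as follows. It covers the MoE → VLM → Dense priority, active-request counters with a negative guard on decrement, and a 500 ms poll loop bounded by a 30 s default QUEUE_TIMEOUT. This is a minimal illustration under stated assumptions, not the router's actual code.

The assumptions are:
- A plain dict stands in for the Redis counters.
- The names `SlotCounter` and `acquire_with_queue` are hypothetical.
- The MoE and Dense slot limits are placeholders. Only the VLM limit of 2 comes from the log.
- The log does not say what the router returns when the queue wait times out. A `None` result here is assumed to map to the earlier 503 + Retry-After response.

```python
import time
from typing import Dict, Optional

# Priority order from the latest routing commit: MoE first, VLM second,
# Dense (RTX 3090, can be slow) last.
PRIORITY = ["moe", "vlm", "dense"]

# Per-backend concurrent slot limits. VLM = 2 comes from the "VLM slot
# upgrade 1->2" commit; the MoE and Dense values are hypothetical placeholders.
MAX_SLOTS: Dict[str, int] = {"moe": 2, "vlm": 2, "dense": 2}


class SlotCounter:
    """Hypothetical in-memory stand-in for the Redis active-request counters."""

    def __init__(self) -> None:
        self.active: Dict[str, int] = {name: 0 for name in MAX_SLOTS}

    def try_acquire(self) -> Optional[str]:
        # Take the first backend, in priority order, that has a free slot.
        for name in PRIORITY:
            if self.active[name] < MAX_SLOTS[name]:
                self.active[name] += 1
                return name
        return None  # every backend is saturated

    def release(self, name: str) -> None:
        # Negative guard: a double release must not push the counter below zero.
        self.active[name] = max(0, self.active[name] - 1)


def acquire_with_queue(counter: SlotCounter, timeout: float = 30.0,
                       poll: float = 0.5) -> Optional[str]:
    """Wait for a free slot, polling every `poll` seconds for up to `timeout`.

    `timeout` plays the role of QUEUE_TIMEOUT (default 30s) or a per-request
    X-Queue-Timeout value. None means the wait expired (assumed -> 503).
    """
    deadline = time.monotonic() + timeout
    while True:
        backend = counter.try_acquire()
        if backend is not None:
            return backend
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        time.sleep(min(poll, remaining))
```

Callers would release the slot in a `finally` block, so the counter is decremented even when the upstream call raises. That is the kind of leak the May 19 "gpu_decr in except handler" fix addressed. The dict version is single-threaded only. A shared Redis counter would need an atomic check-and-increment, for example INCR followed by a rollback when over the limit, or a small Lua script.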
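The `clean_unicode` change (chr()-based replacements followed by an ASCII strip, so emoji and other non-ASCII never reach a bash heredoc) might look roughly like this. The log does not list which characters the real function maps, so the replacement table below is a hypothetical example. Python is likewise an assumption.

```python
# Hypothetical table mapping common typographic characters to ASCII
# look-alikes. Code points are written with chr() so this source file
# itself stays pure ASCII.
_REPLACEMENTS = str.maketrans({
    chr(0x2018): "'",    # left single quotation mark
    chr(0x2019): "'",    # right single quotation mark
    chr(0x201C): '"',    # left double quotation mark
    chr(0x201D): '"',    # right double quotation mark
    chr(0x2013): "-",    # en dash
    chr(0x2014): "-",    # em dash
    chr(0x2026): "...",  # horizontal ellipsis
    chr(0x00A0): " ",    # no-break space
})


def clean_unicode(text: str) -> str:
    """Return an ASCII-only version of `text` (hypothetical sketch)."""
    text = text.translate(_REPLACEMENTS)
    # Anything still non-ASCII (emoji, accented letters, CJK) is dropped.
    return text.encode("ascii", "ignore").decode("ascii")
```

Stripping rather than transliterating matches "all non-ASCII now fully stripped", but it means an accented word such as "café" loses its last letter. Running `unicodedata.normalize("NFKD", text)` before the ASCII encode would keep base letters, if that were ever wanted.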