Files
syslog-harness/README.md
T
Abiba 9c31b5d622 May 19, 2026: Full harness update
- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Redis stale key cleanup
- Automated maintenance cron job
- LiteLLM config update
- GPU router config update
- README update
2026-05-19 15:03:34 +00:00

959 B

syslog-harness — Inference API Harness

CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.

Architecture

nginx :80 → router :9000 → GPU backends
                ├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
                ├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
                └─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080

LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)

Deploy

cd /opt/inference-harness
docker compose up -d

Endpoints

URL Purpose
/v1/chat/completions Inference API (OpenAI-compatible)
/v1/models Available models
/ Dashboard (GPU health, routing, agents, timeseries)

Agent API Keys

Agent Key
Abiba sk-syslog-abiba
Mumuni sk-syslog-mumuni
Tanko sk-syslog-tanko
Koby sk-syslog-koby
Kagenz0 sk-syslog-kagenz0