Abiba (pi) 7b6c6aabe1 Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis
- Complexity-based routing (MoE default, Dense heavy, Gemma light)
- Per-agent API keys with metrics tracking
- Time-series usage graphs (24h/7d/30d)
- Streaming support (SSE passthrough)
- Unicode cleanup (ASCII-only output)
- Vision support (gemma-4-E4B)
- Tier enforcement (starter/professional/enterprise)
- GPU health monitoring via sidecar polling
- Unified dashboard with line graph
2026-05-16 18:51:50 +00:00

syslog-harness — Inference API Harness

CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.

Architecture

nginx :80 → router :9000 → GPU backends
                ├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
                ├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
                └─ gemma-4-E4B (Light) @ 192.168.68.110:8080

LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)

Deploy

cd /opt/inference-harness
docker compose up -d

Endpoints

URL Purpose
/v1/chat/completions Inference API (OpenAI-compatible)
/v1/models Available models
/ Dashboard (GPU health, routing, agents, timeseries)

Agent API Keys

Agent Key
Abiba sk-syslog-abiba
Mumuni sk-syslog-mumuni
Tanko sk-syslog-tanko
Koby sk-syslog-koby
Kagenz0 sk-syslog-kagenz0
S
Description
Syslog Operational Agent Harness — Nginx routing, Redis queue, circuit breaker, monitoring, Docker migration
Readme 311 KiB
Languages
Python 99.2%
Dockerfile 0.8%