# syslog-harness — Inference API Harness CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API. ## Architecture ``` nginx :80 → router :9000 → GPU backends ├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080 ├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080 └─ gemma-4-E4B (Light) @ 192.168.68.110:8080 LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local) ``` ## Deploy ```bash cd /opt/inference-harness docker compose up -d ``` ## Endpoints | URL | Purpose | |-----|---------| | `/v1/chat/completions` | Inference API (OpenAI-compatible) | | `/v1/models` | Available models | | `/` | Dashboard (GPU health, routing, agents, timeseries) | ## Agent API Keys | Agent | Key | |-------|-----| | Abiba | `sk-syslog-abiba` | | Mumuni | `sk-syslog-mumuni` | | Tanko | `sk-syslog-tanko` | | Koby | `sk-syslog-koby` | | Kagenz0 | `sk-syslog-kagenz0` |