Files
syslog-harness/README.md
T
2026-05-19 15:48:39 +00:00

41 lines
993 B
Markdown

# syslog-harness — Inference API Harness
CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.
## Architecture
```
nginx :80 → router :9000 → GPU backends
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
```
## Deploy
```bash
cd /opt/inference-harness
docker compose up -d
```
## Endpoints
| URL | Purpose |
|-----|---------|
| `/v1/chat/completions` | Inference API (OpenAI-compatible) |
| `/v1/models` | Available models |
| `/` | Dashboard (GPU health, routing, agents, timeseries) |
## Agent API Keys
| Agent | Key |
|-------|-----|
| Abiba | `sk-syslog-abiba` |
| Mumuni | `sk-syslog-mumuni` |
| Tanko | `sk-syslog-tanko` |
| Koby | `sk-syslog-koby` |
| Kagenz0 | `sk-syslog-kagenz0` |
| Koonimo | `sk-syslog-koonimo` |