Files
syslog-harness/README.md
T
Abiba 9c31b5d622 May 19, 2026: Full harness update
- Model migration: gemma-4-E4B → qwen3.5-9b-vlm
- Dashboard reorder: Usage Over Time + GPU Metrics to top
- Router counter leak fix (gpu_decr in except handler)
- VLM slot upgrade 1→2
- Redis stale key cleanup
- Automated maintenance cron job
- LiteLLM config update
- GPU router config update
- README update
2026-05-19 15:03:34 +00:00

40 lines
959 B
Markdown

# syslog-harness — Inference API Harness
CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.
## Architecture
```
nginx :80 → router :9000 → GPU backends
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
```
## Deploy
```bash
cd /opt/inference-harness
docker compose up -d
```
## Endpoints
| URL | Purpose |
|-----|---------|
| `/v1/chat/completions` | Inference API (OpenAI-compatible) |
| `/v1/models` | Available models |
| `/` | Dashboard (GPU health, routing, agents, timeseries) |
## Agent API Keys
| Agent | Key |
|-------|-----|
| Abiba | `sk-syslog-abiba` |
| Mumuni | `sk-syslog-mumuni` |
| Tanko | `sk-syslog-tanko` |
| Koby | `sk-syslog-koby` |
| Kagenz0 | `sk-syslog-kagenz0` |