9c31b5d622
- Model migration: gemma-4-E4B → qwen3.5-9b-vlm - Dashboard reorder: Usage Over Time + GPU Metrics to top - Router counter leak fix (gpu_decr in except handler) - VLM slot upgrade 1→2 - Redis stale key cleanup - Automated maintenance cron job - LiteLLM config update - GPU router config update - README update
40 lines
959 B
Markdown
40 lines
959 B
Markdown
# syslog-harness — Inference API Harness
|
|
|
|
CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
nginx :80 → router :9000 → GPU backends
|
|
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
|
|
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
|
|
└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080
|
|
|
|
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
|
|
```
|
|
|
|
## Deploy
|
|
|
|
```bash
|
|
cd /opt/inference-harness
|
|
docker compose up -d
|
|
```
|
|
|
|
## Endpoints
|
|
|
|
| URL | Purpose |
|
|
|-----|---------|
|
|
| `/v1/chat/completions` | Inference API (OpenAI-compatible) |
|
|
| `/v1/models` | Available models |
|
|
| `/` | Dashboard (GPU health, routing, agents, timeseries) |
|
|
|
|
## Agent API Keys
|
|
|
|
| Agent | Key |
|
|
|-------|-----|
|
|
| Abiba | `sk-syslog-abiba` |
|
|
| Mumuni | `sk-syslog-mumuni` |
|
|
| Tanko | `sk-syslog-tanko` |
|
|
| Koby | `sk-syslog-koby` |
|
|
| Kagenz0 | `sk-syslog-kagenz0` |
|