Initial commit: CT 116 inference harness — nginx, LiteLLM, router, dashboard, Redis
- Complexity-based routing (MoE default, Dense heavy, Gemma light) - Per-agent API keys with metrics tracking - Time-series usage graphs (24h/7d/30d) - Streaming support (SSE passthrough) - Unicode cleanup (ASCII-only output) - Vision support (gemma-4-E4B) - Tier enforcement (starter/professional/enterprise) - GPU health monitoring via sidecar polling - Unified dashboard with line graph
This commit is contained in:
@@ -0,0 +1,39 @@
|
||||
# syslog-harness — Inference API Harness
|
||||
|
||||
CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
nginx :80 → router :9000 → GPU backends
|
||||
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
|
||||
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
|
||||
└─ gemma-4-E4B (Light) @ 192.168.68.110:8080
|
||||
|
||||
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
|
||||
```
|
||||
|
||||
## Deploy
|
||||
|
||||
```bash
|
||||
cd /opt/inference-harness
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
## Endpoints
|
||||
|
||||
| URL | Purpose |
|
||||
|-----|---------|
|
||||
| `/v1/chat/completions` | Inference API (OpenAI-compatible) |
|
||||
| `/v1/models` | Available models |
|
||||
| `/` | Dashboard (GPU health, routing, agents, timeseries) |
|
||||
|
||||
## Agent API Keys
|
||||
|
||||
| Agent | Key |
|
||||
|-------|-----|
|
||||
| Abiba | `sk-syslog-abiba` |
|
||||
| Mumuni | `sk-syslog-mumuni` |
|
||||
| Tanko | `sk-syslog-tanko` |
|
||||
| Koby | `sk-syslog-koby` |
|
||||
| Kagenz0 | `sk-syslog-kagenz0` |
|
||||
Reference in New Issue
Block a user