941e8db65e3a38785fb046fc7575742a1778d182
New 4-tier routing: - TIER 1 (Lightweight): ≤100 words, single-turn → VLM first, fallback Dense - TIER 2 (Simple Conv): ≤1000 tokens, ≤4 turns → VLM preferred, fallback Dense - TIER 3 (Heavy): >4000 tokens, system prompts, >8 turns → Dense→MoE→VLM cascade - TIER 4 (Default): Medium tasks → Dense preferred, MoE default, VLM overflow VLM gets more utilization for simple conversations instead of defaulting everything to MoE.
syslog-harness — Inference API Harness
CT 116 Docker stack for routing local GPU models through a unified OpenAI-compatible API.
Architecture
nginx :80 → router :9000 → GPU backends
├─ qwen3.6-35B-A3B (MoE) @ 192.168.68.15:8080
├─ qwen3.6-27B-code (Dense) @ 192.168.68.8:8080
└─ qwen3.5-9b-vlm (VLM) @ 192.168.68.110:8080
LiteLLM :8081 (fallback) | Dashboard :3000 | Redis :6379 (local)
Deploy
cd /opt/inference-harness
docker compose up -d
Endpoints
| URL | Purpose |
|---|---|
/v1/chat/completions |
Inference API (OpenAI-compatible) |
/v1/models |
Available models |
/ |
Dashboard (GPU health, routing, agents, timeseries) |
Agent API Keys
| Agent | Key |
|---|---|
| Abiba | sk-syslog-abiba |
| Mumuni | sk-syslog-mumuni |
| Tanko | sk-syslog-tanko |
| Koby | sk-syslog-koby |
| Kagenz0 | sk-syslog-kagenz0 |
| Koonimo | sk-syslog-koonimo |
Description
Syslog Operational Agent Harness — Nginx routing, Redis queue, circuit breaker, monitoring, Docker migration
Languages
Python
99.2%
Dockerfile
0.8%