64 lines
1.8 KiB
Markdown
64 lines
1.8 KiB
Markdown
# Syslog Harness
|
|
|
|
Operational orchestration layer for Syslog's internal AI agents.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
|
|
│ Agent │────>│ Nginx │────>│ GPU Pool │
|
|
│ (Hermes) │ │ Router │ │ (MoE/Dense)│
|
|
└─────────────┘ └──────────────┘ └─────────────┘
|
|
│
|
|
├──> :8091 Queue Service (Docker)
|
|
│
|
|
└──> :3001 Dashboard (Docker)
|
|
```
|
|
|
|
## Components
|
|
|
|
| Service | Port | Container | Purpose |
|
|
|---|---|---|---|
|
|
| Nginx Router | 8080 | Host | Routes requests to GPU backends |
|
|
| Queue Service | 8091 | `syslog-queue` | Enqueues requests when GPUs are down |
|
|
| Dashboard | 3001 | `syslog-dashboard` | Observability UI + API |
|
|
|
|
## GPU Routing
|
|
|
|
| Header `X-Syslog-Model` | Backend | Model |
|
|
|---|---|---|
|
|
| (none) / `standard` | amdpve (.15) | qwen3.6-35B-A3B (MoE) |
|
|
| `heavy` / `qwen3.5-27B` | llmgpu (.8) | qwen3.5-27B (Dense) |
|
|
| `light` / `gemma-4` | ocu_llm (.110) | gemma-4-E4B (Light) |
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Build & start
|
|
docker compose build
|
|
docker compose up -d
|
|
|
|
# Verify
|
|
curl http://localhost:8091/health
|
|
curl http://localhost:3001/api/status
|
|
```
|
|
|
|
## Dashboard
|
|
|
|
- **UI:** `http://<host>:8080/dashboard/harness.html`
|
|
- **API:** `http://<host>:8080/dashboard/api/status`
|
|
|
|
## Circuit Breaker
|
|
|
|
- Rate limit: 10 req/s per IP
|
|
- Burst: 20 requests
|
|
- Excess returns 503
|
|
- Queue fallback on GPU 502/503
|
|
|
|
## Production Migration
|
|
|
|
See [MIGRATION_PLAN.md](./MIGRATION_PLAN.md)
|
|
|
|
---
|
|
*Built for Syslog Solution LLC — Quality over speed.*
|