feat: Smart Queue Consumer implementation draft + architecture review
- SMART_QUEUE_IMPLEMENTATION.md: Complete implementation draft (1572 lines) with 10 quick-win fixes and full smart queue consumer rewrite - ARCHITECTURE_REVIEW.md: 26-issue audit with prioritized findings - Verified all 3 GPUs live: amdpve (73% util), llmgpu (idle), ocu_llm (idle) - Redis 7.4.9 confirmed streams support - GPU sidecar metrics verified on all hosts Key fixes: - QW-1: Dockerfile path mismatch (Dockerfile.queue -> queue-service/Dockerfile) - QW-2: Nginx fallback only on ALL-GPU failure (not single GPU) - QW-3: Container names fixed to Docker service names - QW-4: Redis host default fixed (192.168.68.7 -> redis) - QW-5: Dependency version pinning - QW-7-10: Health checks, restart policy, Gunicorn, single-process collector Smart queue features: - Redis Streams + consumer groups - GPU-aware load balancing via sidecar metrics - Per-GPU circuit breakers with half-open recovery - Adaptive backpressure (0-30 normal, 30-40 warn, 40-50 503, >50 open) - Dead letter queue with retry endpoint - Job ID tracking and /status/<job_id> API
This commit is contained in:
@@ -0,0 +1,63 @@
|
||||
# Syslog Harness
|
||||
|
||||
Operational orchestration layer for Syslog's internal AI agents.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
|
||||
│ Agent │────>│ Nginx │────>│ GPU Pool │
|
||||
│ (Hermes) │ │ Router │ │ (MoE/Dense)│
|
||||
└─────────────┘ └──────────────┘ └─────────────┘
|
||||
│
|
||||
├──> :8091 Queue Service (Docker)
|
||||
│
|
||||
└──> :3001 Dashboard (Docker)
|
||||
```
|
||||
|
||||
## Components
|
||||
|
||||
| Service | Port | Container | Purpose |
|
||||
|---|---|---|---|
|
||||
| Nginx Router | 8080 | Host | Routes requests to GPU backends |
|
||||
| Queue Service | 8091 | `syslog-queue` | Enqueues requests when GPUs are down |
|
||||
| Dashboard | 3001 | `syslog-dashboard` | Observability UI + API |
|
||||
|
||||
## GPU Routing
|
||||
|
||||
| Header `X-Syslog-Model` | Backend | Model |
|
||||
|---|---|---|
|
||||
| (none) / `standard` | amdpve (.15) | qwen3.6-35B-A3B (MoE) |
|
||||
| `heavy` / `qwen3.5-27B` | llmgpu (.8) | qwen3.5-27B (Dense) |
|
||||
| `light` / `gemma-4` | ocu_llm (.110) | gemma-4-E4B (Light) |
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Build & start
|
||||
docker compose build
|
||||
docker compose up -d
|
||||
|
||||
# Verify
|
||||
curl http://localhost:8091/health
|
||||
curl http://localhost:3001/api/status
|
||||
```
|
||||
|
||||
## Dashboard
|
||||
|
||||
- **UI:** `http://<host>:8080/dashboard/harness.html`
|
||||
- **API:** `http://<host>:8080/dashboard/api/status`
|
||||
|
||||
## Circuit Breaker
|
||||
|
||||
- Rate limit: 10 req/s per IP
|
||||
- Burst: 20 requests
|
||||
- Excess returns 503
|
||||
- Queue fallback on GPU 502/503
|
||||
|
||||
## Production Migration
|
||||
|
||||
See [MIGRATION_PLAN.md](./MIGRATION_PLAN.md)
|
||||
|
||||
---
|
||||
*Built for Syslog Solution LLC — Quality over speed.*
|
||||
Reference in New Issue
Block a user