SyslogBot b09a93f45c feat: Smart Queue Consumer implementation draft + architecture review

- SMART_QUEUE_IMPLEMENTATION.md: Complete implementation draft (1572 lines)
  with 10 quick-win fixes and full smart queue consumer rewrite
- ARCHITECTURE_REVIEW.md: 26-issue audit with prioritized findings
- Verified all 3 GPUs live: amdpve (73% util), llmgpu (idle), ocu_llm (idle)
- Redis 7.4.9 confirmed streams support
- GPU sidecar metrics verified on all hosts

Key fixes:
- QW-1: Dockerfile path mismatch (Dockerfile.queue -> queue-service/Dockerfile)
- QW-2: Nginx fallback only on ALL-GPU failure (not single GPU)
- QW-3: Container names fixed to Docker service names
- QW-4: Redis host default fixed (192.168.68.7 -> redis)
- QW-5: Dependency version pinning
- QW-7-10: Health checks, restart policy, Gunicorn, single-process collector

Smart queue features:
- Redis Streams + consumer groups
- GPU-aware load balancing via sidecar metrics
- Per-GPU circuit breakers with half-open recovery
- Adaptive backpressure (0-30 normal, 30-40 warn, 40-50 503, >50 open)
- Dead letter queue with retry endpoint
- Job ID tracking and /status/<job_id> API

2026-05-17 03:55:20 +00:00

Syslog Harness — Production Migration Plan

Current State (Development)

Host: CT 114 (192.168.68.123)
Docker containers: syslog-queue (:8091), syslog-dashboard (:3001)
Nginx: Local on CT 114, routing to GPUs + Docker services
Status: All components verified and operational

Target State (Production)

Host: New CT (e.g., docker-vm on 192.168.68.x)
Docker containers: Same queue + dashboard services
Nginx: Containerized on production CT
GPU backends: Same (192.168.68.15, .8, .110)

Migration Steps

1. Prepare Production CT

# Create new CT on Proxmox
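# Install Docker
# Note: docker-compose-plugin is shipped in Docker's own apt repository,
# so that repository must be configured before this command will succeed.
apt update && apt install -y docker.io docker-compose-plugin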

# Clone the harness repo
git clone <repo-url> /root/syslog-harness
cd /root/syslog-harness

2. Update docker-compose.yml for Production

Change REDIS_HOST to production Redis IP
Update GPU endpoint env vars if IPs change
Add volume mounts for persistence
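
One way to keep these production overrides out of docker-compose.yml itself is a `.env` file next to it, which Docker Compose reads for variable interpolation. The sketch below assumes the compose file references the value as `${REDIS_HOST}`; the source does not show the compose file, so that is an assumption, and `set_env_var` is a hypothetical helper name.

```shell
# Set KEY=VALUE in an env file: drop any existing KEY line, then append the new one.
# Usage: set_env_var FILE KEY VALUE
# (Hypothetical helper; assumes docker-compose.yml interpolates ${KEY}.)
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # grep -v exits non-zero when nothing is left to print, hence "|| true".
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (replace the placeholder with the real production Redis IP):
# set_env_var /root/syslog-harness/.env REDIS_HOST <production-redis-ip>
# docker compose config   # prints the resolved configuration for review
```

Running `docker compose config` afterwards is a cheap way to confirm the substituted values before building.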

3. Build & Deploy

# Build images
docker compose build

# Start services
docker compose up -d

# Verify health
curl http://localhost:8091/health
curl http://localhost:3001/api/status
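
Containers may need a few seconds before they answer, so a single curl right after `docker compose up -d` can fail spuriously. A minimal sketch of a retrying check follows; `wait_for_health` is a hypothetical helper, and the attempt count and delay are arbitrary defaults rather than values from the harness.

```shell
# Poll a URL until curl succeeds (no connection error, no HTTP status >= 400)
# or the attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors; -sS: quiet but still show errors.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy: $url"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "unhealthy after $attempts attempts: $url" >&2
  return 1
}

# Example:
# wait_for_health http://localhost:8091/health
# wait_for_health http://localhost:3001/api/status
```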

4. Configure Nginx

Copy /etc/nginx/conf.d/gpu-router.conf to production CT
Update upstream IPs if needed
Test and reload
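
The "test and reload" step can be sketched as a guard that reloads only when `nginx -t` accepts the configuration. `reload_nginx` and the `NGINX_CMD` override are hypothetical names; for the containerized Nginx on the production CT, the override might be something like `docker compose exec -T nginx nginx`, where the service name `nginx` is an assumption.

```shell
# Validate the Nginx configuration and reload only if validation succeeds.
# NGINX_CMD (hypothetical) lets the same function target a containerized Nginx.
reload_nginx() {
  local cmd="${NGINX_CMD:-nginx}"
  # $cmd is intentionally unquoted so a multi-word command splits into arguments.
  if $cmd -t; then
    $cmd -s reload
  else
    echo "nginx config test failed; not reloading" >&2
    return 1
  fi
}

# Example:
# reload_nginx
# NGINX_CMD="docker compose exec -T nginx nginx" reload_nginx
```

Because a failed test leaves the running configuration untouched, a bad copy of gpu-router.conf does not take the router down.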

5. DNS / Routing Update

Point agent traffic to new CT IP
Update Hermes config inference_api_url
Test agent routing

6. Verification Checklist

Queue service health check passes
Dashboard API returns GPU health
Nginx routes to correct GPU based on header
Circuit breaker triggers on excess load
Queue fallback works when all GPUs are down
Agent requests reach correct model
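
Parts of this checklist can be scripted. The sketch below is a small pass/fail runner covering only the two endpoints already used in step 3; the routing header, circuit-breaker trigger, and fallback checks depend on details not given here, so they are left out. `check` and `failures` are hypothetical names.

```shell
failures=0

# Run one check: print PASS or FAIL with its label and count failures.
# Usage: check LABEL COMMAND [ARGS...]
check() {
  local label="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Example:
# check "Queue service health" curl -fsS --max-time 5 http://localhost:8091/health
# check "Dashboard API status" curl -fsS --max-time 5 http://localhost:3001/api/status
# [ "$failures" -eq 0 ] || echo "$failures check(s) failed" >&2
```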

Rollback Plan

Keep CT 114 running as backup
Revert DNS/routing to CT 114 (192.168.68.123) if issues arise
Docker containers can be stopped/started instantly

Created: May 15, 2026
Status: Development verified, ready for production migration
