Add migration plan

2026-05-15 21:07:32 +00:00
parent e1f12c3462
commit c42f3a9979
+71
@@ -0,0 +1,71 @@
# Syslog Harness — Production Migration Plan
## Current State (Development)
- **Host:** CT 114 (192.168.68.123)
- **Docker containers:** `syslog-queue` (:8091), `syslog-dashboard` (:3001)
- **Nginx:** Local on CT 114, routing to GPUs + Docker services
- **Status:** All components verified and operational
## Target State (Production)
- **Host:** New CT (e.g., `docker-vm` on 192.168.68.x)
- **Docker containers:** Same queue + dashboard services
- **Nginx:** Containerized on production CT
- **GPU backends:** Same (192.168.68.15, .8, .110)
## Migration Steps
### 1. Prepare Production CT
```bash
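# Create a new CT on Proxmox first (web UI or `pct create`), then run the rest inside it.
# Install Docker. Note: docker-compose-plugin is packaged in Docker's own apt
# repository; if that repository is not configured, install the Compose
# package your distribution provides instead.
apt update && apt install -y docker.io docker-compose-plugin
# Clone the harness repo
git clone <repo-url> /root/syslog-harness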
cd /root/syslog-harness
```
### 2. Update docker-compose.yml for Production
- Change `REDIS_HOST` to production Redis IP
- Update GPU endpoint env vars if IPs change
- Add volume mounts for persistence
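
One way to keep production-specific values out of the tracked `docker-compose.yml` is an env file that Compose reads for variable substitution. The sketch below assumes the compose file references these variables; only `REDIS_HOST` comes from this plan, while `GPU_ENDPOINTS`, the function name, and the file location are hypothetical.

```bash
# Sketch: write production overrides to an env file for Docker Compose.
# REDIS_HOST comes from the plan; GPU_ENDPOINTS is a hypothetical variable name.
write_prod_env() {
    env_file=$1
    redis_host=$2
    gpu_endpoints=$3

    # An empty Redis host is almost certainly a mistake, so refuse to write it.
    if [ -z "$redis_host" ]; then
        echo "REDIS_HOST must not be empty" >&2
        return 1
    fi

    {
        printf 'REDIS_HOST=%s\n' "$redis_host"
        printf 'GPU_ENDPOINTS=%s\n' "$gpu_endpoints"
    } > "$env_file"
}

# Example (placeholder Redis IP; GPU IPs are the ones listed under Target State):
# write_prod_env /root/syslog-harness/.env <redis-ip> 192.168.68.15,192.168.68.8,192.168.68.110
```

Volume mounts are not covered by this sketch; they still need to be declared in the compose file itself.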
### 3. Build & Deploy
```bash
# Build images
docker compose build
# Start services
docker compose up -d
# Verify health
curl http://localhost:8091/health
curl http://localhost:3001/api/status
```
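
Containers may need a few seconds after `docker compose up -d` before they answer, so a single `curl` can fail spuriously. A minimal retry helper, offered as a sketch (the function name and the retry counts in the example are assumptions, not part of the harness):

```bash
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2

    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example: allow each service up to 10 tries, 3 seconds apart.
# retry 10 3 curl -fsS http://localhost:8091/health
# retry 10 3 curl -fsS http://localhost:3001/api/status
```

The `-f` flag makes `curl` return a non-zero exit code on HTTP error responses, which is what lets the helper tell a failing service from a healthy one.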
### 4. Configure Nginx
- Copy `/etc/nginx/conf.d/gpu-router.conf` from CT 114 to the production CT
- Update upstream IPs if needed
- Test the configuration with `nginx -t`, then reload Nginx
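
Reloading should only happen when the configuration test passes. The sketch below wraps both steps; because the target Nginx is containerized, it accepts an optional command prefix for reaching the container. The Compose service name `nginx` in the example is an assumption.

```bash
# Sketch: validate the Nginx config, then reload only if validation passed.
# Arguments are an optional command prefix used to reach Nginx, for example
# "docker compose exec nginx" (the service name "nginx" is an assumption).
nginx_test_and_reload() {
    "$@" nginx -t || {
        echo "nginx config test failed; not reloading" >&2
        return 1
    }
    "$@" nginx -s reload
}

# Example, Nginx running as a Compose service:
# nginx_test_and_reload docker compose exec nginx
# Example, Nginx installed directly on the host:
# nginx_test_and_reload
```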
### 5. DNS / Routing Update
- Point agent traffic to the new CT's IP
- Update `inference_api_url` in the Hermes config to the new CT's address
- Test agent routing
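
Before repointing agents, it may help to confirm from another machine that the new CT answers on both service ports. This is a sketch that assumes ports 8091 and 3001 are published on the CT's network interface; the function name and the 5-second timeout are illustrative. Ports and paths are the ones used in the Build & Deploy step.

```bash
# Sketch: confirm both services answer on a given CT before repointing agents.
check_ct() {
    ct_ip=$1
    # -f: fail on HTTP errors, -sS: quiet but still print errors.
    curl -fsS --max-time 5 "http://${ct_ip}:8091/health" > /dev/null || return 1
    curl -fsS --max-time 5 "http://${ct_ip}:3001/api/status" > /dev/null || return 1
    echo "CT ${ct_ip}: queue and dashboard reachable"
}

# Example (replace the placeholder with the production CT's address):
# check_ct <new-ct-ip>
```

The same function can be pointed at 192.168.68.123 to confirm CT 114 is still healthy as a fallback.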
### 6. Verification Checklist
- [ ] Queue service health check passes
- [ ] Dashboard API returns GPU health
- [ ] Nginx routes to correct GPU based on header
- [ ] Circuit breaker triggers under excess load
- [ ] Queue fallback works when GPUs are down
- [ ] Agent requests reach correct model
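
Parts of the checklist above can be scripted. The sketch below is a small PASS/FAIL runner; the helper name and the `failures` counter are illustrative, and only the two health endpoints from the Build & Deploy step are shown, since the routing header and circuit-breaker thresholds are not specified in this plan.

```bash
# Sketch: run named checks and report PASS/FAIL for each.
failures=0

run_check() {
    name=$1
    shift
    # Discard the check's own output; only its exit code matters here.
    if "$@" > /dev/null 2>&1; then
        echo "[PASS] $name"
    else
        echo "[FAIL] $name"
        failures=$((failures + 1))
    fi
}

# Example using the two health endpoints from the Build & Deploy step:
# run_check "Queue service health" curl -fsS http://localhost:8091/health
# run_check "Dashboard API status" curl -fsS http://localhost:3001/api/status
# [ "$failures" -eq 0 ] || echo "$failures check(s) failed"
```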
## Rollback Plan
- Keep CT 114 running as backup
- Revert DNS/routing to CT 114 (192.168.68.123) if issues arise
- Docker containers can be stopped and restarted within seconds
---
*Created: May 15, 2026*
*Status: Development verified, ready for production migration*