From c42f3a99792702c2e251d62e84be790ab096a2fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mumuni=20=F0=9F=A6=85=20=28Syslog=20Falcon=29?=
Date: Fri, 15 May 2026 21:07:32 +0000
Subject: [PATCH] Add migration plan

---
 MIGRATION_PLAN.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 MIGRATION_PLAN.md

diff --git a/MIGRATION_PLAN.md b/MIGRATION_PLAN.md
new file mode 100644
index 0000000..5a005aa
--- /dev/null
+++ b/MIGRATION_PLAN.md
@@ -0,0 +1,71 @@
+# Syslog Harness — Production Migration Plan
+
+## Current State (Development)
+- **Host:** CT 114 (192.168.68.123)
+- **Docker containers:** `syslog-queue` (:8091), `syslog-dashboard` (:3001)
+- **Nginx:** Local on CT 114, routing to GPUs + Docker services
+- **Status:** All components verified and operational
+
+## Target State (Production)
+- **Host:** New CT (e.g., `docker-vm` on 192.168.68.x)
+- **Docker containers:** Same queue + dashboard services
+- **Nginx:** Containerized on production CT
+- **GPU backends:** Same (192.168.68.15, .8, .110)
+
+## Migration Steps
+
+### 1. Prepare Production CT
+```bash
+# Create new CT on Proxmox
+# Install Docker
+apt update && apt install -y docker.io docker-compose-plugin
+
+# Clone the harness repo
+git clone /root/syslog-harness
+cd /root/syslog-harness
+```
+
+### 2. Update docker-compose.yml for Production
+- Change `REDIS_HOST` to the production Redis IP
+- Update GPU endpoint env vars if IPs change
+- Add volume mounts for persistence
+
+### 3. Build & Deploy
+```bash
+# Build images
+docker compose build
+
+# Start services
+docker compose up -d
+
+# Verify health
+curl http://localhost:8091/health
+curl http://localhost:3001/api/status
+```
+
+### 4. Configure Nginx
+- Copy `/etc/nginx/conf.d/gpu-router.conf` to production CT
+- Update upstream IPs if needed
+- Test and reload
+
+### 5. DNS / Routing Update
+- Point agent traffic to new CT IP
+- Update Hermes config `inference_api_url`
+- Test agent routing
+
+### 6. Verification Checklist
+- [ ] Queue service health check passes
+- [ ] Dashboard API returns GPU health
+- [ ] Nginx routes to correct GPU based on header
+- [ ] Circuit breaker triggers on excess load
+- [ ] Queue fallback works when GPUs are down
+- [ ] Agent requests reach correct model
+
+## Rollback Plan
+- Keep CT 114 running as backup
+- Revert DNS/routing to .123 if issues arise
+- Docker containers can be stopped/started instantly
+
+---
+*Created: May 15, 2026*
+*Status: Development verified, ready for production migration*
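
The two health checks from step 3, which also cover the first two items of the verification checklist, could be scripted roughly as follows. This is a minimal sketch and not part of the patch. Only the two URLs come from the plan. The function names `check_endpoint` and `run_checks`, the `QUEUE_URL` and `DASHBOARD_URL` overrides, and the `--run` flag are hypothetical. It also assumes that any HTTP 2xx response means the service is healthy, which the plan does not state.

```bash
#!/usr/bin/env bash
# Sketch: post-deploy health checks for the two services named in the plan.
# Hypothetical names: check_endpoint, run_checks, QUEUE_URL, DASHBOARD_URL, --run.

# Defaults are the URLs from step 3; override them to test another host.
QUEUE_URL="${QUEUE_URL:-http://localhost:8091/health}"
DASHBOARD_URL="${DASHBOARD_URL:-http://localhost:3001/api/status}"

# Probe one URL ($1 = label, $2 = URL).
# Prints PASS/FAIL and returns 0 on success, 1 otherwise.
# Assumption: curl --fail (non-zero exit on HTTP >= 400) is a good enough health signal.
check_endpoint() {
    if curl --fail --silent --show-error --max-time 5 --output /dev/null "$2"; then
        echo "PASS $1"
        return 0
    fi
    echo "FAIL $1"
    return 1
}

# Run every check, count failures, and return non-zero if any check failed.
run_checks() {
    failures=0
    check_endpoint "queue" "$QUEUE_URL" || failures=$((failures + 1))
    check_endpoint "dashboard" "$DASHBOARD_URL" || failures=$((failures + 1))
    echo "$failures check(s) failed"
    [ "$failures" -eq 0 ]
}

# Only touch the network when explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    run_checks
fi
```

Saved under any name and run with `--run` on the production CT after `docker compose up -d`, the script exits non-zero if either endpoint is unreachable, so it can gate the DNS / routing switch in step 5. The remaining checklist items (header-based routing, circuit breaker, queue fallback) depend on details the plan does not specify and are left out.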