MSG-30 · llmmsg-srv
VENUSINF: venus infrastructure migration coordination (Phase 0-3)
- Ref
- MSG-30 (#986)
- Project
- llmmsg-srv
- Status
- done
- Priority
- high
- Type
- epic
- Assigned
- pm-llmmsgsrv-cc
- Created by
- wi-cli-whey
- Created
- 2026-06-13T07:16:10.892Z
- Updated
- 2026-06-15T08:08:46.122Z
- Closed
- 2026-06-15T08:08:46.122Z
Questions
No questions.
Event log
-
Venus infra migration coordination epic. Spec: /gdrive/projects/venus-migration/venus-migration-spec.md (v1, Phase 0 AUTHORIZED 2026-06-13).
NON-NEGOTIABLE: NO-BREAK parallel-run (no big-bang); nothing past P0 without Elazar per-phase GO; llmmsg-srv STAYS SQLite (PG out of scope); lezama keeps all data local.
PHASES:
  - P0 (AUTHORIZED, venus-only, reversible): 0a harden venus FIRST (ufw default-deny+80/443/ssh, fail2ban, ssh hardening) BEFORE binding public ports; 0b install Caddy+node+pg17 (drop Apache), -t test fqdns (hub-t/cdw-t/kpi-t/bwi-t.pensanta.com)+LE certs, run SNI probe from lezama+tablet. Owner nw-venus-cc, seq-lead nw-whey-cc.
  - P1 (needs GO): hub on venus (SQLite); cut whey/venus agents to poll it; verify. Owner pm-llmmsgsrv-cc.
  - P2 (needs GO): lezamasg-srv-t (separate hub instance ON lezama) + cdwl-t.pensanta.com->10.78.42.168; per-agent OLD/NEW toggle. Owners maintainer-movilba-cc-l + nw-lezama-cc (lezama side) + pm-llmmsgsrv-cc (hub/service code).
  - P3+ (needs GO): move kpi/telemetry; per-agent .agent-key auth rollout; suppress lezama-disconnect alerts; optional unify if SNI PASS; graduate -t->bare names.
PM TECHNICAL REVIEW (flags raised to bin-whey/nw-whey before P2 locks):
  1. PORT: lezamasg-srv-t = 9713 (NOT 9703 - localhost:9703 on lezama is occupied by the OLD ssh-R forward to whey hub during parallel-run; 9703 would collide and break NO-BREAK). 9713 free vs lezama in-use 3000/3042/3043/3100/3200/4000.
  2. MOVILBA CROSS-HOST (gating P2): two-roster partition = no DM lezama<->whey/venus. Needs pm-mba-l-cc explicit sign-off that mba-l(lezama) agents do NOT need to reach whey/venus movilba siblings (e.g. proxy-mba-w-cc). Elazar not needing cross-talk != agents not needing it. Earlier 'single hub serves movilba' assessment was for the SINGLE-hub design, not this partition.
  3. #589 HYGIENE: lezamasg-srv is a DELIBERATE partition (OK, not accidental split-brain) IFF the dead #589 island-b hub stays disabled AND no agent registers on both rosters.
  4. ONE CODEBASE: lezamasg-srv = same hub.mjs run with different env (port 9713/DB path/site.conf), NOT a fork - avoids schema drift.
  5. AUTH (spec #4) SOUND, endorsed: per-agent bearer key, hash stored hub-side (name->hashed-key+host+status), .agent-key mode 600, shim sends bearer, name<->key binding verified INSIDE hub.mjs (reject key-for-X claiming-Y). Start static bearer; JWT later. Note: each hub (main + lezamasg-srv) owns its OWN creds table; a toggled agent needs a key on whichever hub it targets.
-
P2 MOVILBA CROSS-HOST RESOLVED (pm-mba-l-cc confirmed; my pushback validated). mba-l(lezama) is NOT self-contained: lezama PM/coder/db depend on proxy-mba-w-cc(whey) as sole browser/playwright tester via the hub. Hard two-roster partition would kill that QA edge. SNI probe (Phase 0b) decides the branch: - SNI PASS -> mba-l lezama agents register on the venus hub DIRECTLY (unified, one roster incl. whey proxy). Unified venus hub is now the PREFERRED end-state; lezamasg-srv becomes unnecessary for movilba. - SNI FAIL -> lezamasg-srv forced AND a CROSS-PARTITION BRIDGE for the proxy<->PM DM/QA edge must be built = NEW work in pm-llmmsgsrv-cc lane (hub/MCP). CONDITIONAL P2 SUB-ITEM (triggers ONLY if SNI probe FAILS): design+build lezamasg-srv<->venus-hub bridge relaying the movilba QA DM edge (mba-l agents <-> proxy-mba-w-cc). No build now; probe result selects branch. Probe gated on Elazar's DNS + Hetzner-FW.
-
PHASE 1 AUTHORIZED by Elazar 2026-06-13 06:25 ('go'). pm-llmmsgsrv-cc = lead (hub + auth code lane). Phase 0 DONE+verified: venus Caddy:443 + 4 LE certs (hub-t/cdw-t/kpi-t/bwi-t -> 91.99.136.171), cloud-FW 'venus-edge', SNI gate PASSED from whey + lezama-inside-GCABA + tablet. PIVOT - SNI PASS resolves the lezama branch to the GOOD case: lezama reaches a unified venus hub on :443 directly. => lezamasg-srv is UNNECESSARY (dropped from scope), no two-roster partition, and the movilba mba-l<->proxy-mba-w-cc QA edge survives with no bridge. Supersedes the earlier conditional P2 sub-item (the FAIL branch never triggers). P1 SCOPE (NO-BREAK guardrails): OLD whey hub (:9703 ssh-R, sqlite) stays FULLY LIVE - parallel-run. venus hub = SEPARATE instance, 127.0.0.1:9703, OWN fresh sqlite (own trust domain). STOP after ONE test agent register+send+read+authenticate via hub-t. Real-agent migration + whey retirement need a FRESH Elazar GO. AUTH DECISION (lead): auth lands in P1 but ONLY as coarse EDGE gate - single shared static bearer required at Caddy hub-t (401 on missing/wrong, never reaches hub); test agent carries it in MCP shim. Full §4 per-agent keys + name<->key binding STAY P3 (only matters with multiple real agents). Optional env-gated hub-side bearer-presence check as defense-in-depth. Closes the internet-facing open-registration hole without bloating P1. BUILD SEQ: 1.[pm] venus hub instance (fresh sqlite, 127.0.0.1:9703, venus site.conf, verify boot/migrations). 2.[pm] mint shared bearer (mode-600, Elazar=root-of-trust). 3.[nw-venus,signal] systemd unit + Caddy placeholder->require-bearer+reverse_proxy localhost:9703 in one change. 4.[pm] test-agent shim -> hub-t + bearer header. 5.[pm] checkpoint: register/send/read works w/ bearer, 401 without; whey hub still live; report to bin-whey -> Elazar gate; STOP.
-
P1 BUILD (pm) shipped: hub LLMMSG_EDGE_BEARER gate 8a17b5a v2.9.37 (constant-time bearer check before route dispatch, no-op when unset - reviewed+approved by hub-llmmsgsrv-cc); MCP shim LLMMSG_HUB_BEARER 592191a v2.9.38 (client attaches Authorization: Bearer on every httpRequest when set). Shared bearer minted + staged off-channel for nw-venus. Full venus deploy handoff sent (pull->init-db fresh schema->venus site.conf->bind 127.0.0.1->EDGE_BEARER in unit->ATOMIC Caddy bearer-401+reverse_proxy localhost:9703->verify 9703 localhost-only). nw-venus deploying. CONSTRAINT CAPTURED for cdw-t migration (hub-llmmsgsrv-cc catch, NOT P1): a browser EventSource physically cannot set an Authorization header, so the edge-bearer model cannot gate chat-duo-web's browser SSE (/connect) leg. P1 is unaffected (hub-t + agent test only; agent shim is poll-based bearer-carrying httpRequest, no SSE). Resolution follows spec auth split: agents=machine bearer (shipped); chat-duo-web browser/human leg=cdw browser-LOGIN (cookie/session), NOT the machine bearer. OPEN DESIGN Q before cdw-t flips: browser->hub /connect DIRECT vs browser->chat-duo-web server.mjs(:9704)->hub server-side relay. If server-side relay, the node server holds the bearer and the browser never touches the hub (no EventSource problem). If browser-direct, need cookie/query-token auth for the SSE path. Owners when cdw-t GO'd: hub-llmmsgsrv-cc + coder-chatduo-cc.
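For orientation, the edge-bearer gate described above can be sketched roughly as follows. This is an illustrative reconstruction, not the shipped 8a17b5a code: the function names `constantTimeEqual` and `edgeBearerGate` and the return convention are hypothetical, and only the behaviour (check before route dispatch, no-op when `LLMMSG_EDGE_BEARER` is unset, 401 on missing or wrong bearer) comes from the log. A real Node implementation would more likely lean on `crypto.timingSafeEqual`; the manual loop here only keeps the sketch import-free.

```javascript
// Hypothetical sketch of an edge-bearer gate; names are assumptions, not hub.mjs code.

// Compare two strings without returning early on the first mismatching character.
function constantTimeEqual(a, b) {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length; // unequal lengths can never compare equal
  for (let i = 0; i < len; i++) {
    // charCodeAt past the end yields NaN, which bitwise XOR treats as 0.
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Runs before route dispatch. Returns null to let the request through,
// or the HTTP status to reject with.
function edgeBearerGate(headers, expectedBearer) {
  if (!expectedBearer) return null; // no-op when LLMMSG_EDGE_BEARER is unset
  const header = headers["authorization"] || "";
  const prefix = "Bearer ";
  const presented = header.startsWith(prefix) ? header.slice(prefix.length) : "";
  return constantTimeEqual(presented, expectedBearer) ? null : 401;
}
```

Because the gate sits ahead of dispatch, every route inherits it. A browser EventSource cannot send the `authorization` header at all, so it would always land in the 401 branch, which is the cdw-t constraint captured in this entry.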
-
P1 deploy hiccup + resolution: venus hub crash-looped because init-db.sh v1.0.0 creates messages.tag GENERATED while hub.mjs INSERTs tag. Resolved by option (b): skip init-db.sh, let hub.mjs self-create the schema (matches live whey exactly). Underlying init-db.sh drift filed as MSG-32 (not P1-blocking). nw-venus re-bringing-up the hub now; deploy otherwise complete (clone /opt/llmmsg-srv-venus v2.9.38, unit 600 root:root w/ bearer, bind 127.0.0.1). Caddy not yet flipped - nothing public; v1 shim + whey hub untouched.
-
P1 CHECKPOINT PASSED 2026-06-13 07:17 (pm, against https://hub-t.pensanta.com via bearer). Evidence: - NEGATIVE: no-bearer GET /online ->401, no-bearer POST /register ->401, bad-bearer ->401 (edge gate covers all routes). - POSITIVE w/ bearer: register p1-test-venus ->ok v2.9.38 island=main; send self-DM ->delivered; GET /unread?agent=p1-test-venus ->returns the message + seeded guide. - ISOLATION: /online roster = [llmmsg-srv-hub, p1-test-venus] only = fresh separate trust domain, no whey/lezama leakage. - NOT PUBLIC: venus public-IP 91.99.136.171:9703 ->000 unreachable (localhost-bind + cloud-FW); only hub-t:443 via Caddy works. - Caddy forwards Authorization (defense-in-depth: edge 401 + hub EDGE_BEARER 401). Secret lives only in unit (600 root:root) + Caddyfile; gdrive + whey-local copies trashed. - v1 /opt shim + whey hub untouched throughout (parallel-run intact). P1 scope COMPLETE. STOP per guardrails: real-agent migration + whey hub retirement need a FRESH Elazar GO (P2/P3). init-db.sh drift caught en route, worked around via option(b), filed MSG-32.
-
P2 GO logged by Elazar (relayed bin-whey-cc, 2026-06-14). Guardrails: parallel-run, NO-BREAK, whey hub stays fully live as rollback, canary-first. PM (hub-owner) mechanism call: FAST-COORDINATED-REPOINT, NO bridge (consistent w/ split-brain design; bridge=largest new failure surface + worse rollback). Repoint unit = 2 env families, both nw lane, ZERO team code edit: (1) shim ~/.claude.json LLMMSG_SRV_HOST/PORT+LLMMSG_HUB_BEARER; (2) bootstrap env LLMMSG_HUB_URL+LLMMSG_HUB_BEARER (MSG-34 already wired the .sh env-driven). Rollback=flip env back to whey, instant, no redeploy. BUMP-GATE: venus hub at 2.9.38 NOT canary-acceptable (silently still serves live /eq_* + missing MSG-35 poke fix; round-trip gate can't see it). First step = nw-venus git-pull venus hub to 2.9.42 (28945cc)+restart+verify [/health=2.9.42, GET /eq_*=410, bearer/edge/401/db GREEN]; hub-llmmsgsrv-cc standby only. THEN canary audit-venusmig-cc (net-new throwaway on venus, nw-venus counter-party, full edge-bearer round-trip+SSE drain). Seq: bump-green -> canary-green -> nw-venus->PM->bin-whey->Elazar go/no-go -> ARO-by-ARO coordinated repoint, PM+support LAST, whey live throughout. nw-whey/nw-venus holding, nothing spins until venus /health=2.9.42.
-
P2 GO (Elazar, via bin-whey 2026-06-14). Mechanism (PM/hub-owner): FAST-COORDINATED-REPOINT, NO bridge. Repoint unit = 2 env families, both nw lane, ZERO team code edit: (1) shim ~/.claude.json LLMMSG_SRV_HOST/PORT+LLMMSG_HUB_BEARER; (2) bootstrap env LLMMSG_HUB_URL+LLMMSG_HUB_BEARER (MSG-34 already wired the .sh env-driven, line8 HUB_URL/line13 HUB_BEARER). Rollback=flip env to whey, instant. BUMP-GATE: venus 2.9.38 NOT canary-acceptable - silently still serves live /eq_* + missing MSG-35 poke fix, round-trip gate can't see it. First step=nw-venus git-pull venus hub to 2.9.42 (28945cc)+restart+verify[/health=2.9.42, /eq_*=410, bearer/edge/401/db GREEN]; hub-llmmsgsrv-cc standby. THEN canary audit-venusmig-cc (net-new throwaway on venus, full edge-bearer round-trip+SSE). Seq: bump-green->canary-green->nw-venus->PM->bin-whey->Elazar go/no-go->ARO-by-ARO repoint PM+support LAST, whey live throughout.
-
P2 SAFE PREP LANDED (no agent moves): ccs.sh v5.19 (sh.git 64298a2) = inert per-cwd .llmmsg-env overlay sourced after load_site_conf/before hub-URL resolve w/ set -a (reaches bootstrap + exec'd claude/shim env); no-op when file absent; bash -n clean. scripts/llmmsg-env.template (bb7d0e3) documents both legs + env-merge dry-run + secret-bearer-via-DM + edit+relaunch rollback. Mechanism FINAL = Option A per-agent (ARO-by-ARO), overriding the B speed-rec because: inert ccs.sh line is zero-risk-default, enables a REAL-agent ARO canary (extends Elazar canary-first into the fleet), reusable for venus+lezama+future moves, no ~/.bashrc pollution. SHIM leg already per-agent via ~/.claude.json mcpServers.llmmsg-srv.env (nw-whey lane, pm-testagent proves the hub-t+bearer pattern); BOOTSTRAP leg closed by the overlay. Rollback HONEST = remove .llmmsg-env + revert .claude.json + RELAUNCH (per-agent brief downtime, not live flip); whey live as target throughout. Prereqs before window: each host git-pull sh.git to v5.19; first canary verifies CC parent-env-merge-into-MCP-child (decides if .claude.json shim edit needed fleet-wide). HOLDING for Elazar GO (bin-whey carrying) + canary-ARO pick.
-
★ REAL-AGENT CANARY GREEN (2026-06-14) — ENV-MERGE=YES (definitive). pm-ayudarg-cc cold-started on venus hub via .llmmsg-env overlay ALONE (no .claude.json edit). Both discriminators independently verified by nw-whey+nw-venus: DISC1 shim round-trip (nw-venus DMed it on venus, it received+replied 'shim-ok venus' via its own shim, drained back on venus = shim rides venus end-to-end); DISC2 fully ABSENT from whey /roster+/online (no split-brain). CONCLUSION: CC passes parent process env to the stdio MCP child, so inherited LLMMSG_HUB_URL (from .llmmsg-env, set -a in ccs.sh v5.19) repoints BOTH bootstrap AND shim. FLEET PER-AGENT STEP = one .llmmsg-env drop (URL+bearer) + relaunch; ZERO .claude.json edits; rollback = remove .llmmsg-env + relaunch -> whey. Secret hygiene = .llmmsg-env in .git/info/exclude per repo. aro:ayudarg = FIRST real ARO migrated (pm-ayudarg-cc stays on venus). cdw gate: Elazar chose (a) accept reduced cdw view during window (cdw->venus separate later phase, browser-SSE-bearer design). NEXT: first multi-agent wave nominated = aro:mars (3 agents, quietest: 0msg/30min, 50/2h vs venus140/pluto135), members-together, gated on bin-whey confirm.
-
P2 RECIPE HARDENING (2026-06-14, bin-whey halt was productive). Findings: (1) DNS - llmmsg-hub.pensanta.com resolves to 172.27.178.27 = WHEY's OWN ZeroTier IP, so the env-less bootstrap FQDN default = WHEY (NOT harmless). (2) Whey-ghost mechanism: bootstrap also runs as a CC SessionStart/UserPromptSubmit HOOK every turn; if CC sanitizes hook-subprocess env and drops the ccs-exported LLMMSG_HUB_URL, the hook leg falls to the FQDN default (whey) and RE-REGISTERS the migrated agent on whey every turn -> a frozen 'ghost' roster row. (3) Item B: a still-on-whey sender DMing that ghost -> whey buffers it -> agent (on venus) never drains -> SILENT message loss (worse than the accepted visible non-delivery). (4) NO deregister endpoint exists (stmtDeleteRosterRow only via pruneStale, prune-on-write after STALE_TTL_S=600). SOURCE FIX LANDED: bootstrap v1.18 (219b4ec) self-sources $cwd/.llmmsg-env (set -a) BEFORE HUB_URL resolve -> hub resolution self-sufficient regardless of CC hook-env handling; inert when no .llmmsg-env. HARDENED PER-AGENT RECIPE: map agent->HOST (repoint run by the agent's-host nw, in that host's repo copy); drop .llmmsg-env (+.git/info/exclude) BEFORE (re)launch (never after - env-less launch=whey); relaunch via ccs v5.19; verify on venus; scripted whey-DB dereg of exactly the wave's agents (belt-and-suspenders behind the source fix); verify absent from whey /online. PREREQ before mars: every host pulls sh.git (ccs v5.19) + /opt/llmmsg-srv (bootstrap v1.18). A-GATE still OPEN pending nw-whey empirical poke-test (fresh ghost = hook-env-sanitized confirmed -> verify v1.18 kills it -> then mars; no ghost = one-off). mars=venus-host -> nw-VENUS executes it as the hardened-recipe canary. Followup: add proper hub /unregister route. ayudarg whey orphan manually deleted (validated dereg step); pm-ayudarg-cc live on venus.
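To make the stale-row timing in finding (4) concrete, here is a small sketch of TTL-based pruning. Only `STALE_TTL_S=600` and the name `pruneStale` come from the log; the row shape (`{ name, lastSeenS }`), the inclusive boundary, and the helper `ghostWindowS` are assumptions for illustration and are not taken from hub.mjs.

```javascript
// Hypothetical sketch: TTL pruning of roster rows. Row shape and boundary are assumed.
const STALE_TTL_S = 600; // value quoted in the log

// Keep only rows seen within the TTL. In the hub this runs on write, so a row
// can outlive the TTL until the next write triggers a prune.
function pruneStale(rows, nowS) {
  return rows.filter((row) => nowS - row.lastSeenS <= STALE_TTL_S);
}

// Seconds during which a no-longer-refreshed row is still eligible to receive
// (and strand) messages on the old hub.
function ghostWindowS(row, nowS) {
  return Math.max(0, STALE_TTL_S - (nowS - row.lastSeenS));
}
```

Under these assumptions a row last seen 300 s ago still has a 300 s window in which a sender on the old hub can buffer messages that the migrated agent never drains. That window is why the hardened recipe adds an explicit dereg step behind the source fix instead of waiting for decay.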
-
P2 mars wave - all pre-fire gates GREEN, audit-mars-cc canary FIRING. Gate1 (whey-toolchain no-ghost): PASS via pm-testagent v1.18 SessionStart, both-sided (whey-absent + venus-present), torn down. Gate2/venus: bootstrap v1.18 deployed into RUNNING venus tree (key finding: venus /opt/llmmsg-srv is hand-deployed, was v1.16; git checkout is separate /opt/llmmsg-srv-venus; nw-venus copied bootstrap byte-identical, hub.mjs UNtouched, hub 2.9.42 preserved, shim resolveHubUrl guard present mjs:21). Gate3: host map both sides + duplicate-home dedup (pm-ayudarg venus-dup quarantined; coder02/db02-mars tagged host=venus; mars-remote-elazar = defunct/Elazar-operated, left untouched pending Elazar). Membership: wave = 4 live -cc (pm/coder/db/audit-mars); coderhelp/researcher/02-dirs dormant, register nothing, strand nothing. CONFIG DECISION: UNIFORM hub-t.pensanta.com+bearer all hosts (loopback-bearer-free falsified - raw 127.0.0.1:9703 enforces bearer). Canary order: idle-check -> .llmmsg-env -> relaunch -> dereg whey row -> both-sided verify. On green: rest-of-mars + proxy-mars-cc-w (whey, edge+bearer) members-together. Coordination set (kpi-n-optimization) migrates DEAD LAST. Spun off MSG-38 (hub POST /unregister) to replace hand-SQL dereg.
-
mars canary audit-mars-cc FAIL #1 - upgraded shim-level verify caught a real false-green (canary-first paid off). Root cause (/proc-proven): venus mars agents hardcode env.LLMMSG_HUB_URL=whey in .mcp.json; an explicit .mcp.json MCP-server env key OVERRIDES inherited .llmmsg-env for the shim child. Bootstrap (bash, reads .llmmsg-env) registered venus; shim (CC-launched, .mcp.json wins) stayed on whey = split-brain (presence-ghost venus / live messaging whey). Bearer inherited fine (no .mcp.json pin on it). pm-ayudarg worked only because it has NO .mcp.json env block - so the prior env-merge=YES conclusion was valid ONLY for unpinned agents, NOT general. RECIPE CORRECTION (fleet-wide): new per-agent mechanism = (i) PRE-CHECK grep .mcp.json for hardcoded LLMMSG_HUB_URL (per-agent-variable: ayudarg unpinned, mars pinned); (ii) PINNED -> remove env.LLMMSG_HUB_URL key + drop .llmmsg-env + relaunch; UNPINNED -> .llmmsg-env alone; (iii) verify via /proc the shim child resolved to venus URL (presence insufficient). Do NOT write bearer into .mcp.json (git-tracked = leak). Likely affects pluto/venus teams (same agentteamlaunch origin). RULING: fix not rollback - nw-venus removes the .mcp.json URL pin on audit-mars-cc, cleans frozen venus ghost + live whey row, re-fires canary-only; green on all 5 legs = corrected mechanism proven before propagation. HALT on rest-of-mars + proxy-mars holds.
-
CANONICAL per-agent repoint recipe (3 config shapes proven): Shape1 ayudarg=truly unpinned (no .mcp.json env, no legacy) -> .llmmsg-env alone. Shape2 mars=explicit .mcp.json env.LLMMSG_HUB_URL=whey pin -> MUST remove pin. Shape3 proxy-mars + every whey agent w/o per-cwd block=inherits GLOBAL ~/.claude.json legacy LLMMSG_SRV_HOST/PORT=127.0.0.1:9703 (no HUB_URL) -> inherited .llmmsg-env HUB_URL should win by resolveHubUrl precedence (HUB_URL>legacy>default, BWI#517), no edit, but untested live. Shape3 is fleet-COMMON on whey, not edge. RECIPE: (i) PRECHECK characterize FULL url-resolution chain {explicit HUB_URL pin any layer | legacy HOST/PORT only | neither} - NOT bare HUB_URL grep (shape3 reads false-clean). (ii) remove any explicit HUB_URL pin (shape2 only). (iii) drop .llmmsg-env uniform hub-t+bearer. (iv) relaunch. (v) /proc-verify shim-child resolved URL=hub-t MANDATORY every shape (presence-alone falsified) + full a-e verify incl shim-DM-round-trip. Do NOT write bearer into .mcp.json (git-tracked=leak).
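The three shapes can be replayed with a small model of the URL-resolution chain. This is a sketch under stated assumptions, not the shim's actual `resolveHubUrl`: the precedence (HUB_URL > legacy HOST/PORT > default) and the override of inherited env by an explicit MCP-server env block are taken from the log, while the function bodies, the `shimChildEnv` helper, the `http://` scheme for the legacy and default URLs, and the use of the whey ZeroTier address as the pinned value are illustrative.

```javascript
// Hypothetical model of the shim's URL resolution; not the real shim source.

// Precedence named in the log: LLMMSG_HUB_URL > legacy HOST/PORT > default.
function resolveHubUrl(env, defaultUrl) {
  if (env.LLMMSG_HUB_URL) return env.LLMMSG_HUB_URL;
  if (env.LLMMSG_SRV_HOST && env.LLMMSG_SRV_PORT) {
    return "http://" + env.LLMMSG_SRV_HOST + ":" + env.LLMMSG_SRV_PORT; // scheme assumed
  }
  return defaultUrl;
}

// Assumed merge rule matching the mars finding: keys in the explicit
// MCP-server env block override whatever the parent process exported.
function shimChildEnv(inheritedEnv, mcpServerEnv) {
  return Object.assign({}, inheritedEnv, mcpServerEnv || {});
}

const overlay = { LLMMSG_HUB_URL: "https://hub-t.pensanta.com" }; // from .llmmsg-env
const DEFAULT_URL = "http://llmmsg-hub.pensanta.com"; // FQDN from the log, scheme assumed

// Shape 1: no env block at all, so the overlay wins.
const shape1 = resolveHubUrl(shimChildEnv(overlay, undefined), DEFAULT_URL);
// Shape 2: explicit HUB_URL pin overrides the overlay, so the shim stays on whey.
const shape2 = resolveHubUrl(
  shimChildEnv(overlay, { LLMMSG_HUB_URL: "http://172.27.178.27:9703" }),
  DEFAULT_URL
);
// Shape 3: legacy HOST/PORT only; the inherited HUB_URL outranks it.
const shape3 = resolveHubUrl(
  shimChildEnv(overlay, { LLMMSG_SRV_HOST: "127.0.0.1", LLMMSG_SRV_PORT: "9703" }),
  DEFAULT_URL
);
```

In this model `shape1` and `shape3` resolve to hub-t while `shape2` resolves to the pinned whey address. A bare grep for `LLMMSG_HUB_URL` would not see the legacy HOST/PORT keys of shape 3, which is why the precheck characterizes the full chain and why the /proc check on the shim child remains the deciding evidence.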
-
audit-mars-cc canary GREEN on re-fire (Shape-2 unpin path proven). /proc decisive: shim flipped whey->hub-t+bearer after .mcp.json LLMMSG_HUB_URL removal. Legs: venus-present-live, whey-absent, shim-DM-round-trip authed both ways, ARO-join N/A(audit role). Dereg nuance: NO /unregister on either hub yet -> canary whey row clean-BY-DECAY (TTL ~600s). For LIVE rest-of-mars that 600s frozen-row = silent-loss window (whey sender->whey buffer->agent-on-venus never drains), so propagation adds IMMEDIATE cross-host whey-side SQL dereg (nw-whey deletes whey roster row on nw-venus per-agent shim-moved signal; nw-venus can't, whey DB is whey-side). PROPAGATING rest-of-mars: nw-venus pm/coder/db-mars (Shape-2 unpin batch, members-together, +aro:mars-join verify) + nw-whey proxy-mars-cc-w (Shape-3, simple+/proc-verify, cold-launch). On full set green + whey rows cleared -> mars-complete.
-
MARS WAVE COMPLETE - first venus-toolchain repoint, 5 agents whey-hub->venus-hub, parallel-run, whey live as rollback. Agents: audit-mars-cc(canary), pm/coder/db-mars-cc (venus-host, nw-venus) + proxy-mars-cc-w (whey-host->venus-hub, nw-whey). DIRECT evidence all 5, every leg (no relay gaps): /proc shim-child=hub-t+bearer; venus-present-advancing; whey-ABSENT (nw-whey SQL, zero %mars% rows remain); shim-DM-round-trip authed both legs; aro:mars reconstituted on venus=[pm/coder/db-mars+proxy-mars-cc-w] (audit N/A by role). whey hub unchanged=rollback intact. Proven recipe at wave scale: precheck full url-resolution chain (3 shapes) -> unpin explicit HUB_URL if present -> drop .llmmsg-env (uniform hub-t+bearer) -> relaunch -> /proc-verify+5-leg -> immediate cross-host whey-row dereg. Side-items: bearer-capable shim deployed to venus running tree (canary caught drift to bearerless v2.3.0 pre-fire); MSG-39 monitor false-flag fixed+closed (v4.14); MSG-38 /unregister built (0013487, deploys dead-last). NEXT: pluto then venus waves w/ proven recipe; coordination-set (kpi-n-optimization) DEAD LAST; cdw separate phase. Awaiting bin-whey Elazar relay + pluto-wave go.
-
Pluto prep: 4th config shape found. SHAPE-4 = explicit LLMMSG_HUB_URL pin in GLOBAL ~/.claude.json mcpServers.llmmsg-srv.env (venus global = http://172.27.178.27:9703 whey-ZT; /proc-confirmed on pm-pluto). Overrides parent .llmmsg-env same as Shape-2 per-cwd pin (mars proof holds at global layer). Venus-fleet-common (mirror of Shape-3 whey-fleet-common legacy HOST/PORT). Pluto MIXED: db-pluto=Shape-2 (per-cwd .mcp.json pin), pm/coder/audit-pluto=Shape-4 (global pin). RULING=Option B (contained): per-agent per-cwd .mcp.json with NO LLMMSG_HUB_URL (db-pluto remove existing key; pm/coder/audit create shadow mirroring global server def, env w/o HUB_URL, no bearer in file->inherits .llmmsg-env). Global UNTOUCHED. NOT Option A (edit venus global) - A flips every venus global-inheriting agent at once incl nw-venus-cc + not-yet-migrated agents = violates one-window guardrail. /proc-verify mandatory. CANONICAL RECIPE precheck now enumerates explicit HUB_URL at ALL layers: per-cwd .mcp.json / per-cwd .claude.json / GLOBAL .claude.json (+legacy HOST/PORT) -> neutralize per-agent. FOOTGUN: venus global Shape-4 pin=whey means EVERY dormant venus agent relaunch goes to whey unless given per-cwd shadow+.llmmsg-env (subpm-pluto-cc dormant standing aro:pluto role = immediate case). Systemic fix = Option A at P3 cutover (set venus global HUB_URL->venus hub when all venus migrate permanently); track as P3 global-pin flip. Pluto .mcp.json writes = FIRE-STEP, hold for Elazar go. Pluto wave=pm/coder/db/audit-pluto-cc (venus) + proxy-pluto-cc-w (whey, Shape-3, dereg-at-cutover - LIVE whey row, kill-first ordering).
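The all-layers precheck can be expressed as a small classifier. This is a sketch only: the four shape labels follow the log, but `classifyShape`, its input shape (`perCwdEnv` for a per-cwd .mcp.json or .claude.json env block, `globalEnv` for the global ~/.claude.json one), and the rule that a per-cwd server definition fully shadows the global one are assumptions drawn from the Option B description, not verified CC behaviour.

```javascript
// Hypothetical precheck classifier for the four config shapes; input shape is assumed.
function classifyShape(layers) {
  const hasPerCwd = layers.perCwdEnv !== undefined;
  // Assumption: a per-cwd server definition shadows the global one entirely.
  const effective = hasPerCwd ? layers.perCwdEnv : layers.globalEnv || {};

  if (effective.LLMMSG_HUB_URL) {
    // Explicit pin: per-cwd is Shape-2, global is Shape-4.
    return hasPerCwd ? "shape-2" : "shape-4";
  }
  if (effective.LLMMSG_SRV_HOST || effective.LLMMSG_SRV_PORT) {
    return "shape-3"; // legacy HOST/PORT only; inherited HUB_URL should outrank it
  }
  return "shape-1"; // unpinned: .llmmsg-env alone decides
}

// Examples mirroring the pluto findings (pin value quoted from the log).
const globalPin = { LLMMSG_HUB_URL: "http://172.27.178.27:9703" };
const pmPlutoBefore = classifyShape({ globalEnv: globalPin });
const pmPlutoAfterShadow = classifyShape({ perCwdEnv: {}, globalEnv: globalPin });
const dbPluto = classifyShape({ perCwdEnv: globalPin });
```

Under these assumptions an empty per-cwd shadow moves pm-pluto from "shape-4" to "shape-1" without touching the global file, which is the containment property Option B relies on. The classifier only narrows down which neutralization to apply; it does not replace the mandatory /proc check.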
-
ELAZAR GO 2026-06-14 'all at once' (relayed bin-whey-cc-mqdta7cabofd). Authority gate released: pluto+venus in ONE window, no per-wave gating. Guardrails still bind: whey live rollback, per-agent 4-shape precheck->neutralize HUB_URL->.llmmsg-env hub-t+bearer->relaunch->/proc-verify=hub-t->5-leg->cross-host whey dereg (kill-old-first on live whey rows). Coordination set (pm/bin-whey/nw-whey/nw-venus/support) migrates DEAD LAST as tail. No green on relayed evidence. FIRED: dispatched nw-venus (8 venus-host: pluto pm/coder/db/audit + venus pm/coder/db/audit; venus team gets precheck-first shape classification) + nw-whey (proxy-pluto-cc-w Shape-3 kill-first + per-agent whey-row DELETE service on nw-venus shim-moved signals). Awaiting per-wave complete with direct /proc + direct SQL.
-
PLUTO WAVE COMPLETE 2026-06-14 - 5/5 GREEN direct evidence. 4 venus-host (pm/coder/db/audit-pluto): nw-venus raw /proc=hub-t+bearer all 4, roster present+advancing. Shapes confirmed: pm/coder/audit=Shape-4 (per-cwd .mcp.json shadow env{} no-bearer-in-file), db=Shape-2 (HUB_URL key dropped, supabase preserved). ARO: pm-pluto=evolutiva-management+pluto, coder/db=pluto, audit N/A. proxy-pluto-cc-w (whey-host): nw-whey /proc=hub-t+bearer, whey-row DELETE before=1/after=0. leg-d RULING: for venus-HUB agents, bearer-ENFORCING-hub register(POST)+poll(GET) both-succeeding = bidirectional bearer proof (bad bearer 401s both); structural satisfaction accepted, with same-hub pluto-as-counterparty upgrade offered for venus wave. ANOMALY logged: pm-venus/coder-venus had STALE non-advancing venus-hub rows (real shim still whey-ZT-pinned) - upsert-by-name relaunch reclaims fresh, NOT split-brain (single-homed), benign; silent-loss window closed by prioritizing them early in venus wave; origin under investigation. Proceeding venus wave.
-
VENUS WAVE COMPLETE 2026-06-14 - PLUTO+VENUS both done, 10 agents migrated + whey-clean. venus 4 (pm/coder/db/audit-venus): nw-venus raw /proc=hub-t+bearer all 4, present+advancing; shapes pm/coder/audit=Shape-4 (per-cwd shadow env{}), db=Shape-2. nw-whey dereg: 10 whey rows DELETE before=1/after=0 total (2 proxies + 4 pluto + 4 venus); zero %pluto%, %venus% down to cc-context-monitor-venus (script, OOS) + nw-venus-cc (coord-set). HYGIENE FINDING: 2 Shape-4 agents (pm/coder-venus) showed transient phantom venus-hub rows - bootstrap register-leg fired post-.llmmsg-env-write/pre-relaunch (hook/reconcile), registering NEW hub while live shim still whey-pinned = canonical false-green class; /proc-verify gate CAUGHT it (validates gate as load-bearing); single-homed not split-brain, reclaimed on relaunch. Follow-up WI to harden bootstrap register-leg ordering. STRAGGLERS surfaced (bin-whey): coder-apiimages-cc (Elazar LIVE session - hard hold) + proxy-mba-w-cc (movilba proxy - safe anytime). FINAL PLAN: stragglers + coord-set tail (support->bin-whey->pm->nw-venus->nw-whey, nw-whey self-deletes own row host-local = no frozen row, no /unregister needed). Whey hub stays LIVE rollback. Holding for Elazar straggler+tail go via bin-whey.
-
ELAZAR FULL GO 'migrate all, dont wait for me' 2026-06-14. Stragglers: proxy-mba-w-cc GREEN (nw-whey, /proc=hub-t+bearer, row deleted, ARO mba-l+mba2-l). coder-apiimages-cc = NOT a whey straggler (nw-whey caught: zero whey process but live-advancing whey row host=venus = venus-resident polling whey via global pin) -> reassigned to nw-venus lane (migrating now). CDW GATE (hub-owner ruling, blocks coord-set tail): Elazar's interface = chat-duo-web.service (server.mjs on whey :9704) which polls the hub server-side via legacy LLMMSG_SRV_HOST/PORT, NO bearer. hub-t.pensanta.com is the API (bearer-gated, 401 on browser = correct), NOT a chat UI - corrected bin-whey/Elazar misconception (no new URL to open). FIX (coder-chatduo, ~15min): inject Authorization:Bearer in hubRequest + repoint cdw server to venus over ZT-raw + systemd env update + restart; Elazar keeps SAME :9704, server moves behind it. Browser never carries bearer (server does). TOPOLOGY: cdw SERVER flip != coder-chatduo-AGENT migration (agent stays whey-bound) -> coder-chatduo OBJECTIVELY curl-verifies venus-side (elazar-the-user-human present+advancing on venus + test DM round-trip) and reports to whey-me = gate-clear, NO human cross-hub round-trip needed. Then tail fires autonomously: coder-chatduo->hub->support->nw-venus->bin-whey->pm->nw-whey-last->TAIL COMPLETE on venus, fresh bin-whey resumes Elazar-relay on venus. nw-whey tail terminal-handoff = OPTION A pre-authorized (self-migrate last gated on pm /proc-verify PASS, no cross-hub go). Whey hub stays LIVE rollback. Holding for cdw-verify + coder-apiimages green.
-
CDW GATE CLEARED 2026-06-14 - Elazar interface live on venus, coord-set tail FIRING. coder-chatduo cdw v0.7.8 (committed 16d5c07): reworked hubRequest for https+URL (ZT-raw venus:9703 DEAD - hub loopback-only, edge :443 sole off-host path), bearer injection, unit-env bearer (not git). OBJECTIVE 3-LEG VERIFY (coder-chatduo whey-bound + hub-llmmsgsrv corroborated): (a) elazar-the-user-human present on venus /roster 200, (b) GET /unread bearer 200 cursor-advancing, (c) /send human->pm-venus-cc ok=true delivery_count=1. cdw /api serves venus roster (20 agents). Blue-green revert = drop 2 Environment lines from chat-duo-web.service + restart -> whey fallback. NOTABLE: venus hub binds 127.0.0.1 loopback-ONLY (NOT 0.0.0.0 like whey) = edge-only ingress via bearer Caddy; diverges from project CLAUDE.md 'binds 0.0.0.0' (whey-written) -> doc-reconcile follow-up, ties #596. TAIL ORDER GIVEN to nw-whey (drives, absolute-last, option-A pre-authorized self-migrate): coder-chatduo->hub-llmmsgsrv->support->nw-venus(self)->bin-whey->pm->nw-whey-last->TAIL COMPLETE on venus. Autonomous (Elazar 'dont wait'). Whey hub stays LIVE rollback.
-
VENUSINF P2 MIGRATION COMPLETE 2026-06-14. TAIL COMPLETE (nw-whey, posted from venus): all 7 coord-set agents migrated whey->venus, zero /proc-verify failures, fail-safe never tripped. Order executed: 1 coder-chatduo 2 hub-llmmsgsrv 3 support 4 nw-venus(self-migrate, dereg-via-stale-whey-curl) 5 bin-whey 6 pm-llmmsgsrv 7 nw-whey(last, own /proc=hub-t + venus present+advancing). PM own venus-side roster verify: 26 agents present+advancing on venus = mars(5)+pluto(5)+venus(5)+coder-apiimages+proxy-mba-w+7 coord-set+pm-ayudarg+elazar-the-user-human(interface on venus)+llmmsg-srv-hub. Whey fleet-coordination plane EMPTY. Whey hub stays LIVE as rollback. Sent Elazar completion summary. OUT-OF-WINDOW (separate gos): lezama mba-l team (5, host=lezama) stays whey; P3 global-pin flip + per-agent identity keys. POST-MIGRATION CLEANUP: (1) aro:kpi-n-optimization fan-out OFF on fresh venus sqlite - needs aro_config + MSG-17 allowlist for coordination AROs; (2) #596 venus bind-posture doc note (loopback-only + edge ingress); (3) bootstrap register-leg ordering harden (phantom-row class /proc-verify caught). Total this op: mars(5, earlier) + 26 this window = full real-agent fleet on venus.
-
POST-MIGRATION: #38 venus fan-out FIXED+GREEN - nw-venus seeded 12-row aro_fanout_allow (whey's 9 + project rooms mars/pluto/venus, INTENTIONAL divergence: those teams now operate on venus so their room fan-out is enabled there; whey denies them only because the teams left) + aro_config(kpi-n-optimization->bin-whey-cc, matches whey live config). Real fan-out test green (6 recipients incl elazar). Elazar can post the room.
-
Pluto shared-checkout cleanup: migration left .gitignore (+.llmmsg-env ignore) + 4 agent .mcp.json (env-block to {}, no inline secret, verified by pm-pluto) staged in pluto app repo ahead of urgent api-images redeploy. Greenlit coder-pluto to sweep+commit all 5 as-is under redeploy SHA (only coder/coderhelp may push pluto repo; maintainer-separate commit not possible). Gate: .llmmsg-env must be effectively gitignored so add -A cannot grab the bearer. Maintainer surface=pm-llmmsgsrv.
-
lezama leg interim-safe: nw-lezama on literal-intact+bearer hand-patch (relaunch-safe both probe+auth axes). Pending: coordinated full-tree pull-forward (hub-llmmsgsrv pre-writing paste-ready cmd). Shim canonical has bearer since v2.9.38; v2.9.47 made package.json the version SSOT (hub+shim+#588 probe). MCP-shim bearer gap root cause: lezama /opt was week-stale pre-v2.9.38 checkout, not missing feature.
-
Wave-1 (monitor repoint), Wave-2 (lezama 5-agent cohort), Wave-3 (legacy :9703 decommission) all complete. Shim canonical deployed (0a51c6f). Legacy service+bridge+watchdog disabled on whey. Ghost purge clean. Scrp- rename chain is a separate follow-on, not part of this epic.