MARS-145 ·
marsMars DB scaling: migrate serverless runtime from direct-5432 to Supavisor transaction pooler (6543); reserve direct-5432 for weekly-kpi cron + migrations/seed; cut runtime pg.Pool max 16->~3; audit+cap per-request concurrent-query fan-outs (mirror Pluto mapWithConcurrency); pre-flight named-prepared-statement check (txn mode drops them). 3-PM consensus bs-mql6kqmrm6j, Venus=reference (already on pooler). One-env-var + cron carve-out, reversible.
- Ref
MARS-145(#1152)- Project
mars- Status
- backlog
- Priority
- high
- Type
- task
- Assigned
- db-mars-cc db
- Created by
- wi-cli-venus
- Created
- 2026-06-19T17:10:33.091Z
- Updated
- 2026-06-19T17:10:33.091Z
Questions
No questions.
Event log
-
DECISION refinements (bs-mql6kqmrm6j amend, pm-venus coordinator): (1) Do NOT flat-cut runtime pool max to 3 -> FIRST verify Mars's Vercel compute mode: Fluid Compute (many concurrent req/instance -> higher max justified) vs Traditional (1 req/container -> small max). Size max to per-instance concurrency, not a magic number. Mars max=16 was set for weekly-kpi CONCURRENCY=16, which belongs on the direct-5432 CRON path, not the serverless runtime pool. (2) B-durable leg: collapse wide per-request query fan-outs into set-based SQL/RPC/views (push-compute-to-Postgres, commons rule) -> fixes pool pressure at source, runtime-independent; JS concurrency-limiter is only the stopgap. (3) C leg: Supabase paid-tier upgrade trigger before scale (free tier caps conns+egress). Pooler caveat: no named prepared statements (node-pg default fine). nw-venus pulling tier client-conn cap to size max precisely. Pre-req before flipping: confirm runtime paths have no named prepared statements / session-level features.
-
Pre-flight (db-mars, read-only) — CODE IS CLEAN for transaction-mode migration: all session-aware SQL is transaction-scoped, none session-scoped. (a) set_config('app.user_id'/'app.archive_reason',$1,true) = is_local TX GUC, reset at COMMIT, all inside withTransaction() [archive-context.ts, mi-cursada/editar, alumnos, administrar-actions]. (b) pg_advisory_XACT_lock = auto-released at tx end [solicitar-acceso, mi-cursada/editar, practicas/nueva]. (c) current_setting(...,true) reads same-tx GUC. ZERO named prepared statements / LISTEN / NOTIFY / temp tables / session advisory locks in src/. Pooler string: aws-0-sa-east-1.pooler.supabase.com:6543, user postgres.ustenjufophwhlkzfdso. Weekly-kpi carve-out feasible: DATABASE_URL_DIRECT already exists (.env.local; CONFIRM set in Vercel prod env) -> weekly-kpi.ts builds its own Pool(max16) on DIRECT, db.ts runtime pool drops to compute-mode-sized max. Other crons (error/support/ux digests) are linear, no concurrency concern. REMAINING BLOCKERS (dashboard/REST reads): (1) Vercel Fluid Compute ON/OFF - not in MCP get_project or vercel.json, sizes runtime max; (2) Supavisor default_pool_size - dashboard/mgmt-REST, nw-venus tier-cap pull covers.
-
Sizing method (venus/coder-venus, bs sizing fact): govern on Postgres max_connections (Mars 57 usable), NOT Supavisor client-cap (needs mgmt token nobody holds). Arithmetic: safe_concurrent_instances = usable_conns / per_instance_runtime_max. Mars at max=16 -> ~3.5 instances before wall (fine today at 13/57). Post-migration: carve cron-16 to own direct pool, runtime max 3-5 -> ~11-19 safe instances. Sizable without mgmt API. Only leg-C (tier-upgrade trigger) needs Elazar's Supabase token for plan-tier + paid ceiling.
-
Pre-flight COMPLETE (db-mars). Fluid Compute = OFF/Traditional (Vercel REST v10: resourceConfig.fluid=false, elasticConcurrencyEnabled=false) -> 1 req/container -> runtime pool target max:2-3 (1 active + 1-2 cold-start/drain overlap). DATABASE_URL_DIRECT NOT in Vercel prod (only DATABASE_URL; exists only in .env.local) -> HARD PREREQ. Note: prod DATABASE_URL is type 'sensitive' vs dev/preview 'encrypted' - normalize at flip. STAGED SEQUENCE: (A) add DATABASE_URL_DIRECT to Vercel prod = current DATABASE_URL value, port 5432 [dispatched, reversible prep]. (B) code: weekly-kpi.ts builds own Pool(DATABASE_URL_DIRECT, max16) for the CONCURRENCY=16 loop + db.ts runtime pool max 16->3; lands SAFELY on direct-5432 first (reduces conn pressure immediately), audit pre-review (email path) + PTD. (C) low-traffic-window flip: DATABASE_URL -> aws-0-sa-east-1.pooler.supabase.com:6543 transaction-pooler, audit watching, reversible one-var revert. All reversible; no fire today (13/57).
-
CAVEAT (pluto-pm catch, applies to Mars): 'Mars is on direct-5432' is INFERRED from .env.local + db.ts comment, NOT the prod DATABASE_URL value (type=sensitive/encrypted, unreadable agent-side). Per local-env-not-prod-proxy rule, .env.local is dev-scope = not proof of prod. So leg (A) now = VERIFY prod DATABASE_URL host FIRST (db-mars probing pg_stat_activity client_addr/application_name; if inconclusive -> Elazar/Vercel-dashboard read). If prod is ALREADY pooled, (A) is moot and only (B) runtime-max-3 + cron carve-out + query-shape legs remain. Do NOT execute any flip until the prod host is confirmed. Also pause the DATABASE_URL_DIRECT prod-env add until verdict (if prod is already the pooler, DIRECT must equal the real direct host, which we'd then need to source explicitly).
-
Leg-B diff review (audit): a/b CLEAR (6 heavy kpi SELECTs isolated to DATABASE_URL_DIRECT pool, lazy global-guarded singleton + error-listener mirror db.ts). d ACCEPT (add console.warn on DIRECT-unset, fail-loud). c BLOCKED: runtime max:3 too low — report-export.ts /informes fan-out=6, practicas/[id]=6, ~12 loaders 4-5; max:3 serializes heavy pages. FIX: floor max:6, final number = db-mars call vs ceiling (max×warm-containers + 16 kpi-pool-during-cron vs ~45 usable direct-5432). db-mars looped to set it. The 'request needs only 1+overlap' reasoning was wrong: intra-request Promise.all is the real concurrency. LEG-A CHECKLIST ADD: confirm DATABASE_URL_DIRECT set in prod (done: present, encrypted) BEFORE flipping DATABASE_URL->pooler, else kpi pool's fallback routes the 16-wide batch onto the pooler = storm; coder's console.warn is the breadcrumb, this checklist item is the guard.
-
Runtime pool max FINAL = 8 (db-mars call). Covers worst-case single-request fan-out: report-export 6-wide (confirmed flat, no nested Promise.all) + layout's 3 concurrent (~10-20ms) without serialization. Ceiling: 8×5 containers=40 normal + kpi 16 off-peak (06:00 ART, <=3 containers) < 45 usable direct-5432. Coder applying.
-
LEG-B SHIPPED + PASS: SHA ba42daa v2.16.69. Audit PTD PASS:ba42daa — v2.16.69 live, deploymentId==serving alias (no frozen-prod), 15m runtime window zero error/warn/fatal; /practicas/[id] 6-wide fan-out served 200 on new runtime max:8 (empirically clean). Net: weekly-kpi cron owns dedicated direct-5432 pool (max:16, lazy global-guarded singleton + error-listener + FAIL-LOUD throw on missing DATABASE_URL_DIRECT); runtime shared pool right-sized 16->8. Touches only pool sizes + cron URL, NOT DATABASE_URL routing — safe regardless of leg-A verdict. FOLLOW-UPS: (1) KPI-pool fail-loud path structurally verified but NOT yet exercised in prod — first real fire = weekly-kpi cron Mon 06:00 ART; audit holds runtime-watch on first fire. (2) leg-A guard now STRUCTURAL (code hard-fails cron loud if DATABASE_URL_DIRECT missing) — still wants human eyeball at flip time. WI stays OPEN for leg A (pooler flip, gated on Elazar prod-host read) + leg C (tier trigger, gated on Supabase token).