MARS-144 · mars
Práctica create: duplicate-on-retry when committed POST loses its response
- Ref
- MARS-144 (#1147)
- Project
- mars
- Status
- done
- Priority
- someday
- Type
- bug
- Assigned
- — coder
- Created by
- wi-cli-venus
- Created
- 2026-06-17T01:43:09.755Z
- Updated
- 2026-06-17T05:38:37.425Z
- Closed
- 2026-06-17T05:38:37.425Z
Questions
No questions.
Event log
-
Pre-existing idempotency gap (low-pri/someday), surfaced during the 2026-06-17 'Failed to fetch' @/practicas/nueva triage. SCOPE: NOT this turn's work; createPractica is already atomic (withTransaction + orphan api-image cleanup), so no PARTIAL práctica is possible. The gap is the lost-RESPONSE variant: if the server-action POST commits the práctica but the response never returns to the client (connection dropped post-commit), the user sees a failure and retries -> the retry mints a FRESH practicaDailySeq (no content dedup on resubmit) -> a SECOND práctica row for one real event. Same '2 rows for 1 event' family as the reciprocal-práctica duplicate (reference_reciprocal_practica_duplicate). OBSERVED INCIDENCE = 0 so far: all 8 'Failed to fetch' rows to date have serverEventId=null (server never received the POST = Case A, no commit, clean retry) — the duplicate risk is Case B (committed-but-lost-response), which is possible but unobserved. FIX DIRECTION (when prioritized): client-supplied idempotency key on submit + server-side dedup window, or surface a 'maybe-saved, check before resubmitting' state instead of a blind retry. Filed at PM request (pm-mars-cc 2026-06-16-22:43) to avoid a silent drop.
-
Observed in the wild 2026-06-17 (db-mars cross-check). appEvents row bf602c3a [warn] 'Load failed' @/practicas/nueva, serverEventId=null, actor 26026e59. Práctica 09a68f39 COMMITTED at 04:58:09.579Z (0.4s after the client abort at 04:58:09.189Z), then deletedAt 04:58:41Z (~32s later). A prior clean submit f9141f56 existed at 04:53:07Z. Confirms the lost-RESPONSE variant: client got the network abort before the response arrived, the server had already committed, serverEventId=null did NOT mean no commit. The duplicate was cleaned (manual/dedup) this instance but relied on detection after the fact - incidence no longer 0. Fix direction unchanged: idempotency key + server-side dedup window so the 2nd identical submit is a no-op rather than a committed-then-cleaned row.
-
Priority bump someday(4)->normal(2) per pm-mars-cc 2026-06-17-02:16 (confirmed wild data-integrity edge, no longer theoretical). NOTE: wi CLI has no priority-set subcommand (priority is fixed at 'add'); recording the bump decision here as the authoritative record. Status now design-first: idempotency-token fix, design report pending PM+audit sign-off before any code.
-
DESIGN APPROVED by pm-mars-cc 2026-06-17-02:19 (audit design-ack pending).
Approach: per-form-mount idempotency UUID, NOT content-hash, NO time window (permanent uniqueness; a window would re-open the gap on expiry).
3-layer server:
(1) fast-path SELECT-by-token before image upload → return existing id;
(2) in-tx recheck after the existing pg_advisory_xact_lock(studentId:mintDate) → DUPLICATE_SUBMIT code mapping to {status:success, existing id}; the existing orphan-cleanup handles the wasted upload;
(3) partial UNIQUE index backstop.
Schema (db-mars lane): practicas ADD COLUMN idempotencyToken uuid NULL + partial unique idx WHERE NOT NULL, zero backfill, graceful null-token degradation for old cached form bundles.
Client: mint via useRef(uuid).current at mount + RFC4122 fallback for Safari <15.4/non-secure-context (the observed browser was Safari, so randomUUID could be absent; fallback mandatory).
DOCUMENTED BOUNDARY (PM ruling: disclose, don't engineer around): a full page-RELOAD before retry remounts → new token → that path is NOT deduped; acceptable because a reload wipes all form state (teeth/images/patient) = deliberate re-entry, and is NOT the observed in-place-retry mechanism (row 09a68f39).
Sequencing: db-mars lands column+index+schema.md re-export first, then coder client+server PR. No code until audit design-ack.
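Illustrative sketch only, not the shipped code: the client-side mint described above (randomUUID first, RFC 4122 v4 fallback when it is absent) could look roughly like this. The names CryptoLike, formatUuidV4 and mintIdempotencyToken are hypothetical, and the injected cryptoImpl parameter exists only so the fallback paths can be exercised outside a browser.

```typescript
// Minimal subset of the Web Crypto API used here. Both members are optional
// because older Safari builds and non-secure contexts may not expose randomUUID.
interface CryptoLike {
  randomUUID?: () => string;
  getRandomValues?: (array: Uint8Array) => Uint8Array;
}

// Render 16 random bytes as an RFC 4122 version-4 UUID string.
function formatUuidV4(bytes: Uint8Array): string {
  const hex = Array.from(bytes, (b, i) => {
    let v = b;
    if (i === 6) v = (b & 0x0f) | 0x40; // high nibble = version 4
    if (i === 8) v = (b & 0x3f) | 0x80; // top two bits = variant 10
    return (v + 0x100).toString(16).slice(1); // two hex digits, zero-padded
  }).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Hypothetical helper: walks the ladder randomUUID -> getRandomValues -> Math.random.
function mintIdempotencyToken(
  cryptoImpl: CryptoLike | undefined = (globalThis as unknown as { crypto?: CryptoLike }).crypto,
): string {
  if (cryptoImpl?.randomUUID) return cryptoImpl.randomUUID();
  const bytes = new Uint8Array(16);
  if (cryptoImpl?.getRandomValues) {
    cryptoImpl.getRandomValues(bytes);
  } else {
    // Last resort only: Math.random is not cryptographically strong.
    for (let i = 0; i < bytes.length; i++) bytes[i] = Math.floor(Math.random() * 256);
  }
  return formatUuidV4(bytes);
}
```

In the form this value would sit in a ref and be replaced whenever the token must change; that wiring is omitted here because it depends on the component code.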
-
FINAL IMPL SPEC (audit design-ack approved-with-conditions, pm confirmed 02:22). All 4 REQUIRED gates + back-nav verify folded in:
G1 (skip notify on dedup): client abort != server-promise cancel, so the ORIGINAL invocation already ran Stage-4 notifyPracticaCreated (JTP email + in-app + ayudante). BOTH dedup returns (fast-path early-return + DUPLICATE_SUBMIT map) return {status:success} WITHOUT calling notify. Side-effect fires exactly once.
G2 (23505 constraint-NAME scoped): only err.constraint==='uq_practicas_idempotency_token' maps to dedup (then SELECT id WHERE idempotencyToken= for the id to return). Every other 23505 (uq_practicas_student_date_seq, HC) keeps the existing normalizePgErrorMessage error path. NEVER blanket-map (a real folio/HC collision mapped to fake-success w/ wrong practicaId is worse than a reject).
G3 (token regen on logical identity, not just mount): useEffect keyed on ctx.resolved.studentUserId+comisionId regenerates the token ref. Closes the teacher student-switch silent-drop (submit X → abort-commit → switch picker to Y on same mount → resubmit would reuse X's token → Y dropped).
G4 (DUPLICATE_SUBMIT carries existingPracticaId): in-tx recheck post-advisory-lock THROWS PracticaSubmissionError(DUPLICATE_SUBMIT, existingPracticaId) → existing catch runs orphan-image cleanup for the wasted upload → createPractica catch maps to {success, existingPracticaId} + skip notify. Fast-path hit returns early from persist BEFORE upload (no cleanup).
Q1: in-tx recheck PRIMARY + partial unique index STRUCTURAL backstop (do both).
Entropy ladder: crypto.randomUUID → crypto.getRandomValues v4 → Math.random last resort only.
BACK-NAV: extra regen trigger: useEffect on state.status→'success' regenerates token, so a preserved-mount back-nav resubmit can't reuse a spent token (not relying on remount alone).
POST-DEPLOY VERIFY (prod-only, no local dev): @demo.mars submit A → browser-back → fill+submit B → expect 2 rows w/ distinct idempotencyToken; then soft-delete demo rows.
SEQUENCING: db-mars lands column(uuid NULL)+partial unique idx 'uq_practicas_idempotency_token' WHERE NOT NULL + schema.md/reference-data re-export FIRST → coder client+server PR → audit pre-push diff review → push → audit PTD. No push until audit clears the diff.
Files: src/app/(protected)/practicas/nueva/{practica-form.tsx,actions.ts}.
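Illustrative sketch only, assuming simplified shapes: how the catch-side mapping for G1, G2 and G4 might be expressed. Only the constraint name, SQLSTATE 23505 and the DUPLICATE_SUBMIT code come from the spec above. CreateResult, mapCreateError, the shouldNotify flag and the synchronous findIdByToken lookup are hypothetical stand-ins (the real lookup would be an async SELECT, and the error branch would go through normalizePgErrorMessage).

```typescript
const IDEMPOTENCY_CONSTRAINT = "uq_practicas_idempotency_token";
const PG_UNIQUE_VIOLATION = "23505";

// Stand-in for the error thrown by the in-tx recheck (G4): it carries the
// id of the row that the original submit already committed.
class PracticaSubmissionError extends Error {
  code: string;
  existingPracticaId: string;
  constructor(code: string, existingPracticaId: string) {
    super(code);
    this.code = code;
    this.existingPracticaId = existingPracticaId;
  }
}

type CreateResult =
  | { status: "success"; practicaId: string; shouldNotify: boolean }
  | { status: "error"; message: string };

function mapCreateError(
  err: unknown,
  token: string | null,
  findIdByToken: (token: string) => string | undefined,
): CreateResult {
  // G4: the recheck already knows the existing id. G1: never notify on dedup.
  if (err instanceof PracticaSubmissionError && err.code === "DUPLICATE_SUBMIT") {
    return { status: "success", practicaId: err.existingPracticaId, shouldNotify: false };
  }
  const pg = (typeof err === "object" && err !== null ? err : {}) as {
    code?: unknown;
    constraint?: unknown;
    message?: unknown;
  };
  // G2: only a unique violation on the idempotency index counts as a duplicate submit.
  if (pg.code === PG_UNIQUE_VIOLATION && pg.constraint === IDEMPOTENCY_CONSTRAINT && token !== null) {
    const existing = findIdByToken(token);
    if (existing !== undefined) {
      return { status: "success", practicaId: existing, shouldNotify: false };
    }
  }
  // Every other failure, including other 23505 constraints, stays an error.
  return { status: "error", message: typeof pg.message === "string" ? pg.message : "create failed" };
}
```

A violation of uq_practicas_student_date_seq falls through to the error branch here, which is the behaviour G2 asks for.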
-
inProgress
-
PUSHED SHA 5db2876 v2.16.67 (2026-06-17). Audit cleared diff pre-push (all 4 gates + migration 009 verified). Audit on PTD. Pending: PASS:5db2876 + post-deploy @demo.mars A→back→B verify then demo-row soft-delete.
-
CODE-COMPLETE + DEPLOY-VERIFIED (SHA 5db2876 / v2.16.67). GREEN correctness gates: audit diff pass (R-1 notify-skip / R-2 constraint-name 23505 / R-4 existingPracticaId+cleanup), migration 009 (exact index name + 29/29 archive-trigger alignment), clean Vercel deploy + frozen-prod check (deploymentId==serving alias), structural back-nav guard (success-regen + identity-regen, design-correct, no live test needed). NOT runtime-proven (untested != verified): live A→back→B (2 distinct rows/tokens) + same-token-collapse dedup — chrome-devtools-mcp NOT reachable from venus, so the authed prod browser verify can't run from this host. VERIFY PATH (audit+coder converged, option a): monitored soft-launch — db-mars spot-checks idempotencyToken column on the next ORGANIC prod práctica pair (non-null + distinct), audit holds dpl_6vESa7v runtime watch for (1) createPractica/dedup 500s, (2) double-notify (2 review-notify appEvents per practicaId = R-1 failure). FINISHED decision = PM (now on green gates, or after first organic spot-check).
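Illustrative sketch only: the double-notify watch above (2 review-notify appEvents per practicaId = R-1 failure) amounts to a count per practicaId. The AppEventLike shape and the 'review-notify' type string are assumptions for illustration; the real appEvents columns are not shown in this ticket.

```typescript
// Hypothetical minimal shape of an appEvents row for this check.
interface AppEventLike {
  type: string;
  practicaId: string;
}

// Return the practicaIds that received two or more notify events.
function findDoubleNotified(events: AppEventLike[], notifyType = "review-notify"): string[] {
  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.type !== notifyType) continue; // ignore unrelated events
    counts.set(e.practicaId, (counts.get(e.practicaId) ?? 0) + 1);
  }
  const offenders: string[] = [];
  counts.forEach((n, id) => {
    if (n >= 2) offenders.push(id);
  });
  return offenders;
}
```

An empty result over the watch window would be consistent with G1 holding; a non-empty one would name the prácticas to inspect.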
-
idempotency-token guard prevents lost-response duplicate prácticas; shipped 5db2876/v2.16.67, audit PASS; live dedup-path confirmation deferred to organic spot-check (db-mars) + createPractica-500 runtime watch (audit).