PLUTO-67 fn_detectRateSpike cadence: every-2min cron is 91% of DB-time (frequency-driven, ~7.6s/day absolute) - weigh fraud-detection latency vs cost

Ref: PLUTO-67 (#943)
Project: pluto
Status: backlog
Priority: low
Type: task
Assigned: unassigned
Created by: wi-cli-venus
Created: 2026-06-12T07:53:51.676Z
Updated: 2026-06-12T07:53:51.676Z

Sub-items

No sub-items.

Questions

No questions.

Event log

2026-06-12T07:53:51.676Z · created · wi-cli-venus
2026-06-12T07:53:52.163Z · note · wi-cli-venus

Audit measured via pg_stat_statements: SELECT fn_detectRateSpike() = 20216 calls, mean 38.7ms fixed per-invocation overhead (7-way UNION ALL planning + CTE aggregate + securityAlerts dedup + plpgsql), 782910ms cumulative since the 2026-03-01 reset = 91.57 pct of tracked DB exec time. BUT the absolute cost is only ~7.6s/day averaged over that window - same percentage-illusion class as the authenticator/PostgREST churn finding. The function watches the fraud-assignment tables (comisionJtps/Adjuntos/jtpAyudantes), so cadence is a FRAUD-DETECTION-LATENCY tradeoff, not a pure perf knob: going from every-2min to every-5min cuts invocations 2.5x but slows spike detection (share falls from ~91 pct to ~81 pct, since the other queries' time is unchanged; the function's own time drops to ~37 pct of today's total). With only Elazar active today the latency cost is near-zero; revisit when real users ramp. The change is a one-line vercel.json crons edit (coder lane). Elazar decides the cadence number. Do NOT cut silently.
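
A quick check of the arithmetic in this note, using only the figures quoted above. The one assumption is that the time spent by all other tracked queries stays the same when the cadence changes. Under that assumption the function's own time falls to about 37 pct of today's total, but its share of the new, smaller total is still about 81 pct.

```python
from datetime import date

# Figures quoted in the audit note.
calls = 20216
total_ms = 782910
share_now = 0.9157  # fraction of tracked DB exec time

# Stats window: 2026-03-01 reset to the day of the note.
days = (date(2026, 6, 12) - date(2026, 3, 1)).days

mean_ms = total_ms / calls          # per-invocation cost
per_day_s = total_ms / days / 1000  # absolute cost averaged over the window

# Assumption: everything else in pg_stat_statements is unaffected by the cadence edit.
other_ms = total_ms * (1 - share_now) / share_now

# Every-2min -> every-5min means 2.5x fewer invocations at the same per-call cost.
new_fn_ms = total_ms / 2.5
new_share = new_fn_ms / (new_fn_ms + other_ms)
fraction_of_old_total = new_fn_ms / (total_ms + other_ms)

print(f"window: {days} days")
print(f"mean per call: {mean_ms:.1f} ms")
print(f"absolute cost: {per_day_s:.1f} s/day")
print(f"share after 5-min cadence: {new_share:.1%}")
print(f"function time vs today's total: {fraction_of_old_total:.1%}")
```

The gap between the last two numbers is the percentage illusion in miniature: a large cut in the function's time barely moves its share, because the denominator shrinks with it.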
2026-06-12T08:09:47.449Z · note · wi-cli-venus

APPROACH SUPERSEDED by 3-PM consensus 2026-06-12 (mars+venus+pluto). Drop the cadence change (it trades away fraud-detection latency). Real fix = STOP POLLING, go EVENT-DRIVEN: fire the spike check from an AFTER INSERT (statement-level) trigger on the fraud-assignment tables (comisionJtps/Adjuntos/jtpAyudantes) instead of running the 7-way UNION from pg_cron every 2min. This deletes the ~720 mostly idle invocations/day (~20k calls logged since the stats reset) AND drops fraud latency to write-time (detect ON the write, not up to 2min later) - an illusion-free win on both axes. Then drop/disable pg_cron jobid=1 pluto-rate-spike-detector. GENERALIZABLE side-lesson (Mars): the 38.7ms is mostly PLANNING the UNION+CTE+dedup; any hot repeated query pays planning cost per call, so wrap it in plpgsql or a prepared statement to get plan caching. This WI's title still says 'cadence', but the chosen method is now the event-driven trigger; cadence is dead. db owns the trigger design (fraud-table archive triggers already exist - fold the check in); audit gets a pre-design ping (touches fraud-detection logic). HOLD until Elazar greenlights the rework.
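
A minimal sketch of the polling tradeoff this note argues against, assuming a fixed schedule that ticks at multiples of the interval. The function names are illustrative, not part of the codebase. It shows how many runs a schedule costs per day and how long a write can wait for the next tick; a check fired from the write itself waits zero seconds by construction.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def polls_per_day(interval_s: int) -> int:
    """Scheduled runs per day for a fixed polling interval."""
    return SECONDS_PER_DAY // interval_s

def detection_delay(write_time_s: int, interval_s: int) -> int:
    """Seconds from a write until the next poll tick at or after it."""
    return (-write_time_s) % interval_s

runs_2min = polls_per_day(120)
runs_5min = polls_per_day(300)

# Worst case: the write lands one second after a tick.
worst_2min = detection_delay(1, 120)
worst_5min = detection_delay(1, 300)

print(runs_2min, runs_5min, worst_2min, worst_5min)
```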
2026-06-12T08:09:59.536Z · note · wi-cli-venus

Final 3-PM layered plan (refines prior note).
NOW (ship-today, free, zero tradeoff): SENTINEL-GATE - gate the 7-way UNION behind a 1-row max(archivedAt) index probe; if nothing changed since last run, short-circuit. Drops idle case ~30x with ZERO latency loss + ZERO fraud risk. Keeps the 2-min cron.
ENDGAME: EVENT-DRIVEN - fold the spike check into the existing fraud-table archive trigger, fires at write-time, kills all 720 idle runs/day, fraud latency goes DOWN.
SKIP: cadence backoff (degrades detection) - dead.
Sequence: sentinel-gate first (small, reversible), event-driven as the clean follow-up. Both db-owned + audit pre-design ping (fraud-detection logic). HOLD for Elazar greenlight.
2026-06-12T08:27:18.875Z · note · wi-cli-venus

L1 SHIPPED + PTD. Sentinel-gate live on fn_detectRateSpike (migration 018, SHA 3558c99, v1.67.32). EXISTS-over-window probe (now()-5min OR'd across 7 archive tables, 017-indexed) short-circuits the expensive 7-way spike analysis ONLY when the window is genuinely empty - audit-verified no burst slips past (probe predicate byte-identical to body candidate filter; detection body character-identical to pre-018 baseline; v_window_start non-NULL so no silently-dead-fn mode). Idle ~38.7ms runs now collapse to a cheap EXISTS, zero detection-latency/fraud tradeoff. Audit PASS:018-live (post-apply) + PASS:3558c99 (PTD). NOTE: db applied 018 before the pre-apply gate (2nd occurrence after PLUTO-34) - near-zero risk here (reversible CREATE OR REPLACE fn, design was PASS:L1) but gate-discipline reinforced. L2 (event-driven: fold the check into the fraud-table archive triggers + disable pg_cron jobid=1) remains DEFERRED - bigger change, do after L1 proven live + on Elazar greenlight. WI stays OPEN for L2.