PLUTO-62 ·
plutofn_detectRateSpike index-starved: add archivedAt indexes to 6 archive tables (32% of DB time)
- Ref
PLUTO-62(#936)- Project
pluto- Status
- done
- Priority
- normal
- Type
- task
- Assigned
- — --agent
- Created by
- wi-cli-venus
- Created
- 2026-06-12T06:47:15.044Z
- Updated
- 2026-06-12T07:56:22.301Z
- Closed
- 2026-06-12T07:56:22.301Z
Questions
No questions.
Event log
-
Source: pg_stat_statements dump Elazar sent 2026-06-12. fn_detectRateSpike() = 32.4% of ALL DB time (20,181 calls, 781s total, mean 38.7ms, max 1.6s) — the single largest DB-time consumer. NOT over-firing: it's pg_cron jobid=1 'pluto-rate-spike-detector', schedule */2 * * * * (every 2 min, exactly as designed). Zero triggers, zero app callers (coder repo-wide grep empty; db pg_trigger empty). Confirmed by both db-pluto-cc and coder-pluto-cc. ROOT CAUSE = index starvation. Every 2 min the fn runs a 7-table UNION ALL across archive tables filtering archivedAt >= now()-5min, then a NOT EXISTS dedup vs securityAlerts. 6 of 7 archive tables seq-scan because they lack a usable standalone archivedAt index: - accessRequestsArchive: HAS standalone archivedAt index (OK). - comisionAdjuntosArchive / comisionJtpsArchive / jtpAyudantesArchive (fraud tables): only composite (archivedBy, archivedAt DESC) — NOT usable for an archivedAt-only scan. - practicasArchive / academicHierarchyArchive / practicaReviewsArchive: NO archivedAt index at all. Cost grows with table size — mean_time will keep rising. FIX VECTOR (db-pluto owns; DDL via migration): - Add a standalone archivedAt index to the 6 missing tables. CREATE INDEX CONCURRENTLY (non-blocking on prod). Adding an index does not mutate rows -> no archive-context sentinel needed, but normal migration discipline + schema.md re-export applies. - EVALUATE BRIN vs btree: these archive tables are append-only with near-monotonic archivedAt, and the filter is a recent-range (>= now()-5min). BRIN on archivedAt is tiny and ideal for that access pattern — db to choose btree vs BRIN per row counts. - Alternative db raised: restructure the UNION ALL to exploit the existing (archivedBy, archivedAt) composites via a GROUP BY archivedBy pass — heavier change, only if indexes alone don't suffice. Function is VOLATILE + correct; this is purely a performance fix. Not user-facing/urgent, but it's the dominant DB-time line and worsening. Awaiting Elazar greenlight to apply (prod DDL on fraud-sensitive archive tables; CONCURRENTLY = safe/non-destructive).
-
Archive-table archivedAt indexes shipped (migration 017, 6 btree indexes on practicas/comisionAdjuntos/comisionJtps/jtpAyudantes/academicHierarchy/practicaReviews Archive; accessRequestsArchive already covered). Growth-proofs fn_detectRateSpike's 7-way UNION scans as the never-pruned archive tables fill (log(n) vs linear). Dormant/partial at today's 0-59 rows - today's 91pct DB-time is frequency-driven (every-2min cron x 38.7ms fixed overhead, ~7.6s/day absolute), addressed separately by PLUTO-67 cadence. Applied live + repo-committed 4c1c987, v1.67.31. Audit PASS:017 pre-apply + PASS:4c1c987 PTD (deploy READY, deploymentId match, zero runtime errors, commit boundary clean). CONCURRENTLY/no-txn intentional (index DDL touches no DATA, no archive context needed).