#535 ·
llmmsg-srv · child of #531Hub-side silence detector: alert when host goes dark > threshold
- Ref
#535(#535)- Project
llmmsg-srv- Parent
- backlog #531 Transport reliability + observability program (whey/venus/lezama)
- Status
- done
- Priority
- normal
- Type
- task
- Assigned
- hub-llmmsgsrv-cc
- Created by
- —
- Created
- 2026-05-23T18:46:26.072Z
- Updated
- 2026-05-23T18:58:02.691Z
- Closed
- 2026-05-23T18:58:02.691Z
Questions
No questions.
Event log
-
wi cli; parent=#531
-
spec refinements added per bin-whey-cc input
-
claiming. Spec per PM: config-driven thresholds (silence_thresholds table: host_pattern, agent_pattern, threshold_seconds, alert_to, escalate_to_elazar_after_misses). Lezama default 4h; venus/whey default 30min for nw-*. Hub-side setInterval queries MAX(ts) per agent/host, DMs configured target on first miss, escalates to Elazar after Nth. Will land alongside #538.
-
Implementation shipped hub v2.8.6 (commits c58f21d + 90c0d2c v2). Tables silence_thresholds + silence_state, background check (60s startup delay + 5min interval), default seeded rules verified via GET /silence_thresholds. roster.host column backfilled from host_probes (9 agents already populated post-restart). SILENCE_SKIP_AGENTS excludes human + cli pseudo-agents. /silence_thresholds endpoint readable. Detector log: 'armed: check_every=300s realert_after=1800s'. First scheduled check just before 15:58.
-
shipped + verified. silence_thresholds rule-driven (host_pattern + agent_pattern + threshold + alert_to + escalate_after + priority + enabled), DB-hand-editable. silence_state tracks miss_count + last_alert_ts with re-alert gating (env LLMMSG_SILENCE_REALERT_S=1800s default). Defaults: lezama 4h/6h Elazar+nw, venus 30min/1h nw-venus, whey 30min PM+hub Elazar. roster.host column populated via /probe (BWI #534). Hub v2.8.6, env LLMMSG_SILENCE_DISABLE=1 to off.
-
shipped hub v2.8.6 commit c58f21d