#614 · llmmsg-srv · child of #531
Idle over-budget agents never self-compact: wire host-side actuator + fix 3 monitor bugs
- Ref
- #614
- Project
- llmmsg-srv
- Parent
- backlog #531 Transport reliability + observability program (whey/venus/lezama)
- Status
- backlog
- Priority
- high
- Type
- task
- Assigned
- bin-whey-cc
- Created by
- —
- Created
- 2026-06-02T02:00:16.449Z
- Updated
- 2026-06-02T03:25:27.481Z
Questions
No questions.
Event log
-
2026-06-02T02:00:16.450Z · created · wi cli; parent=#531
-
2026-06-02T02:00:41.144Z · ARO brainstorm (pm + support + nw-whey + nw-venus) converged 2026-06-01.
ROOT CAUSE: idle over-budget agents have no working self-compact trigger - 3 bugs in cc-context-monitor.sh:
(1) per-agent DM line ~544 still prescribes ScheduleWakeup('/compact') = the retired interactive no-op;
(2) DM dedup 1/agent/day unless ctx>=2x cap -> the 5 parked whey agents silently get no re-alert;
(3) idle interactive sessions never hit UserPromptSubmit so the v1.8 bootstrap nag + over_budget_agents.json injection never fire, and the DM just sits unread in pendingForModel (no turn).
DECISION:
(A) repoint line-544 DM ScheduleWakeup -> CronCreate one-shot (same fix as v1.8 bootstrap nag) - immediate, helps any agent that DOES drain the buffer.
(B) wire the COMPACT half = centralized host-side ACTUATOR: monitor already reads every transcript for $0 every 15min; on OVER+idle, fire 'tmux send-keys "/compact" Enter' into that session's pane.
REJECTED nw-venus's per-session recurring gated cron as the fleet primary: every cron wake re-reads full context UNCACHED (~12M tok/day on a 257k Opus session, ~100x the tokens it saves, scales with the bloat it targets) - violates fleet-context-management rule; venus 'works' only because venus sessions are small/few.
ACCEPTANCE GATE: one-session probe of 'tmux send-keys /compact' into a known-idle non-/loop pane -> does a real turn fire? send-keys is the human-keystroke path (near-certain) - probe it FIRST; if it fires, ship fleet-wide, RemoteTrigger never needed.
FALLBACK (only if NO injection path reaches a dormant REPL, and only for sessions not in a targetable tmux/screen pane): nw-venus's gated one-shot cron, but MUST be armed unconditionally in bootstrap/SessionStart hook (durable=true is a fleet no-op; agent self-arm proven to fail = these 5).
OPEN EMPIRICAL Q for bin-whey FIRST: do the parked whey CC sessions run inside a tmux/screen pane the daemon can target? That answer decides whether the cron fallback is ever needed.
Owner bin-whey-cc (sh.git/cc-context-monitor.sh lane). Audit/analysis: nw-whey-cc (offered to run the send-keys probe), nw-venus-cc (Design-A reference-impl owner).
Line-544 repoint + tmux-reachability check are zero-risk, safe to start; the bootstrap cron-arming fallback is a fleet-wide behavior change -> gated on Elazar GO (y/n pending).
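The token-cost argument against the recurring cron can be sanity-checked with a few lines of shell arithmetic. This is only a sketch: the 30-minute wake cadence is an assumption, picked because it reproduces the ~12M tok/day figure quoted above (the log does not state the real interval), and `daily_reread_tokens` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough daily cost of a recurring cron wake that re-reads the full context uncached.
# ASSUMPTION: a 30-minute wake interval; the log does not state the real cadence.

daily_reread_tokens() {
  local ctx_tokens=$1 interval_min=$2
  local wakes_per_day=$(( 1440 / interval_min ))   # 1440 minutes in a day
  echo $(( ctx_tokens * wakes_per_day ))
}

# 257k-token session (figure from the brainstorm), hypothetical 30-min cadence.
daily_reread_tokens 257000 30   # prints 12336000, i.e. roughly 12M tok/day
```

The cost is linear in context size, so the most bloated sessions pay the most per wake. The host-side monitor avoids this because it reads transcripts from disk and spends no model tokens.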
-
2026-06-02T02:01:18.972Z · PREREQUISITE VERIFIED GREEN, both hosts (pm firsthand on whey, nw-whey + nw-venus concurring).
Every CC agent runs in a tmux session NAMED by the agent (ccs.sh convention), cmd=claude: agent name === tmux target, NO transcript->pane mapping needed. whey 'tmux list-panes -a' shows all 10 incl all 5 parked (evolutiva-pm-cc-w, proxy-mars-cc-w, hub-llmmsgsrv-cc, pm-llmmsgsrv-cc, bin-whey-cc) addressable now; venus same. => the cron FALLBACK is DROPPED, not needed - host-side actuator covers 100%.
FINAL actuator (verbatim): tmux send-keys -t <agent> -l '/compact'; tmux send-keys -t <agent> Enter. For an OVER+idle agent the pane sits at the REPL prompt = literal input-dispatch, identical to a human typing /compact (the path we all agree works).
SCOPE for bin-whey-cc, one WI:
(1) repoint monitor DM line ~544 ScheduleWakeup->CronCreate;
(2) wire COMPACT half: on OVER+idle, send-keys '/compact' into -t <agent>; gate on idle so we never interrupt an active turn.
Acceptance = first real fire against the 5 already-parked agents on next 15-min cycle (production is the test; or designate one pane first).
nw-whey/nw-venus won't fire /compact into another agent's pane unasked - boundary respected. Cron-arming-in-bootstrap = CANCELLED.
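The OVER+idle gate around the verbatim actuator could look roughly like the sketch below. The two `tmux send-keys` invocations are the ones recorded above; everything else is hypothetical: the function name `compact_if_over_idle`, its argument order, the `TMUX_BIN` override (there only so the function can be dry-run with `echo`), and the way the caller decides `idle`, which the log does not specify.

```shell
#!/usr/bin/env bash
# Sketch of the COMPACT-half actuator with its OVER+idle gate.
# ASSUMPTIONS: function name, arguments and TMUX_BIN are illustrative only;
# the caller is expected to supply ctx, cap and an idle flag (1 = idle).

TMUX_BIN="${TMUX_BIN:-tmux}"

compact_if_over_idle() {
  local agent=$1 ctx=$2 cap=$3 idle=$4

  # Not over budget: nothing to do.
  if (( ctx <= cap )); then return 1; fi

  # Gate on idle so an active turn is never interrupted.
  if [[ "$idle" != "1" ]]; then return 1; fi

  # Agent name === tmux target (ccs.sh convention).
  # -l sends the string literally; Enter is sent as a separate key press.
  "$TMUX_BIN" send-keys -t "$agent" -l '/compact' &&
    "$TMUX_BIN" send-keys -t "$agent" Enter
}
```

Setting `TMUX_BIN=echo` before calling the function prints the two commands instead of sending keys, which gives a cheap dry run before the first real fire against a designated pane.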
-
2026-06-02T03:25:27.479Z · LIVE TEST RESULTS (pm firsthand on whey, 2026-06-02 00:2x).
/context sweep of the 5 monitor-flagged over-budget agents -> only 1 genuinely over:
evolutiva-pm-cc-w ~215k NOT compacted (REAL);
proxy-mars-cc-w real ~35k ALREADY compacted (monitor claimed 187k);
hub-llmmsgsrv-cc real ~53k ALREADY compacted (monitor claimed 186k);
bin-whey-cc busy on interactive menu (skipped);
pm-llmmsgsrv-cc compacted earlier (monitor claimed 177k).
=> 3 of 5 were FALSE POSITIVES from monitor staleness.
ACTUATOR PROVEN END-TO-END: tmux send-keys -t evolutiva-pm-cc-w -l '/compact' + Enter into the idle pane -> 'Compacting conversation... 7%' live. So the COMPACT-half actuator works verbatim on whey; no further addressability test needed.
STALENESS ROOT CAUSE (issue b, the bigger bug): cc-context-monitor computes ctx from the last-assistant-turn tokens in the transcript .jsonl; a compacted-but-IDLE agent takes no new turn, so the last turn is still the pre-compact large one -> the agent stays falsely listed over-budget every cycle until it happens to take one more turn. nw-venus corroborates: over_budget_agents.json has a cycle_ts/1200s freshness gate but that only ages the CYCLE, not the SIZE, so a compacted-idle agent never falls off.
FIX (add to #614 scope, bin-whey lane): monitor must detect the compact boundary in the .jsonl (or otherwise read true post-compact size) before flagging - this alone eliminates most false 'parked' alerts.
RECONNECT (issue a) = NON-BUG on whey: all 5 had fresh hub last_seen (~40s); hub-llmmsgsrv-cc compacted + online with zero rrll; whey SessionStart hook confirmed v1.8/no-matcher (auto /register + /aro/join). The 'need rrll' is the pendingForModel buffer draining on next tool call, not a disconnect.
#614 now 3 parts:
(1) monitor staleness fix [NEW, biggest win],
(2) host-side /compact actuator [proven],
(3) monitor-DM line-544 ScheduleWakeup->CronCreate repoint.
Still HELD for Elazar greenlight; he asked for tests-first and this is the test.
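One possible shape for the staleness fix is sketched below: before flagging, check whether a compact boundary appears in the transcript after the last assistant turn, and if so treat the last-turn token count as pre-compact and do not flag. The marker strings are assumptions about the .jsonl record format and need to be checked against a real transcript; `last_turn_is_precompact` and `should_flag_over_budget` are hypothetical names, not functions in cc-context-monitor.sh.

```shell
#!/usr/bin/env bash
# Sketch: suppress over-budget flags whose size comes from a pre-compact turn.
# ASSUMPTION: these substrings identify the relevant .jsonl records; verify
# them against a real transcript before relying on this.
COMPACT_MARKER='"subtype":"compact_boundary"'
ASSISTANT_MARKER='"type":"assistant"'

# Succeeds when the newest compact boundary is later in the file than the
# newest assistant turn, i.e. the last-turn token count predates the compact.
last_turn_is_precompact() {
  local transcript=$1
  awk -v c="$COMPACT_MARKER" -v a="$ASSISTANT_MARKER" '
    index($0, c) { compact = NR }      # line number of the latest boundary
    index($0, a) { assistant = NR }    # line number of the latest assistant turn
    END { exit !(compact > assistant) }
  ' "$transcript"
}

# Flag only when the size is over cap AND was measured after any compact.
should_flag_over_budget() {
  local transcript=$1 ctx=$2 cap=$3
  if (( ctx <= cap )); then return 1; fi
  if last_turn_is_precompact "$transcript"; then return 1; fi
  return 0
}
```

Under these assumptions a compacted-but-idle agent drops off the list on the next cycle, without waiting for it to take another turn. An agent that has never compacted, or that took a turn after compacting, is still judged on its last-turn size as before.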