| OPS-12 | agent-ops | Recover lost 2026-05-26 evolutiva-commons patch: §Shared MD Flow + §Versioning + §Boundaries B1/B4/B5; check llmmsg message DB for original content | backlog | high | bin-whey-cc | 2026-06-09 04:04 |
| OPS-11 | agent-ops | cc-context-monitor last_turn() v2.14 residual stale-read: agent force-compacted 'for no reason' every other session. Verified 2026-06-09: nw-whey-cc reported 184k while live /context=63.6k (3x phantom). last_turn (~line 312) reads transcript's last usage block = summarizer's pre-compact cache_read until next real turn; compact_boundary reverse-walk skip has a hole (boundary outside TAIL_BYTES, or pre-compact reading sampled in the gap before boundary written). Fix: gate actuator on LIVE window occupancy, not stale transcript estimate. NOT a cap problem - do NOT bump nw ROLE_CAP (Elazar explicit). Recurring user-facing churn. | inProgress | high | bin-whey-cc | 2026-06-09 03:22 |
| OPS-10 | agent-ops | attach-tmux.sh nests tmux/stacks extra login shell on switch-client path (recurring, 'fixed 5x') | backlog | high | bin-whey-cc | 2026-06-07 08:31 |
| OPS-9 | agent-ops | cc-context-monitor path-B/wind-down DM mis-targeting: db-mars-cc remediation delivered to nw-venus-cc | backlog | high | nw-whey-cc | 2026-06-06 07:17 |
| #608 | agent-ops | lezama host-dark recurrence: root-cause + harden against full-dark (2nd occurrence; died 2026-06-01 18:03 ART, clean cliff) | backlog | high | nw-lezama-cc | 2026-06-02 00:45 |
| #610 | agent-ops | └─ Validate the installed recovery software (failed twice): identify unit, check prior-boot status+journal (fired/tried/failed?), confirm dead-kernel/power is below userland scope; validate reverse-tunnel health EXTERNALLY from whey, never trust the ssh_r_active self-report flag | backlog | high | nw-lezama-cc | 2026-06-02 00:45 |
| #609 | agent-ops | └─ Reboot forensics: journalctl -b -1 -k (panic trace=crash / empty=power loss) + full -b -1 (OOM?), dmesg \| tail -50, df -h, last -x reboot, smartctl -a | backlog | high | nw-lezama-cc | 2026-06-02 00:44 |
| #613 | agent-ops | └─ Root-cause-gated remediation (after forensics): OOM->add swap + tune vm.overcommit_memory; disk-full->monitor cron + notify-elazar; kernel panic->enable kdump for post-mortem | blocked | normal | nw-lezama-cc | 2026-06-02 00:44 |
| #600 | agent-ops | Global PreToolUse enforcement hook: deny non-llmmsg-srv-team agents Edit/Write to /opt/llmmsg-srv/{hub,bridge,scripts/lib}/** + deny Bash curl/http to hub privileged endpoints (extend existing PATH-script-block hook). Elazar approved ship - recurring: agents edit hub code/curl API instead of MCP tools. | backlog | high | nw-whey-cc | 2026-06-01 14:43 |
| #579 | agent-ops | dbcaba backup gaps: dev backup targets wrong db + gdrive backup-mirror broke 2026-05-16 (dbcaba has no auto-backup today) | backlog | high | maintainer-movilba-cc-l | 2026-05-29 08:53 |
| #552 | agent-ops | Agent name parsing: forbid dashes inside role/project (single-token parts only) | backlog | high | pm-llmmsgsrv-cc | 2026-05-29 07:30 |
| #458 | agent-ops | memory-budget over-budget warning: route shared-file footprint breaches to project PM with escalation chain | backlog | high | bin-whey-cc | 2026-05-22 05:50 |
| #454 | agent-ops | llmmsg-srv roster & PM resilience: pruneStale soft-offline, offline buffering, PM re-election | inProgress | high | llmmsg-srv-cc | 2026-05-22 03:52 |
| #455 | agent-ops | └─ PM-liveness watchdog: cc-context-monitor detects a PM-less ARO, DMs Elazar | blocked | high | bin-whey-cc | 2026-05-21 16:27 |
| #459 | agent-ops | Plugin hygiene v2 - rollout completion (lezama + deferred items) | inProgress | high | llmmsg-srv-cc | 2026-05-22 03:15 |
| #460 | agent-ops | └─ Plugin hygiene v2 - lezama host portion: telegram uninstall + host-global lean flip + movilba per-agent enables (see parent #459) | backlog | high | nw-lezama-cc | 2026-05-22 03:15 |
| #451 | agent-ops | cc-context-monitor over-cap caps are stale: false alarms fleet-wide. THRESH_OPUS=160k/HAIKU=100k/SONNET=135k are ~80% of the OLD ~200k-class windows. Fleet now runs claude-opus-4-7[1m] = 1M-token window; a 268k session is 27% full, not over cap. Transcript model string lacks the [1m] marker so monitor can't auto-detect window size. Fix: bin-whey-cc updates THRESH_* in cc-context-monitor.sh to real values per Elazar's decision (true-window fraction vs. relabelled cost-budget). All of 2026-05-21's over-cap wind-downs were triggered by this false cap. | blocked | high | bin-whey-cc | 2026-05-21 12:56 |
| #445 | agent-ops | eq backend unbuilt: hub.mjs has no /eq_* routes + no elazar_questions table; eq CLI POSTs to non-existent endpoints (/eq_ls returns not_found). chat-duo Pending panel reads elazar_questions, so no Elazar question can ever surface. Implement hub-side /eq_add /eq_ls /eq_show /eq_answer /eq_dismiss /eq_stale + elazar_questions schema per eq-spec.md | backlog | high | llmmsg-srv-cc | 2026-05-21 06:16 |
| #265 | agent-ops | backup-whey.service: NOPASSWD sudo for cryptsetup+mount in /etc/sudoers.d/backup-whey (failing since run #23 2026-05-08T14:22 - sudo prompts for rob pw under systemd no-tty); owner: os-whey-cc-w | backlog | high | whey-nw-cc | 2026-05-10 03:16 |
| OPS-17 | agent-ops | cc-context-monitor zombie-detector false-positives on reactive-idle agents: rule (turns>=2 && tool_calls==0 -> zombie) can't distinguish a self-armed-wakeup zombie from a reactive agent woken by inbound llmmsg DM that acked + stood down with no tools needed. At 05:00 it flagged audit-venus-cc (benign: empty CronList, zero ScheduleWakeup ever, 100% inbound-DM-driven) -> 2 dup alerts + cross-host investigation. Hits EVERY reactive agent (host nw between alerts, audit agents). Fix (nw-whey suggested): a wake is zombie-suspect only if NOT preceded by an inbound message in the window (self-triggered) AND/OR non-empty CronList; if wake correlates with inbound DM + empty CronList -> reactive-idle, suppress. Hub has inbound-msg timestamps to correlate. Related: OPS-11 last_turn stale-read. | backlog | normal | bin-whey-cc | 2026-06-14 08:03 |
| OPS-16 | agent-ops | Scrub stale eq / Elazar Pending text from bwi-tool PROJECT canonical docs (MSG-36 followup): /home/rob/Documents/work/pensanta/projects/basquetWi/basquetWiInstructions.md + that project's CLAUDE.md still carry old eq/Pending wording. Not the global-rule SSOT (gdrive is, already fixed) but scrub for consistency. Owner: bwi-tool maintainer. | backlog | normal | pm-basquetwi-cc | 2026-06-14 06:39 |
| OPS-14 | agent-ops | whey↔lezama global ~/.claude/CLAUDE.md §-level reconcile: lezama missing entire sections whey has (was missing ## Collaborate, added 2026-06-14; also lacks Permission, Communication, others). Audit all 3 hosts' global CLAUDE.md section sets, decide canonical fleet-neutral set, mirror missing sections per-host. Surfaced during the log-review-autonomy global-line rollout. | backlog | normal | nw-whey-cc | 2026-06-14 05:21 |
| OPS-8 | agent-ops | Fleet model audit: many agents still pinned Opus 4.7; move to default-tracks-latest (verify default=Opus 4.8 + future models) or pin Opus 4.8; review model choice per agent by role | backlog | normal | nw-whey-cc | 2026-06-06 04:25 |
| OPS-7 | agent-ops | attach-tmux.sh: list OFFLINE agents too; selecting one launches its ccs session and attaches you to it | backlog | normal | nw-whey-cc | 2026-06-06 03:53 |
| OPS-1 | agent-ops | db-size-monitor blind to /var/lib: add to scan roots (currently /home/rob /opt /srv /rtshared). The llmmsg-srv hub DB (/var/lib/llmmsg-srv/v2.sqlite 38M live + llmmsg.sqlite 30M dead-v1) never appears in size snapshots - exactly the DB Elazar worries about. Also surfaces probe-table growth (host_probes 32k rows) for retention review. | backlog | normal | nw-whey-cc | 2026-06-05 06:25 |
| #644 | agent-ops | Fleet google-workspace real-usage sweep on venus+lezama: grep each host's transcripts for actual mcp__google-workspace__ tool_use (not installs/mentions); add per-project where used. whey done: odaia(heavy), pluto-pm(light), nw-whey(1 gmail). | backlog | normal | nw-whey-cc | 2026-06-05 04:27 |
| #643 | agent-ops | Audit odaia agent roster: 15 cwds under ~/Documents/personal/odaia looks like too many; consolidate, then properly scope google-workspace MCP per surviving agent (currently re-added to all 15 per-project as a stopgap 2026-06-05). | backlog | normal | nw-whey-cc | 2026-06-05 04:27 |
| #629 | agent-ops | Normalize agent names + folder/.aro/.agent-name layout: fix hyphenated project-names in cc-names, KEEP host suffix -w/-v/-l (it identifies cross-host agents, e.g. proxy-<proj>-cc-w); rename coder->coder01/coderhelp->coder02/db->db01/db2->db02 via claude-mv per-project (agent-driven, not manual); build drift-normalizer script. whey+venus now, lezama later. | backlog | normal | nw-whey-cc | 2026-06-04 14:50 |
| #525 | agent-ops | Review evolutiva backup policy + space (2h/30d/gdrive, ~1.75GB steady-state; context: WI #509) | backlog | normal | nw-whey-cc | 2026-06-03 04:06 |
| #555 | agent-ops | bisync watchdog: alert when gdrive-bisync.service fails > 2 consecutive runs | backlog | normal | bin-whey-cc | 2026-05-29 02:15 |
| #554 | agent-ops | Fleet audit: every agent's cwd has local .agent-name matching registered name | backlog | normal | bin-whey-cc | 2026-05-25 02:04 |
| #527 | agent-ops | pull cc-memory-sweep.sh v1.2 (sh.git 20ce2fe) onto whey + lezama | backlog | normal | bin-whey-cc | 2026-05-23 06:33 |
| #526 | agent-ops | memlint-cron.sh: replace email escalation with llmmsg DM to host maintainer (nw-{host}-cc) | backlog | normal | bin-whey-cc | 2026-05-23 06:33 |
| #480 | agent-ops | bwi git-sync dead since 2026-05-06: decide whey-only policy vs resume DB commits+pushes to GitHub; venus agents routing through whey meanwhile | backlog | normal | — | 2026-05-22 05:30 |
| #467 | agent-ops | memory-lint compute_footprint over-walks CLAUDE.md chain - ignores claudeMdExcludes | backlog | normal | bin-whey-cc | 2026-05-22 04:15 |
| #464 | agent-ops | cc-context efficiency report - read-only reporting layer on cc-context-metrics.sqlite (week/day/48h) | backlog | normal | bin-whey-cc | 2026-05-22 03:40 |
| #443 | agent-ops | cc-context daily digest: delivery channel decision (thread 5): A fold into 08:00 elazar-pending-digest / B standalone unit+hour / skip | todo | normal | nw-whey-cc | 2026-05-21 12:48 |
| #447 | agent-ops | STRUCTURAL: /loop self-/compact over-cap workstream. Proven (Elazar 2026-05-21, ~28-30k ctx): a /loop session self-/compacts via ScheduleWakeup prompt /compact. Replaces cap->park->relaunch with cap->ss->self-/compact->continue. Gated on a ccs.sh launcher decision (ScheduleWakeup exists only in /loop-mode sessions); needs Elazar GO. Separate from v2.7 over-cap path | todo | normal | bin-whey-cc | 2026-05-21 07:12 |
| #402 | agent-ops | memory-lint scope bug: gitpush pre-hook runs memory-lint.sh with no arg → scans ALL ~/.claude/projects/*/memory dirs → one project's memory hygiene blocks every other project's pushes. Cross-project DoS. Fix: scope to current-project memory dir only (resolve from PWD or .agent-name). Surfaced 2026-05-17 by coder-mars-cc-29923 blocked on llmmsg-srv-cc's dirty memory dir. | backlog | normal | — | 2026-05-17 06:27 |
| #404 | agent-ops | └─ memory-lint A/B/D/E: gitpush-pre.sh per-project scope (arg, not glob) + non-blocking warn-only + drop §11 heading-dup from per-push + never touch CLAUDE.md. Mars+pluto+venus .gitpush-pre.sh mirror. Ratified by Elazar -30088. | backlog | high | whey-nw-cc | 2026-05-17 06:40 |
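
Note on OPS-17: the suggested zombie-detector fix can be read as a small decision rule. The sketch below is one possible reading, not the monitor's actual code. `WakeSample`, `classify_wake`, the field names and the 300-second correlation window are all hypothetical, and the ticket's "AND/OR" is assumed to mean OR: a quiet wake is suppressed only when it correlates with an inbound DM and the CronList is empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeSample:
    """Hypothetical per-wake observation; field names are illustrative only."""
    turns: int
    tool_calls: int
    cron_entries: int                   # number of entries in the agent's CronList
    inbound_dm_age_s: Optional[float]   # seconds from latest inbound DM to the wake; None if none seen


def classify_wake(sample: WakeSample, window_s: float = 300.0) -> str:
    """Classify a wake as 'active', 'reactive-idle' or 'zombie-suspect'."""
    # Existing rule from the ticket: turns>=2 with zero tool calls looks like a zombie.
    quiet = sample.turns >= 2 and sample.tool_calls == 0
    if not quiet:
        return "active"

    # Proposed refinement: was the wake preceded by an inbound DM inside the window?
    woken_by_dm = (
        sample.inbound_dm_age_s is not None
        and 0 <= sample.inbound_dm_age_s <= window_s
    )

    # Inbound DM + empty CronList -> reactive agent that acked and stood down.
    if woken_by_dm and sample.cron_entries == 0:
        return "reactive-idle"

    # Self-triggered wake, or a non-empty CronList: keep the alert.
    return "zombie-suspect"
```

Under these assumptions the audit-venus-cc case described in the ticket (empty CronList, purely inbound-DM-driven) would fall into `reactive-idle` and raise no alert, provided the hub's inbound-message timestamp lands inside the chosen window.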