Adds internal/safety/ — the in-repo source of truth for the PreToolUse hook
deployed into every project before a Claude Code agent is launched. The hook
blocks destructive Bash/Edit/Write patterns on sessions running with
--dangerously-skip-permissions, closing the exploitation path where a prompt
injection via MCP sessions.send could otherwise trigger arbitrary destruction
without interactive confirmation.
Wire-up:
- internal/dispatcher/dispatcher.go launchAgent: deploys hook before claude
launch; fail-closed if deployment fails.
- internal/switcher/account_switcher.go relaunchDedicatedSessions: redeploys
hook before --resume after account failover; fail-open (log + continue)
since the initial deployment is still in place.
Blocks (exit 2, stderr shown to model):
- rm -rf targeting /, ~, $HOME, /etc, /var, /usr, /boot
- dd of=/dev/{sd,nvme,disk,hd,mmcblk}*, mkfs*
- git push --force (but allows --force-with-lease)
- git reset --hard on main|master|production
- sudo outside short allowlist (systemctl, journalctl, cp, install, apt*)
- curl|sh, bash <(curl ...), eval "$(curl ...)", fork bomb, crontab -e
- chmod 777 on system paths / home
- Writes to .claude/settings*.json, .claude/hooks/, ~/.ssh/authorized_keys,
shell rc files, /etc/sudoers*, /etc/systemd/*
Warn-only (logged, not blocked):
- kubectl delete, helm uninstall, terraform destroy
- DROP TABLE, TRUNCATE TABLE, DELETE FROM ... WHERE 1=1
Hook script is embedded via //go:embed so a single binary release carries
the authoritative copy. Every launch rewrites the deployed file with mode
0555 (anti-tamper); the hook itself also blocks writes to .claude/hooks/
for defense in depth.
Decision: Olivier, 2026-04-19 — Option A now, Option C (two pools) tracked
separately. Complements FNDG-04 input sanitization in secuaas-mcp.
Tests: 8 unit/integration tests in internal/safety/, plus a dispatcher-level
test verifying the hook is written before launch. go vet clean, go test ./...
all pass.
Refs: FNDG-04 audit (secuaas-mcp branch audit/mcp-stdio-2026-04-18)
Task: .agent-queue/inbox/20260418-211102-fndg-04b-*.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
No functional change. Groups yaml.v3 and fsnotify as direct deps,
isolates golang.org/x/sys as indirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug #1 (CRITIQUE) — A3 flip+ensure inconsistency
- Before: EnsureForAccount failure after flip was WARN-only, SetActiveAccount
still fired → daemon declared target active while shared symlinks were
absent/divergent → transcripts silently duplicated, resume broken.
- After: ensure failure triggers rollback flip to previous account home;
if rollback succeeds → explicit error, ActiveAccount stays on previous.
If rollback ALSO fails → sticky partialSwap flag + ErrPartialSwap; all
further swaps refused until operator intervention (daemon restart).
- New public IsPartialSwap() for watchdog / health-check integration.
Bug #10 (MOYENNE) — requiredShared contract never exercised
- All existing tests override a.sharedSymlinks with tmpdir-scoped lists,
so symlinks.RequiredShared itself was never tested. A rename or drop
would pass every test but silently break prod failover.
- TestRequiredSharedIsCoherent asserts (no filesystem): 3 entries with
the exact required names, absolute targets, and a single shared parent
directory (invariant EnsureForAccount depends on).
Tests:
- go test ./... PASS
- go test -race ./... PASS (no data race)
- 2 new switcher tests: TestFlipEnsureFailureTriggersRollback,
TestFlipEnsureAndRollbackFailure
- 1 new symlinks test: TestRequiredSharedIsCoherent
- 1 obsolete test replaced: TestFlipEnsureSymlinksFailureDoesNotAbortSwap
(encoded the old buggy best-effort behaviour)
Wire symlinks.EnsureForAccount into executeSwitch, called immediately
after the ~/.claude flip. Guarantees the three shared-state links
(session-env, file-history, projects) exist on the target account home
even for freshly-provisioned accounts, preventing silent transcript
duplication and undo-history divergence on first resume.
Best-effort: errors are logged as WARN but never abort the swap. If we
returned here the daemon would be left inconsistent (symlink flipped,
SetActiveAccount never called). Operator sees the warning in logs and
resolves divergent links manually.
Tests:
- TestFlipReconcilesSharedSymlinksOnTargetHome: empty target home gets
all three links pointing at canonical targets after the flip.
- TestFlipEnsureSymlinksFailureDoesNotAbortSwap: a planted divergent
link triggers the symlinks-package error; the swap completes anyway
and the active account is updated.
Hermetic: added AccountSwitcher.sharedSymlinks override so tests scope
the reconcile inside t.TempDir() and never touch
/home/ubuntu/.claude-*-shared. Existing tests migrated to this pattern
and hardcoded /tmp/claude-*-xxxx paths replaced with tmpdirs.
Phase 1 / Chantier A — task A3.
Wire symlinks.ValidateAll into the lifecycle manager so the daemon
refuses to start if any configured account is missing one of the
shared-state symlinks or if a link diverges from the canonical target.
Previously, a missing link on a freshly deployed VM would silently
create a divergent state tree per account (duplicate JSONL transcripts,
broken undo history) — exactly the failure mode the symlinks package
(A1) was introduced to prevent.
The check runs once at startup before EnsureAllSessions, guarding a
single well-defined invariant: "every account home shares the same
projects/, file-history/ and session-env/ roots". No auto-heal on
divergence — we fail fast with an explicit error so the operator fixes
it manually rather than one account's state being overwritten.
Part of Phase 1 Chantier A — Failover robuste.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/symlinks package that encodes in code the convention
previously maintained by hand on the VM: every Claude account home
must expose `session-env`, `file-history` and `projects` as symlinks
to a single shared target, so account failover does not create
divergent state (duplicate JSONL transcripts, broken undo history).
- EnsureForAccount(home, required) creates missing links and target
directories, refuses to auto-correct a divergent link (risks data
loss), and errors when a regular file sits where the link belongs.
- ValidateAll(homes, required) aggregates errors across both accounts
so the operator sees every problem at once rather than fixing one
per restart cycle.
- RequiredShared exposes the production defaults so lifecycle and
switcher (A2/A3) can depend on it directly.
9/9 unit tests green.
Part of Phase 1 Chantier A — Failover robuste.
Observed: tasks from filesecure/.agent-queue/inbox and SecuScan/
.agent-queue/inbox were being routed into ccl-1-conformvault and
ccl-2-scanyze whenever those sessions happened to be idle. Those are
the operator's manual interactive Claude sessions, not dispatch
targets — the auto-dispatch was (a) hijacking a Claude instance the
operator was using and (b) triggering /exit via the watcher's
completion path when the side-task finished, kicking the operator out
mid-conversation.
findFreeSession was iterating Pool.Dedicated before the autonomous
pool, so any idle dedicated session was the first candidate.
- Dispatcher.findFreeSession: remove the Dedicated loop entirely.
Auto-dispatch is now pool-only (ccl-auto-11..20).
- Watcher.completeSession: defense-in-depth — even if a dedicated
session ever ends up in "working" state, it is no longer /exit'd;
just marked idle. Pool /exit behaviour unchanged (context recycle).
- Tests: new TestFindFreeSessionSkipsDedicated proves the routing;
3 existing tests rewritten to use the autonomous pool instead of
relying on Dedicated as a fake pool.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multi-line task bodies arrived in Claude Code as "[Pasted text #N +M lines]"
and sat in the input buffer forever — the trailing Enter that SendKeys
appends to the paste is consumed as a newline inside the paste, not as a
submit. Observed live on ccl-auto-11 (secumon) and ccl-auto-12 (secuops):
prompt visible, agent idle.
- tmux.Client grows a SendEnter(session) method. ExecClient runs
`tmux send-keys -t <sess> Enter` (no preceding text), which Claude's
TUI accepts as the explicit submit action after a paste.
- Dispatcher: after SendKeys(msg), sleep 500ms for the paste to register,
then SendEnter. Same sequence a human would perform.
- Five mockTmux implementations updated (quota, dispatcher, switcher,
lifecycle, watcher tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production had two disjoint tmux pools named alike but for different
purposes:
ccl-0..ccl-9 — manual/interactive sessions (operator)
ccl-auto-11..ccl-auto-20 — autonomous dispatcher pool
Until now the daemon's loops iterated prefix + 0..Max, so with the
deployed config ("prefix: ccl-auto", min=2, max=10) the dispatcher
looked for sessions "ccl-auto0..ccl-auto9" that never existed, while
the real auto pool ccl-auto-11..20 was invisible. Net effect: no task
was ever dispatched, and killAllPoolSessions fabricated phantom
"ccl-auto0/1" sessions on each swap.
- AutonomousConfig gains StartIndex (yaml start_index, default 0).
Behaviour is unchanged when StartIndex is 0.
- Monitor, switcher (kill + recreate), dispatcher (findFreeSession),
and lifecycle (EnsureAll + reconcile) all iterate
[StartIndex, StartIndex+Max) so the daemon only touches its own
range and leaves ccl-0..ccl-9 alone.
- Production config updated to prefix: "ccl-auto-", start_index: 11,
min: 10, max: 10 — covering the 10 real ccl-auto-11..20 sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a legitimate quota hit triggered a swap, killAllPoolSessions tore
down the dedicated interactive sessions (ccl-1-conformvault, ccl-2-scanyze)
along with the pool, then recreatePoolSessions re-opened them at a bare
bash prompt. The operator had to manually re-run
CLAUDE_CONFIG_DIR=<target> claude --dangerously-skip-permissions --resume <uuid>
after every swap, losing whatever conversation was mid-flight.
saveAllSessions only iterates sessions tracked as "working" in state;
user-driven dedicated sessions are rarely in that state so their resume
UUIDs were never saved.
- saveDedicatedUUIDs: capture resume UUID for every configured dedicated
session regardless of tracked state, before kill.
- relaunchDedicatedSessions(targetHome): after recreate, send a resume
command on each dedicated session pointing CLAUDE_CONFIG_DIR at the
target account's home. Missing UUID → leave at shell, no blind launch.
- isValidResumeUUID hardens against a corrupted resume-id.txt.
New TestDedicatedRelaunchAfterSwap verifies end-to-end: pane capture →
UUID persisted → resume command sent with the correct CLAUDE_CONFIG_DIR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.2.2's 2-poll confirmation was insufficient because Anthropic 500/503
errors are printed into Claude Code's conversation transcript and stay
visible in every tmux capture until the user scrolls. A persistent
server error would confirm on the second poll and still trigger a swap.
Root cause: the pattern "rate limit" (bare substring) matched any 500
payload that happened to mention rate limits in its error text. Real
HTTP 429s from Anthropic are typed as "rate_limit_error" in the error
payload — and that's the signature we should actually key on.
- Remove "rate limit" from quotaPatterns (too generic — matches transcripts).
- Add "rate_limit_error" (Anthropic's typed 429 error) and "5-hour limit".
- Add serverErrorPatterns veto: "api_error", "overloaded_error",
"internal server error", "api error: 5". When any is present in the
pane, isQuotaExhausted returns false even if a quota pattern matched.
- 4 new subtests covering the veto paths + sanity that real 429s pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anthropic HTTP 500 errors surface in the TUI with payloads containing
"rate limit" text, which the monitor was matching against quotaPatterns
and treating as a real 429 quota hit. With no cooldown and no
confirmation, a burst of 500s produced sub-minute ping-pong swaps that
tore down user sessions.
Two-layer fix:
- quota.reactivate_cooldown (already in config, 5m) now gates the
monitor too — not just the dispatcher. A completed swap suppresses
further detection for the cooldown window.
- A hit with no parseable reset time is treated as suspected only on
the first poll; a second consecutive poll is required before
emitting SwapRequested. Legitimate 429s with "resets in ..." still
swap instantly on the first detection.
Adds state.RecordSwap / LastSwapInfo for the cooldown, and a
forensic log line on every detection: trigger_session, matched
pattern, 120-char pane snippet.
Tests cover: instant swap with reset, 2-poll confirmation without
reset, and suspected-state reset on recovery.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>