Adds optional delegation of agent-queue tasks to the SecuAAS secutools
AI platform (GPU / Gemini / Claude API) instead of dispatching to a
local Claude Code tmux session. Delegation is opt-in per task through
the YAML frontmatter fields preferred_ai, allow_delegation and
complexity_hint; when they are absent, Phase 1 behaviour is unchanged
(no breaking change).
Go side:
- internal/secutools: HTTP client with exponential-backoff retries
(SubmitJob/GetJob/WaitForResult), DecideProvider map adapter for CLI
use, table tests.
- internal/router: struct-typed Decide() with strict precedence
(needs_claude_code > preferred_ai=claude-code > allow_delegation=false
> preferred_ai > fail-safe local on unknown).
- internal/delegation: Manager submits jobs, writes .md.delegated
markers for on-restart recovery, runs a periodic reaper that moves
completed jobs into done/ with provider/cost footer and failed jobs
into failed/.
- internal/dispatcher: WithDelegation() opt-in, routeTask hook before
findFreeSession, skips .md.delegated in assignNextTask.
- internal/api: /api/delegated/status (active jobs + counters),
/watchdog/status extended with delegation counters.
- cmd/ccl-delegate: small CLI exposing submit/get/result/decide so the
bash dispatcher can call the same contract without duplicating logic.
- cmd/claude-failover: delegation wired opt-in via SECUTOOLS_API_KEY.
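
The precedence described for internal/router can be illustrated with a
minimal sketch. This is not the actual Decide() from the commit: the
Task and Decision types, the provider identifiers ("gpu", "gemini",
"claude-api") and the reason strings are assumptions made for the
example. Only the ordering of the checks comes from the description
above.

```go
package main

import "fmt"

// Task holds the routing-related frontmatter fields (hypothetical shape).
// Zero values mean "field absent", which must keep Phase 1 behaviour.
type Task struct {
	NeedsClaudeCode bool   // needs_claude_code
	PreferredAI     string // preferred_ai; "" when absent
	AllowDelegation *bool  // allow_delegation; nil when absent
}

// Decision is the routing outcome (hypothetical shape).
type Decision struct {
	Delegate bool
	Provider string
	Reason   string
}

// knownProviders is an assumed allow-list; the real identifiers may differ.
var knownProviders = map[string]bool{"gpu": true, "gemini": true, "claude-api": true}

// Decide applies the precedence:
// needs_claude_code > preferred_ai=claude-code > allow_delegation=false
// > preferred_ai > fail-safe local on unknown.
func Decide(t Task) Decision {
	local := func(reason string) Decision {
		return Decision{Provider: "claude-code", Reason: reason}
	}
	switch {
	case t.NeedsClaudeCode:
		return local("needs_claude_code")
	case t.PreferredAI == "claude-code":
		return local("preferred_ai=claude-code")
	case t.AllowDelegation != nil && !*t.AllowDelegation:
		return local("allow_delegation=false")
	case t.PreferredAI == "":
		return local("no preferred_ai, Phase 1 default")
	case knownProviders[t.PreferredAI]:
		return Decision{Delegate: true, Provider: t.PreferredAI, Reason: "preferred_ai"}
	default:
		return local("unknown preferred_ai, fail-safe local")
	}
}

func main() {
	no := false
	tasks := []Task{
		{},
		{PreferredAI: "gemini"},
		{PreferredAI: "gemini", AllowDelegation: &no},
		{PreferredAI: "mystery"},
	}
	for _, t := range tasks {
		d := Decide(t)
		fmt.Printf("preferred_ai=%q delegate=%v provider=%s (%s)\n",
			t.PreferredAI, d.Delegate, d.Provider, d.Reason)
	}
}
```

Putting every "stay local" case ahead of the single delegating case
means a new or misspelled frontmatter value can only fall back to the
local session, never delegate by accident.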
Tests:
- 29+ new unit tests across router, secutools, delegation, dispatcher,
api packages. go test -race -count=1 clean.
- tests/phase2-E-integration.sh: bash end-to-end against a Python
stdlib mock HTTP server, exercising the dev-management scripts.
Forward-compatible with the watchdog: Phase 1 B1 already ignores
state=delegated_to_secutools, so delegated tasks aren't flagged stale.
Wire symlinks.ValidateAll into the lifecycle manager so the daemon
refuses to start if any configured account is missing one of the
shared-state symlinks or if a link diverges from the canonical target.
Previously, a missing link on a freshly deployed VM would silently
create a divergent state tree per account (duplicate JSONL transcripts,
broken undo history) — exactly the failure mode the symlinks package
(A1) was introduced to prevent.
The check runs once at startup, before EnsureAllSessions, and guards a
single well-defined invariant: "every account home shares the same
projects/, file-history/ and session-env/ roots". There is no auto-heal
on divergence: we fail fast with an explicit error so the operator
fixes it manually, rather than risk overwriting one account's state.
Part of Phase 1, Chantier A: robust failover.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add internal/lifecycle/manager.go with Manager struct, Run() ticker loop
(15s interval), EnsureAllSessions() for boot-time session creation, and
reconcile() that recreates idle sessions and recovers working ones via
SetFailed + CreateSession
- Add state.SetFailed() to record crash timestamp on SessionState
- Add internal/lifecycle/manager_test.go with mock tmux client and 3 tests:
TestReconcileCreatesDeadSession, TestReconcileRecoversCrashedSession,
TestEnsureAllSessions — all pass
- Wire lifecycle.Manager into cmd/claude-failover/main.go after state init
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>