Adds optional delegation of agent-queue tasks to the SecuAAS secutools AI platform (GPU / Gemini / Claude API) as an alternative to dispatching them to a local Claude Code tmux session. Delegation is opt-in per task through the YAML frontmatter fields preferred_ai, allow_delegation and complexity_hint. When these fields are absent, the Phase 1 behaviour is preserved exactly (no breaking change).

Go side:
- internal/secutools: HTTP client with exponential-backoff retries (SubmitJob / GetJob / WaitForResult), a DecideProvider map adapter for CLI use, and table-driven tests.
- internal/router: struct-typed Decide() with strict precedence: needs_claude_code > preferred_ai=claude-code > allow_delegation=false > preferred_ai > fail-safe to local on unknown values.
- internal/delegation: a Manager that submits jobs, writes .md.delegated markers so jobs can be recovered after a restart, and runs a periodic reaper that moves completed jobs into done/ with a provider/cost footer and failed jobs into failed/.
- internal/dispatcher: opt-in WithDelegation(), a routeTask hook that runs before findFreeSession, and assignNextTask now skips .md.delegated files.
- internal/api: new /api/delegated/status endpoint (active jobs plus counters); /watchdog/status extended with delegation counters.
- cmd/ccl-delegate: small CLI exposing submit / get / result / decide so the bash dispatcher can call the same contract without duplicating logic.
- cmd/claude-failover: delegation wired in as opt-in, enabled via SECUTOOLS_API_KEY.

Tests:
- 29+ new unit tests across the router, secutools, delegation, dispatcher and api packages; go test -race -count=1 is clean.
- tests/phase2-E-integration.sh: bash end-to-end test against a mock HTTP server built on the Python standard library, exercising the dev-management scripts.

Forward-compatible with the watchdog: Phase 1 B1 already ignores state=delegated_to_secutools, so delegated tasks are not flagged as stale.
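The precedence listed above for internal/router's Decide() can be read as an ordered chain of checks where the first match wins. The following is a minimal sketch under stated assumptions, not the package's actual code: the Task struct, the Route type and the provider strings ("gpu", "gemini", "claude-api") are hypothetical, complexity_hint is ignored, and a missing allow_delegation is modelled as false so that an untouched task stays local.

```go
package main

// Route names where a task should run. Hypothetical values for illustration.
type Route string

const (
	RouteLocal     Route = "local"      // local Claude Code tmux session
	RouteGPU       Route = "gpu"        // secutools GPU backend (assumed name)
	RouteGemini    Route = "gemini"     // secutools Gemini backend (assumed name)
	RouteClaudeAPI Route = "claude-api" // secutools Claude API backend (assumed name)
)

// Task holds the frontmatter fields the router inspects (hypothetical shape).
// A missing allow_delegation decodes to false, keeping delegation opt-in.
type Task struct {
	NeedsClaudeCode bool
	PreferredAI     string
	AllowDelegation bool
}

// delegable maps known preferred_ai values to remote routes (assumed set).
var delegable = map[string]Route{
	"gpu":        RouteGPU,
	"gemini":     RouteGemini,
	"claude-api": RouteClaudeAPI,
}

// Decide applies the rules in strict precedence order; the first match wins.
func Decide(t Task) Route {
	switch {
	case t.NeedsClaudeCode: // highest priority: task requires Claude Code tooling
		return RouteLocal
	case t.PreferredAI == "claude-code": // explicit request for a local session
		return RouteLocal
	case !t.AllowDelegation: // delegation not opted in
		return RouteLocal
	}
	if r, ok := delegable[t.PreferredAI]; ok {
		return r
	}
	// Empty or unknown preferred_ai: fail safe to the local session.
	return RouteLocal
}
```

Putting the local-only rules in one switch before the map lookup makes the fail-safe explicit: no combination of fields can reach a remote provider unless delegation was requested and the provider name is recognised.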
claude-failover
Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.
Overview
claude-failover orchestrates a pool of Claude Code sessions running under
multiple Anthropic accounts. When the active account reaches its quota
threshold (5-hour usage window or weekly cap), the daemon transparently fails
over the workload to a backup account without losing in-flight session state.
It is the runtime glue behind the SecuAAS agent pool (ccl-0..ccl-9,
ccl-auto-11..ccl-auto-20) and is engineered to hold sessions warm across
account swaps by sharing the ~/.claude/projects/ transcript tree via
symlinks.
Architecture (goroutines)
The daemon is a single Go binary composed of cooperating goroutines:
- dispatcher: reads .agent-queue/inbox/*.md across registered projects and assigns tasks to idle sessions.
- quota-monitor: polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
- session-watcher: tracks tmux session liveness (ccl-*), heartbeats, and .agent-queue/status.json transitions (idle / working).
- checkpoint: periodically snapshots session context (current task, last tool call, working dir) so that an interrupted session can resume on a different account.
- janitor: cleans stale .dispatched markers, archives old done/ tasks, and prunes expired checkpoints.
- notifier: pushes state changes (failover fired, session degraded, task failed) to Telegram, the MCP dashboard, or a log aggregator.
- account-switcher: performs the actual swap by stopping sessions on account A, rehoming symlinks, relaunching sessions on account B, and replaying the last checkpoint. Swaps are serialized through a single mutex, so only one can run at a time.
All goroutines communicate through typed channels plus a shared state struct
behind a sync.RWMutex. The daemon exposes an HTTP control plane for the
MCP server to query status and force-trigger operations.
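The channel-plus-shared-state pattern described above can be sketched as follows. The README only states that goroutines use typed channels and a state struct behind a sync.RWMutex; the event kinds, field names and the single consumer loop below are illustrative assumptions. Writers take the write lock in Apply, while readers such as an HTTP status handler would call Snapshot under the read lock.

```go
package main

import "sync"

// EventKind enumerates hypothetical state changes reported by goroutines.
type EventKind int

const (
	SessionIdle EventKind = iota
	SessionWorking
	FailoverFired
)

// Event is the typed message sent over the events channel.
type Event struct {
	Kind    EventKind
	Session string // e.g. "ccl-3"; empty for pool-wide events
}

// State is the shared struct guarded by an RWMutex.
type State struct {
	mu        sync.RWMutex
	sessions  map[string]string // session name -> "idle" | "working"
	failovers int
}

func NewState() *State { return &State{sessions: make(map[string]string)} }

// Apply folds one event into the state under the write lock.
func (s *State) Apply(e Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch e.Kind {
	case SessionIdle:
		s.sessions[e.Session] = "idle"
	case SessionWorking:
		s.sessions[e.Session] = "working"
	case FailoverFired:
		s.failovers++
	}
}

// Snapshot copies the state under the read lock, so callers (for example a
// status endpoint) can serialize it without holding the lock.
func (s *State) Snapshot() (map[string]string, int) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.sessions))
	for k, v := range s.sessions {
		out[k] = v
	}
	return out, s.failovers
}

// Run consumes events until the channel is closed, then closes done.
func (s *State) Run(events <-chan Event, done chan<- struct{}) {
	for e := range events {
		s.Apply(e)
	}
	close(done)
}
```

An RWMutex suits this shape because status queries from the control plane are frequent reads, while state changes arrive comparatively rarely.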
Relationship to SecuAAS agent-orchestrator
This project extracts the session-management and failover logic that
currently lives in dev-management/agent-orchestrator/ (shell scripts:
launch-agent.sh, graceful-switch.sh, watchdog.sh,
checkpoint-daemon.sh, start-dedicated-agents.sh) and reimplements it
as a single Go service. See the orchestrator docs for the operational
context this daemon is designed to replace.
Repository layout
cmd/claude-failover/ Main entrypoint
docs/ Architecture, configuration, analysis notes
scripts/ Setup helpers (shared-projects symlink, etc.)
config.example.yaml Annotated example config
Status
Pre-alpha. Design and scaffolding only — no working binary yet.
License
MIT — see LICENSE.