# claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.

## Overview

`claude-failover` orchestrates a pool of Claude Code sessions running under multiple Anthropic accounts. When the active account reaches its quota threshold (the 5-hour usage window or the weekly cap), the daemon transparently fails the workload over to a backup account without losing in-flight session state.

It is the runtime glue behind the SecuAAS agent pool (`ccl-0`..`ccl-9`, `ccl-auto-11`..`ccl-auto-20`). It keeps sessions warm across account swaps by sharing the `~/.claude/projects/` transcript tree via symlinks.

## Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

- **dispatcher**: reads `.agent-queue/inbox/*.md` across registered projects and assigns tasks to idle sessions.
- **quota-monitor**: polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
- **session-watcher**: tracks tmux session liveness (`ccl-*`), heartbeats, and `.agent-queue/status.json` transitions (idle / working).
- **checkpoint**: periodically snapshots session context (current task, last tool call, working dir) so an interrupted session can resume on a different account.
- **janitor**: cleans stale `.dispatched` markers, archives old `done/` tasks, and prunes expired checkpoints.
- **notifier**: pushes state changes (failover fired, session degraded, task failed) to Telegram, the MCP dashboard, and the log aggregator.
- **account-switcher**: performs the actual swap: stops sessions on account A, rehomes symlinks, relaunches sessions on account B, and replays the last checkpoint. Swaps are serialized by a single mutex, so only one can run at a time.

All goroutines communicate through typed channels plus a shared state struct guarded by a `sync.RWMutex`. The daemon also exposes an HTTP control plane that the MCP server uses to query status and force-trigger operations.
## Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that currently lives in `dev-management/agent-orchestrator/` (shell scripts: `launch-agent.sh`, `graceful-switch.sh`, `watchdog.sh`, `checkpoint-daemon.sh`, `start-dedicated-agents.sh`) and reimplements it as a single Go service. See the orchestrator docs for the operational context this daemon is designed to replace.

## Repository layout

```
cmd/claude-failover/   Main entrypoint
docs/                  Architecture, configuration, analysis notes
scripts/               Setup helpers (shared-projects symlink, etc.)
config.example.yaml    Annotated example config
```

## Status

Pre-alpha. Design and scaffolding only; there is no working binary yet.

## License

MIT. See `LICENSE`.