claude-failover/README.md

2.8 KiB

claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.

Overview

claude-failover orchestrates a pool of Claude Code sessions running under multiple Anthropic accounts. When the active account reaches its quota threshold (5-hour usage window or weekly cap), the daemon transparently fails over the workload to a backup account without losing in-flight session state.

It is the runtime glue behind the SecuAAS agent pool (ccl-0..ccl-9, ccl-auto-11..ccl-auto-20) and is engineered to hold sessions warm across account swaps by sharing the ~/.claude/projects/ transcript tree via symlinks.

Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

  • dispatcher — reads .agent-queue/inbox/*.md across registered projects and assigns tasks to idle sessions.
  • quota-monitor — polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
  • session-watcher — tracks tmux session liveness (ccl-*), heartbeats, and .agent-queue/status.json transitions (idle / working).
  • checkpoint — periodically snapshots session context (current task, last tool call, working dir) so an interrupted session can resume on a different account.
  • janitor — cleans stale .dispatched markers, archives old done/ tasks, prunes expired checkpoints.
  • notifier — pushes state changes (failover fired, session degraded, task failed) to Telegram / MCP dashboard / log aggregator.
  • account-switcher — performs the actual swap: stop sessions on account A, rehome symlinks, relaunch sessions on account B, replay last checkpoint. Serialized via a single mutex so only one swap can happen at a time.

All goroutines communicate through typed channels plus a shared state struct behind a sync.RWMutex. The daemon exposes an HTTP control plane for the MCP server to query status and force-trigger operations.

Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that currently lives in dev-management/agent-orchestrator/ (shell scripts: launch-agent.sh, graceful-switch.sh, watchdog.sh, checkpoint-daemon.sh, start-dedicated-agents.sh) and reimplements it as a single Go service. See the orchestrator docs for the operational context this daemon is designed to replace.

Repository layout

cmd/claude-failover/     Main entrypoint
docs/                    Architecture, configuration, analysis notes
scripts/                 Setup helpers (shared-projects symlink, etc.)
config.example.yaml      Annotated example config

Status

Pre-alpha. Design and scaffolding only — no working binary yet.

License

MIT — see LICENSE.