# claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic
quota-based failover.

## Overview

`claude-failover` orchestrates a pool of Claude Code sessions running under
multiple Anthropic accounts. When the active account reaches its quota
threshold (5-hour usage window or weekly cap), the daemon transparently fails
over the workload to a backup account without losing in-flight session state.

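The failover decision described above can be sketched as a pure function. This
is only an illustration, not the daemon's implementation: the `Account` type,
its usage fractions, the `pickFailover` name, and the single shared threshold
are all assumptions made for the example.

```go
package main

import "fmt"

// Account is a hypothetical view of one account's quota usage.
// Both fractions are expected to be in the range [0, 1].
type Account struct {
	Name       string
	WindowUsed float64 // fraction of the 5-hour usage window consumed
	WeeklyUsed float64 // fraction of the weekly cap consumed
}

// overThreshold reports whether either quota has reached the threshold.
func overThreshold(a Account, threshold float64) bool {
	return a.WindowUsed >= threshold || a.WeeklyUsed >= threshold
}

// pickFailover returns the account that should be active next.
// It keeps the active account while it is under the threshold; otherwise it
// picks the first other account, in pool order, that is under the threshold.
// ok is false when no usable account exists.
func pickFailover(pool []Account, active string, threshold float64) (next string, ok bool) {
	for _, a := range pool {
		if a.Name == active && !overThreshold(a, threshold) {
			return active, true
		}
	}
	for _, a := range pool {
		if a.Name != active && !overThreshold(a, threshold) {
			return a.Name, true
		}
	}
	return "", false
}

func main() {
	pool := []Account{
		{Name: "primary", WindowUsed: 0.97, WeeklyUsed: 0.40},
		{Name: "backup-1", WindowUsed: 0.10, WeeklyUsed: 0.95},
		{Name: "backup-2", WindowUsed: 0.05, WeeklyUsed: 0.20},
	}
	next, ok := pickFailover(pool, "primary", 0.90)
	fmt.Println(next, ok) // prints "backup-2 true"
}
```

Here `backup-1` is skipped because its weekly cap is nearly exhausted, even
though its 5-hour window is almost empty.
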
It is the runtime glue behind the SecuAAS agent pool (`ccl-0`..`ccl-9`,
`ccl-auto-11`..`ccl-auto-20`) and is engineered to keep sessions warm across
account swaps by sharing the `~/.claude/projects/` transcript tree via
symlinks.

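A minimal sketch of the symlink idea, assuming each account has its own
`.claude` home whose `projects` entry points at one shared directory. The
`linkProjects` helper, the directory names, and the refusal to overwrite a real
directory are assumptions for illustration; the actual layout is defined by the
setup scripts, not by this example. It relies on Unix-style symlinks.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkProjects makes <accountHome>/projects a symlink to sharedDir.
// An existing symlink is replaced. A real file or directory is left alone
// and reported as an error, so transcripts are never deleted.
func linkProjects(accountHome, sharedDir string) error {
	link := filepath.Join(accountHome, "projects")
	info, err := os.Lstat(link)
	switch {
	case err == nil && info.Mode()&os.ModeSymlink != 0:
		if err := os.Remove(link); err != nil {
			return err
		}
	case err == nil:
		return fmt.Errorf("%s exists and is not a symlink", link)
	case !os.IsNotExist(err):
		return err
	}
	return os.Symlink(sharedDir, link)
}

// demo links two throwaway account homes to one shared directory and
// returns the path each link resolves to, plus the shared path itself.
func demo() (resolved []string, shared string, err error) {
	root, err := os.MkdirTemp("", "claude-failover-demo")
	if err != nil {
		return nil, "", err
	}
	defer os.RemoveAll(root)

	shared = filepath.Join(root, "shared-projects")
	if err := os.MkdirAll(shared, 0o755); err != nil {
		return nil, "", err
	}
	// Resolve the shared path too, since the temp dir may itself be a symlink.
	if shared, err = filepath.EvalSymlinks(shared); err != nil {
		return nil, "", err
	}
	for _, name := range []string{"account-a", "account-b"} {
		home := filepath.Join(root, name, ".claude")
		if err := os.MkdirAll(home, 0o755); err != nil {
			return nil, "", err
		}
		if err := linkProjects(home, shared); err != nil {
			return nil, "", err
		}
		target, err := filepath.EvalSymlinks(filepath.Join(home, "projects"))
		if err != nil {
			return nil, "", err
		}
		resolved = append(resolved, target)
	}
	return resolved, shared, nil
}

func main() {
	resolved, shared, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range resolved {
		fmt.Println(r == shared) // each account resolves to the shared tree
	}
}
```

Because both account homes resolve to the same tree, a session relaunched under
a different account sees the same transcripts.
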
## Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

- **dispatcher** — reads `.agent-queue/inbox/*.md` across registered projects
  and assigns tasks to idle sessions.
- **quota-monitor** — polls each configured Anthropic account's usage window
  and triggers a failover when the active account crosses its threshold.
- **session-watcher** — tracks tmux session liveness (`ccl-*`), heartbeats,
  and `.agent-queue/status.json` transitions (idle / working).
- **checkpoint** — periodically snapshots session context (current task,
  last tool call, working dir) so an interrupted session can resume on a
  different account.
- **janitor** — cleans stale `.dispatched` markers, archives old
  `done/` tasks, prunes expired checkpoints.
- **notifier** — pushes state changes (failover fired, session degraded,
  task failed) to Telegram / MCP dashboard / log aggregator.
- **account-switcher** — performs the actual swap: stop sessions on account
  A, rehome symlinks, relaunch sessions on account B, replay last
  checkpoint. Serialized via a single mutex so only one swap can happen at
  a time.

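The account-switcher steps listed above could be serialized roughly as follows.
This is a sketch under stated assumptions: the `Switcher` type, the `run`
callback standing in for the real tmux, symlink, and checkpoint operations, and
the abort-on-first-error behaviour are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Switcher serializes account swaps behind one mutex.
type Switcher struct {
	mu    sync.Mutex // held for the whole swap, so swaps never overlap
	Steps []string   // log of completed steps, kept only for illustration
}

// Swap runs the four steps in order while holding the mutex.
// The run callback is a hypothetical hook that performs one step.
// A failing step aborts the swap and later steps are not executed.
func (s *Switcher) Swap(from, to string, run func(step string) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	steps := []string{
		"stop sessions on " + from,
		"rehome symlinks to " + to,
		"relaunch sessions on " + to,
		"replay last checkpoint",
	}
	for _, step := range steps {
		if err := run(step); err != nil {
			return fmt.Errorf("swap %s -> %s: %q failed: %w", from, to, step, err)
		}
		s.Steps = append(s.Steps, step)
	}
	return nil
}

func main() {
	var s Switcher
	var wg sync.WaitGroup
	noop := func(string) error { return nil }

	// Two concurrent requests: one swap finishes before the other starts.
	for _, pair := range [][2]string{{"A", "B"}, {"B", "C"}} {
		wg.Add(1)
		go func(from, to string) {
			defer wg.Done()
			if err := s.Swap(from, to, noop); err != nil {
				fmt.Println("swap failed:", err)
			}
		}(pair[0], pair[1])
	}
	wg.Wait()
	for _, step := range s.Steps {
		fmt.Println(step)
	}
}
```

Which of the two swaps runs first is not deterministic, but the four steps of
each swap always appear as one contiguous group in the log.
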
All goroutines communicate through typed channels plus a shared state struct
behind a `sync.RWMutex`. The daemon exposes an HTTP control plane for the
MCP server to query status and force-trigger operations.

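To make the communication pattern concrete, here is a minimal sketch of a typed
event channel, a state struct guarded by `sync.RWMutex`, and a status handler.
The `FailoverEvent` and `State` types, the `/status` path, and the JSON field
names are assumptions for the example; the control plane's real routes are not
specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// FailoverEvent is a hypothetical typed message from the quota-monitor.
type FailoverEvent struct {
	From, To string
}

// State is the shared daemon state; all access goes through the RWMutex.
type State struct {
	mu        sync.RWMutex
	active    string
	failovers int
}

// status is the hypothetical JSON shape returned by the control plane.
type status struct {
	Active    string `json:"active"`
	Failovers int    `json:"failovers"`
}

// consume applies events from the channel until it is closed.
func (s *State) consume(events <-chan FailoverEvent) {
	for ev := range events {
		s.mu.Lock()
		s.active = ev.To
		s.failovers++
		s.mu.Unlock()
	}
}

// statusHandler reports a snapshot taken under the read lock.
func (s *State) statusHandler(w http.ResponseWriter, r *http.Request) {
	s.mu.RLock()
	snap := status{Active: s.active, Failovers: s.failovers}
	s.mu.RUnlock()

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(snap); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// runDemo sends the events through a channel to a consumer goroutine and
// returns the body the status handler serves afterwards.
func runDemo(initial string, evs []FailoverEvent) string {
	st := &State{active: initial}
	events := make(chan FailoverEvent)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		st.consume(events)
	}()
	for _, ev := range evs {
		events <- ev
	}
	close(events)
	wg.Wait()

	rec := httptest.NewRecorder()
	st.statusHandler(rec, httptest.NewRequest(http.MethodGet, "/status", nil))
	return rec.Body.String()
}

func main() {
	// prints {"active":"account-b","failovers":1}
	fmt.Print(runDemo("account-a", []FailoverEvent{{From: "account-a", To: "account-b"}}))
}
```

Readers such as the status handler take only the read lock, so frequent status
queries do not block each other; only state updates take the write lock.
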
## Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that
currently lives in `dev-management/agent-orchestrator/` (shell scripts:
`launch-agent.sh`, `graceful-switch.sh`, `watchdog.sh`,
`checkpoint-daemon.sh`, `start-dedicated-agents.sh`) and reimplements it
as a single Go service. See the orchestrator docs for the operational
context of the scripts this daemon is designed to replace.

## Repository layout

```
cmd/claude-failover/   Main entrypoint
docs/                  Architecture, configuration, analysis notes
scripts/               Setup helpers (shared-projects symlink, etc.)
config.example.yaml    Annotated example config
```

## Status

Pre-alpha. Design and scaffolding only — no working binary yet.

## License

MIT — see `LICENSE`.