feat: initial project structure

Olivier 2026-04-14 13:29:24 +00:00
commit cf4957010f
10 changed files with 621 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,68 @@
# claude-failover
Go daemon for Claude Code multi-account session orchestration with automatic
quota-based failover.
## Overview
`claude-failover` orchestrates a pool of Claude Code sessions running under
multiple Anthropic accounts. When the active account reaches its quota
threshold (5-hour usage window or weekly cap), the daemon transparently fails
over the workload to a backup account without losing in-flight session state.
It is the runtime glue behind the SecuAAS agent pool (`ccl-0`..`ccl-9`,
`ccl-auto-11`..`ccl-auto-20`) and is engineered to hold sessions warm across
account swaps by sharing the `~/.claude/projects/` transcript tree via
symlinks.
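
As a rough illustration of the symlink sharing described above, the sketch below points each account's `.claude/projects` directory at one shared tree. The function name `linkSharedProjects`, the account home layout, and the temp-name-then-rename approach are assumptions made for this example, not the project's actual setup script; it also assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkSharedProjects (hypothetical helper) points
// <accountHome>/.claude/projects at sharedDir. It creates the new symlink
// under a temporary name and renames it into place, so readers never
// observe a missing link (relies on POSIX rename semantics).
func linkSharedProjects(accountHome, sharedDir string) error {
	link := filepath.Join(accountHome, ".claude", "projects")
	if err := os.MkdirAll(filepath.Dir(link), 0o755); err != nil {
		return err
	}
	// Refuse to replace a real directory: it may hold transcripts.
	if fi, err := os.Lstat(link); err == nil && fi.Mode()&os.ModeSymlink == 0 {
		return fmt.Errorf("%s exists and is not a symlink", link)
	}
	tmp := link + ".tmp"
	_ = os.Remove(tmp) // drop a leftover from an interrupted run, if any
	if err := os.Symlink(sharedDir, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, link)
}

// demo links two made-up account homes to one shared tree inside a
// throwaway temp dir and reports whether both links resolve to it.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "claude-failover-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	shared := filepath.Join(root, "shared-projects")
	if err := os.MkdirAll(shared, 0o755); err != nil {
		return false, err
	}
	for _, acct := range []string{"account-a", "account-b"} {
		home := filepath.Join(root, acct)
		if err := linkSharedProjects(home, shared); err != nil {
			return false, err
		}
		target, err := os.Readlink(filepath.Join(home, ".claude", "projects"))
		if err != nil || target != shared {
			return false, err
		}
	}
	return true, nil
}

func main() {
	ok, err := demo()
	fmt.Println("both accounts share one transcript tree:", ok, err)
}
```

Because every account resolves to the same directory, a session relaunched under a different account would find the transcripts written under the previous one. Calling the helper again on an existing link simply replaces it.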
## Architecture (goroutines)
The daemon is a single Go binary composed of cooperating goroutines:
- **dispatcher** — reads `.agent-queue/inbox/*.md` across registered projects
and assigns tasks to idle sessions.
- **quota-monitor** — polls each configured Anthropic account's usage window
and triggers a failover when the active account crosses its threshold.
- **session-watcher** — tracks tmux session liveness (`ccl-*`), heartbeats,
and `.agent-queue/status.json` transitions (idle / working).
- **checkpoint** — periodically snapshots session context (current task,
last tool call, working dir) so an interrupted session can resume on a
different account.
- **janitor** — cleans stale `.dispatched` markers, archives old
`done/` tasks, prunes expired checkpoints.
- **notifier** — pushes state changes (failover fired, session degraded,
task failed) to Telegram / MCP dashboard / log aggregator.
- **account-switcher** — performs the actual swap: stop sessions on account
A, rehome symlinks, relaunch sessions on account B, replay last
checkpoint. Serialized via a single mutex so only one swap can happen at
a time.
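
The interplay between quota-monitor and account-switcher might look roughly like the sketch below. All type names, field names, and the 0.90 threshold are assumptions for illustration; the README does not specify how usage is represented or which backup is preferred. The swap itself only records the four steps listed above rather than touching tmux or the filesystem.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Usage holds the consumed fraction (0.0 to 1.0) of each quota window.
// The representation is an assumption for this sketch.
type Usage struct {
	FiveHour float64
	Weekly   float64
}

// Account is a hypothetical view of one configured Anthropic account.
type Account struct {
	Name  string
	Usage Usage
}

// overThreshold reports whether either quota window has reached threshold.
func overThreshold(u Usage, threshold float64) bool {
	return u.FiveHour >= threshold || u.Weekly >= threshold
}

// pickBackup returns the first account other than active that is still
// under the threshold on both windows.
func pickBackup(accounts []Account, active string, threshold float64) (Account, error) {
	for _, a := range accounts {
		if a.Name != active && !overThreshold(a.Usage, threshold) {
			return a, nil
		}
	}
	return Account{}, errors.New("no backup account under threshold")
}

// Switcher serializes swaps behind a single mutex so that only one swap
// runs at a time.
type Switcher struct {
	mu    sync.Mutex
	steps []string // executed steps, recorded for the demo only
}

// Swap walks through the swap sequence in order while holding the mutex.
func (s *Switcher) Swap(from, to string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.steps = append(s.steps,
		"stop sessions on "+from,
		"rehome symlinks",
		"relaunch sessions on "+to,
		"replay last checkpoint",
	)
}

func main() {
	const threshold = 0.90 // assumed value, not taken from the project
	accounts := []Account{
		{Name: "account-a", Usage: Usage{FiveHour: 0.93, Weekly: 0.40}},
		{Name: "account-b", Usage: Usage{FiveHour: 0.20, Weekly: 0.97}},
		{Name: "account-c", Usage: Usage{FiveHour: 0.20, Weekly: 0.55}},
	}
	active := accounts[0]
	if !overThreshold(active.Usage, threshold) {
		return
	}
	backup, err := pickBackup(accounts, active.Name, threshold)
	if err != nil {
		fmt.Println("failover skipped:", err)
		return
	}
	var sw Switcher
	sw.Swap(active.Name, backup.Name)
	for _, step := range sw.steps {
		fmt.Println(step)
	}
}
```

Here `account-b` is skipped because its weekly cap is exhausted even though its 5-hour window is nearly empty, which is why both windows are checked together.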
All goroutines communicate through typed channels plus a shared state struct
behind a `sync.RWMutex`. The daemon exposes an HTTP control plane for the
MCP server to query status and force-trigger operations.
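
A minimal sketch of that pattern follows, assuming a hypothetical `FailoverEvent` message type, a `State` struct with made-up fields, and a `/status` route; none of these names come from the project. The demo drives the handler through `net/http/httptest` so it runs without opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// FailoverEvent is a hypothetical typed message sent from the
// quota-monitor goroutine to the account-switcher goroutine.
type FailoverEvent struct {
	From, To string
}

// State is the shared daemon state; fields are assumptions for this sketch.
type State struct {
	mu            sync.RWMutex
	activeAccount string
	sessions      map[string]string // session name -> "idle" or "working"
}

// Snapshot is the read-only copy handed to the control plane.
type Snapshot struct {
	ActiveAccount string            `json:"active_account"`
	Sessions      map[string]string `json:"sessions"`
}

// SetActive takes the write lock because it mutates shared state.
func (s *State) SetActive(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.activeAccount = name
}

// Snapshot takes only the read lock and copies the map, so callers
// cannot mutate daemon state through the returned value.
func (s *State) Snapshot() Snapshot {
	s.mu.RLock()
	defer s.mu.RUnlock()
	copied := make(map[string]string, len(s.sessions))
	for k, v := range s.sessions {
		copied[k] = v
	}
	return Snapshot{ActiveAccount: s.activeAccount, Sessions: copied}
}

// statusHandler serves the current snapshot as JSON.
func statusHandler(s *State) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(s.Snapshot())
	}
}

func main() {
	state := &State{
		activeAccount: "account-a",
		sessions:      map[string]string{"ccl-0": "idle"},
	}
	events := make(chan FailoverEvent)
	done := make(chan struct{})

	// Stand-in for the account-switcher goroutine: applies each event.
	go func() {
		defer close(done)
		for ev := range events {
			state.SetActive(ev.To)
		}
	}()

	events <- FailoverEvent{From: "account-a", To: "account-b"}
	close(events)
	<-done

	// Query the control plane in-process instead of binding a socket.
	rec := httptest.NewRecorder()
	statusHandler(state)(rec, httptest.NewRequest(http.MethodGet, "/status", nil))
	fmt.Print(rec.Body.String())
}
```

Readers such as the status endpoint share the read lock and do not block one another; only writers like `SetActive` take the exclusive lock.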
## Relationship to SecuAAS agent-orchestrator
This project extracts the session-management and failover logic that
currently lives in `dev-management/agent-orchestrator/` (shell scripts:
`launch-agent.sh`, `graceful-switch.sh`, `watchdog.sh`,
`checkpoint-daemon.sh`, `start-dedicated-agents.sh`) and reimplements it
as a single Go service. See the orchestrator docs for the operational
context of the scripts this daemon is designed to replace.
## Repository layout
```
cmd/claude-failover/ Main entrypoint
docs/ Architecture, configuration, analysis notes
scripts/ Setup helpers (shared-projects symlink, etc.)
config.example.yaml Annotated example config
```
## Status
Pre-alpha. Design and scaffolding only — no working binary yet.
## License
MIT — see `LICENSE`.