Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover
Ubuntu 8eaf0bbd35 feat(switcher): ensure shared symlinks on target home after flip (A3)
Wire symlinks.EnsureForAccount into executeSwitch, called immediately
after the ~/.claude flip. Guarantees the three shared-state links
(session-env, file-history, projects) exist on the target account home
even for freshly-provisioned accounts, preventing silent transcript
duplication and undo-history divergence on first resume.

Best-effort: errors are logged as WARN but never abort the swap. If we
returned here the daemon would be left inconsistent (symlink flipped,
SetActiveAccount never called). Operator sees the warning in logs and
resolves divergent links manually.

Tests:
- TestFlipReconcilesSharedSymlinksOnTargetHome: empty target home gets
  all three links pointing at canonical targets after the flip.
- TestFlipEnsureSymlinksFailureDoesNotAbortSwap: a planted divergent
  link triggers the symlinks-package error; the swap completes anyway
  and the active account is updated.

Hermetic: added AccountSwitcher.sharedSymlinks override so tests scope
the reconcile inside t.TempDir() and never touch
/home/ubuntu/.claude-*-shared. Existing tests migrated to this pattern
and hardcoded /tmp/claude-*-xxxx paths replaced with tmpdirs.

Phase 1 / Workstream A, task A3.
2026-04-16 19:34:03 +00:00
cmd/claude-failover feat(lifecycle): validate shared symlinks at daemon startup (A2) 2026-04-16 19:03:43 +00:00
docs docs: add WORK_IN_PROGRESS.md and document false-positive protection 2026-04-15 19:51:15 +00:00
internal feat(switcher): ensure shared symlinks on target home after flip (A3) 2026-04-16 19:34:03 +00:00
scripts chore: add test-and-migrate.sh script 2026-04-15 01:12:49 +00:00
.gitignore chore(gitignore): ignore built binary and .security-reviewed marker 2026-04-15 00:00:23 +00:00
CLAUDE.md chore: add CLAUDE.md and update gitignore 2026-04-14 17:55:29 +00:00
config.example.yaml feat: Phase 2.7+3 — full integration, config update, systemd unit 2026-04-15 00:15:06 +00:00
go.mod feat(dispatcher): Phase 2.2: Task Dispatcher with fsnotify 2026-04-14 20:30:08 +00:00
go.sum feat(dispatcher): Phase 2.2: Task Dispatcher with fsnotify 2026-04-14 20:30:08 +00:00
LICENSE feat: initial project structure 2026-04-14 13:29:24 +00:00
README.md feat: initial project structure 2026-04-14 13:29:24 +00:00
VERSION.md feat(switcher): ensure shared symlinks on target home after flip (A3) 2026-04-16 19:34:03 +00:00
WORK_IN_PROGRESS.md feat(symlinks): add shared-state symlink manager (A1) 2026-04-16 18:55:32 +00:00

claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.

Overview

claude-failover orchestrates a pool of Claude Code sessions running under multiple Anthropic accounts. When the active account reaches its quota threshold (5-hour usage window or weekly cap), the daemon transparently fails over the workload to a backup account without losing in-flight session state.
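The failover trigger can be pictured as a pure decision over the two usage windows. This is a minimal sketch that assumes usage is reported as a fraction of each window and that one threshold applies to both. The `Usage` type, `shouldFailover` and `pickBackup` are hypothetical names, not the daemon's API.

```go
package main

import "fmt"

// Usage is a hypothetical snapshot of one account's consumption,
// expressed as fractions (0.0 to 1.0) of each quota window.
type Usage struct {
	Account  string
	FiveHour float64 // share of the rolling 5-hour window already used
	Weekly   float64 // share of the weekly cap already used
}

// shouldFailover reports whether either window has crossed the threshold.
// A single shared threshold is an assumption made for this sketch.
func shouldFailover(u Usage, threshold float64) bool {
	return u.FiveHour >= threshold || u.Weekly >= threshold
}

// pickBackup returns the first account other than the active one that is
// still below the threshold on both windows.
func pickBackup(active string, pool []Usage, threshold float64) (string, bool) {
	for _, u := range pool {
		if u.Account != active && !shouldFailover(u, threshold) {
			return u.Account, true
		}
	}
	return "", false
}

func main() {
	pool := []Usage{
		{Account: "acct-a", FiveHour: 0.97, Weekly: 0.40},
		{Account: "acct-b", FiveHour: 0.95, Weekly: 0.10},
		{Account: "acct-c", FiveHour: 0.20, Weekly: 0.30},
	}
	const threshold = 0.9
	if shouldFailover(pool[0], threshold) {
		next, ok := pickBackup("acct-a", pool, threshold)
		fmt.Println("failover:", next, ok) // failover: acct-c true
	}
}
```

Checking the windows independently matters because either cap on its own blocks the account. In this example `acct-b` is skipped even though its weekly usage is low.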

It is the runtime glue behind the SecuAAS agent pool (ccl-0..ccl-9, ccl-auto-11..ccl-auto-20). It keeps sessions warm across account swaps by symlinking shared session state into every account home: the ~/.claude/projects/ transcript tree, plus session-env and file-history.

Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

  • dispatcher — reads .agent-queue/inbox/*.md across registered projects and assigns tasks to idle sessions.
  • quota-monitor — polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
  • session-watcher — tracks tmux session liveness (ccl-*), heartbeats, and .agent-queue/status.json transitions (idle / working).
  • checkpoint — periodically snapshots session context (current task, last tool call, working dir) so an interrupted session can resume on a different account.
  • janitor — cleans stale .dispatched markers, archives old done/ tasks, prunes expired checkpoints.
  • notifier — pushes state changes (failover fired, session degraded, task failed) to Telegram / MCP dashboard / log aggregator.
  • account-switcher — performs the actual swap: stop sessions on account A, flip ~/.claude to account B and ensure the shared symlinks exist on its home, relaunch sessions on account B, and replay the last checkpoint. Swaps are serialized by a single mutex, so only one can run at a time.

All goroutines communicate through typed channels plus a shared state struct behind a sync.RWMutex. The daemon exposes an HTTP control plane for the MCP server to query status and force-trigger operations.

Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that currently lives in dev-management/agent-orchestrator/ (shell scripts: launch-agent.sh, graceful-switch.sh, watchdog.sh, checkpoint-daemon.sh, start-dedicated-agents.sh) and reimplements it as a single Go service. See the orchestrator docs for the operational context this daemon is designed to replace.

Repository layout

cmd/claude-failover/     Main entrypoint
internal/                Internal packages
docs/                    Architecture, configuration, analysis notes
scripts/                 Setup helpers (shared-projects symlink, etc.)
config.example.yaml      Annotated example config

Status

Pre-alpha. Components are being built out in phases (current progress is tracked in WORK_IN_PROGRESS.md); not yet ready for production use.

License

MIT — see LICENSE.