Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover
Ubuntu 4cbdcf143a fix(dispatcher+watcher): never auto-dispatch into dedicated sessions
Observed: tasks from filesecure/.agent-queue/inbox and SecuScan/
.agent-queue/inbox were being routed into ccl-1-conformvault and
ccl-2-scanyze whenever those sessions happened to be idle. Those are
the operator's manual interactive Claude sessions, not dispatch
targets — the auto-dispatch was (a) hijacking a Claude instance the
operator was using and (b) triggering /exit via the watcher's
completion path when the side-task finished, kicking the operator out
mid-conversation.

findFreeSession was iterating Pool.Dedicated before the autonomous
pool, so any idle dedicated session was the first candidate.

- Dispatcher.findFreeSession: remove the Dedicated loop entirely.
  Auto-dispatch is now pool-only (ccl-auto-11..20).
- Watcher.completeSession: defense-in-depth — even if a dedicated
  session ever ends up in "working" state, it is no longer /exit'd;
  just marked idle. Pool /exit behaviour unchanged (context recycle).
- Tests: new TestFindFreeSessionSkipsDedicated proves the routing;
  3 existing tests rewritten to use the autonomous pool instead of
  relying on Dedicated as a fake pool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:30:26 +00:00
cmd/claude-failover feat: Phase 2.7+3 — full integration, config update, systemd unit 2026-04-15 00:15:06 +00:00
docs docs: add WORK_IN_PROGRESS.md and document false-positive protection 2026-04-15 19:51:15 +00:00
internal fix(dispatcher+watcher): never auto-dispatch into dedicated sessions 2026-04-16 13:30:26 +00:00
scripts chore: add test-and-migrate.sh script 2026-04-15 01:12:49 +00:00
.gitignore chore(gitignore): ignore built binary and .security-reviewed marker 2026-04-15 00:00:23 +00:00
CLAUDE.md chore: add CLAUDE.md and update gitignore 2026-04-14 17:55:29 +00:00
config.example.yaml feat: Phase 2.7+3 — full integration, config update, systemd unit 2026-04-15 00:15:06 +00:00
go.mod feat(dispatcher): Phase 2.2 — Task Dispatcher avec fsnotify 2026-04-14 20:30:08 +00:00
go.sum feat(dispatcher): Phase 2.2 — Task Dispatcher avec fsnotify 2026-04-14 20:30:08 +00:00
LICENSE feat: initial project structure 2026-04-14 13:29:24 +00:00
README.md feat: initial project structure 2026-04-14 13:29:24 +00:00
VERSION.md fix(dispatcher+watcher): never auto-dispatch into dedicated sessions 2026-04-16 13:30:26 +00:00
WORK_IN_PROGRESS.md feat(switcher): auto-resume dedicated sessions after a swap 2026-04-15 20:24:38 +00:00

claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.

Overview

claude-failover orchestrates a pool of Claude Code sessions running under multiple Anthropic accounts. When the active account reaches its quota threshold (5-hour usage window or weekly cap), the daemon transparently fails over the workload to a backup account without losing in-flight session state.
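The failover decision described above can be sketched as a simple threshold check. This is a minimal illustration, not the daemon's actual code: the `Account` type, its field names, the `overThreshold` and `pickBackup` helpers, and the 0.90 threshold are all assumptions made for the example.

```go
package main

import "fmt"

// Account is a hypothetical view of one Anthropic account's quota usage.
// Both usage fields are fractions in [0, 1] of the respective limit.
type Account struct {
	Name        string
	WindowUsage float64 // share of the 5-hour usage window consumed
	WeeklyUsage float64 // share of the weekly cap consumed
}

// overThreshold reports whether either quota has reached the threshold.
func overThreshold(a Account, threshold float64) bool {
	return a.WindowUsage >= threshold || a.WeeklyUsage >= threshold
}

// pickBackup returns the first account other than active that is still
// under the threshold. ok is false when no such account exists.
func pickBackup(accounts []Account, active string, threshold float64) (name string, ok bool) {
	for _, a := range accounts {
		if a.Name != active && !overThreshold(a, threshold) {
			return a.Name, true
		}
	}
	return "", false
}

func main() {
	accounts := []Account{
		{Name: "primary", WindowUsage: 0.93, WeeklyUsage: 0.40},
		{Name: "backup-1", WindowUsage: 0.95, WeeklyUsage: 0.10},
		{Name: "backup-2", WindowUsage: 0.05, WeeklyUsage: 0.20},
	}
	const threshold = 0.90 // assumed value, not taken from the project config

	if overThreshold(accounts[0], threshold) {
		if next, ok := pickBackup(accounts, "primary", threshold); ok {
			fmt.Println("fail over to", next)
		} else {
			fmt.Println("no backup account under threshold")
		}
	}
}
```

Checking both quotas with a single threshold keeps the example short; a real configuration may well use separate thresholds per quota type.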

It is the runtime glue behind the SecuAAS agent pool (ccl-0..ccl-9, ccl-auto-11..ccl-auto-20). It is designed to keep sessions warm across account swaps by sharing the ~/.claude/projects/ transcript tree between accounts via symlinks.
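The symlink approach can be sketched as follows. The sketch assumes that each account has its own configuration directory and that its projects entry is replaced by a symlink to one shared tree; the directory names, `linkProjects`, and `runDemo` are hypothetical and only stand in for whatever the setup scripts actually do.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkProjects makes <accountHome>/projects a symlink to shared.
// It refuses to replace a real directory so no transcripts are lost.
func linkProjects(shared, accountHome string) error {
	link := filepath.Join(accountHome, "projects")
	if fi, err := os.Lstat(link); err == nil {
		if fi.Mode()&os.ModeSymlink == 0 {
			return fmt.Errorf("%s exists and is not a symlink", link)
		}
		if err := os.Remove(link); err != nil { // drop the stale link
			return err
		}
	} else if !os.IsNotExist(err) {
		return err
	}
	return os.Symlink(shared, link)
}

// runDemo builds two throwaway account homes under a temp dir and checks
// that a file written through one is readable through the other.
func runDemo() error {
	root, err := os.MkdirTemp("", "failover-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	shared := filepath.Join(root, "shared-projects")
	if err := os.MkdirAll(shared, 0o755); err != nil {
		return err
	}
	homes := []string{
		filepath.Join(root, "account-a"),
		filepath.Join(root, "account-b"),
	}
	for _, h := range homes {
		if err := os.MkdirAll(h, 0o755); err != nil {
			return err
		}
		if err := linkProjects(shared, h); err != nil {
			return err
		}
	}

	// Write a transcript via account A, then read it via account B.
	src := filepath.Join(homes[0], "projects", "session.jsonl")
	if err := os.WriteFile(src, []byte("{}\n"), 0o644); err != nil {
		return err
	}
	data, err := os.ReadFile(filepath.Join(homes[1], "projects", "session.jsonl"))
	if err != nil {
		return err
	}
	if string(data) != "{}\n" {
		return fmt.Errorf("unexpected content %q", data)
	}
	return nil
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("both accounts see the same transcript tree")
}
```

Because both links resolve to the same directory, a session relaunched under another account finds its transcript where it left it.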

Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

  • dispatcher — reads .agent-queue/inbox/*.md across registered projects and assigns tasks to idle sessions.
  • quota-monitor — polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
  • session-watcher — tracks tmux session liveness (ccl-*), heartbeats, and .agent-queue/status.json transitions (idle / working).
  • checkpoint — periodically snapshots session context (current task, last tool call, working dir) so an interrupted session can resume on a different account.
  • janitor — cleans stale .dispatched markers, archives old done/ tasks, prunes expired checkpoints.
  • notifier — pushes state changes (failover fired, session degraded, task failed) to Telegram / MCP dashboard / log aggregator.
  • account-switcher — performs the actual swap: stop sessions on account A, rehome symlinks, relaunch sessions on account B, replay last checkpoint. Serialized via a single mutex so only one swap can happen at a time.
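As an illustration of the dispatcher and session-watcher roles, the sketch below shows pool-only session selection in the spirit of the latest commit: dedicated sessions are never candidates for auto-dispatch and are never sent /exit on completion. Only the names findFreeSession, completeSession, and Pool.Dedicated come from the commit message; the types, fields, and signatures are assumptions.

```go
package main

import "fmt"

// Session is a hypothetical view of one tmux-hosted Claude Code session.
type Session struct {
	Name      string
	Status    string // "idle" or "working"
	Dedicated bool   // true for the operator's interactive sessions
}

// Pool separates operator-owned sessions from auto-dispatch targets.
type Pool struct {
	Dedicated  []*Session
	Autonomous []*Session
}

// findFreeSession returns the first idle autonomous session, or nil.
// Pool.Dedicated is deliberately never scanned.
func findFreeSession(p *Pool) *Session {
	for _, s := range p.Autonomous {
		if s.Status == "idle" {
			return s
		}
	}
	return nil
}

// completeSession marks s idle and reports whether the caller should send
// /exit to recycle its context. Dedicated sessions are never exited.
func completeSession(s *Session) (sendExit bool) {
	s.Status = "idle"
	return !s.Dedicated
}

func main() {
	pool := &Pool{
		Dedicated: []*Session{
			{Name: "ccl-1-conformvault", Status: "idle", Dedicated: true},
		},
		Autonomous: []*Session{
			{Name: "ccl-auto-11", Status: "working"},
			{Name: "ccl-auto-12", Status: "idle"},
		},
	}
	if s := findFreeSession(pool); s != nil {
		s.Status = "working"
		fmt.Println("dispatch to", s.Name)
		fmt.Println("send /exit on completion:", completeSession(s))
	}
}
```

Keeping the dedicated check in completeSession as well as in findFreeSession mirrors the defense-in-depth idea: even if a dedicated session somehow ends up working, it is only marked idle.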

All goroutines communicate through typed channels plus a shared state struct behind a sync.RWMutex. The daemon exposes an HTTP control plane for the MCP server to query status and force-trigger operations.
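The communication pattern described above might look roughly like this: one goroutine consumes typed events from a channel and applies them to shared state guarded by a sync.RWMutex, while readers take only the read lock. The `Event` and `State` types and the `consume` function are hypothetical names for this sketch, and the HTTP control plane is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical typed message passed between goroutines.
type Event struct {
	Session string
	Status  string // "idle" or "working"
}

// State is shared daemon state guarded by a sync.RWMutex.
type State struct {
	mu       sync.RWMutex
	sessions map[string]string // session name -> status
}

// NewState returns an empty, ready-to-use State.
func NewState() *State {
	return &State{sessions: make(map[string]string)}
}

// Set records a session status under the write lock.
func (s *State) Set(session, status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[session] = status
}

// Get reads a session status under the read lock, so concurrent
// readers do not block each other.
func (s *State) Get(session string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.sessions[session]
	return st, ok
}

// consume applies events to the state until the channel is closed,
// then signals completion by closing done.
func consume(events <-chan Event, st *State, done chan<- struct{}) {
	for ev := range events {
		st.Set(ev.Session, ev.Status)
	}
	close(done)
}

func main() {
	st := NewState()
	events := make(chan Event)
	done := make(chan struct{})
	go consume(events, st, done)

	events <- Event{Session: "ccl-auto-11", Status: "working"}
	events <- Event{Session: "ccl-auto-11", Status: "idle"}
	close(events)
	<-done // wait until every event has been applied

	status, _ := st.Get("ccl-auto-11")
	fmt.Println("ccl-auto-11 is", status)
}
```

A status handler on the control plane would only need `Get`, which is why a read/write lock fits better here than a plain mutex.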

Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that currently lives in dev-management/agent-orchestrator/ as shell scripts (launch-agent.sh, graceful-switch.sh, watchdog.sh, checkpoint-daemon.sh, start-dedicated-agents.sh) and reimplements it as a single Go service. See the orchestrator docs for the operational context of the scripts this daemon is designed to replace.

Repository layout

cmd/claude-failover/     Main entrypoint
docs/                    Architecture, configuration, analysis notes
internal/                Daemon packages, including the dispatcher and watcher
scripts/                 Setup helpers (shared-projects symlink, etc.)
config.example.yaml      Annotated example config

Status

Pre-alpha. Design and scaffolding only — no working binary yet.

License

MIT — see LICENSE.