Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover
Find a file
Ubuntu 5cfb58c202 feat(dispatcher): enforce depends_on with .blocked marker (Phase 2/G2)
Before claiming a session for a task, the dispatcher now:
1. Parses the task's frontmatter
2. If `depends_on: [project:task_id]` is non-empty, checks each entry
   against `<projectsDir>/<project>/.agent-queue/done/<task_id>.md`
3. If any dep is unresolved -> skip the task and write
   `<task>.md.blocked` next to it. The watchdog (G1) will resolve
   this marker on its next tick.

The `.blocked` marker is idempotent: re-running the dispatcher does not
refresh its mtime, so the watchdog can compute the blocked-since
timestamp from the FIRST detection (timeout precision).

Path-traversal hardening: project / task_id segments must match
`[A-Za-z0-9._-]+` and cannot be `.` or `..`. A malicious frontmatter
like `depends_on: ../../tmp:foo` is rejected before any filesystem
lookup.

assignNextTask (the doneChan path) applies the same gate so that a
session freed mid-cycle cannot bypass enforcement.

Tests (-race clean):
- DependsOnUnresolved -> .blocked marker, no dispatch
- DependsOnResolved -> normal dispatch, no marker
- PartialResolution -> stay blocked
- RejectPathTraversal -> blocked, not dispatched
- BlockedMarker idempotent (mtime stable across passes)
- NoDependsOn regression guard
2026-04-16 20:30:17 +00:00
cmd/claude-failover feat(lifecycle): validate shared symlinks at daemon startup (A2) 2026-04-16 19:03:43 +00:00
docs docs: add WORK_IN_PROGRESS.md and document false-positive protection 2026-04-15 19:51:15 +00:00
internal feat(dispatcher): enforce depends_on with .blocked marker (Phase 2/G2) 2026-04-16 20:30:17 +00:00
scripts chore: add test-and-migrate.sh script 2026-04-15 01:12:49 +00:00
.gitignore chore(gitignore): ignore built binary and .security-reviewed marker 2026-04-15 00:00:23 +00:00
CLAUDE.md chore: add CLAUDE.md and update gitignore 2026-04-14 17:55:29 +00:00
config.example.yaml feat: Phase 2.7+3 — full integration, config update, systemd unit 2026-04-15 00:15:06 +00:00
go.mod feat(dispatcher): Phase 2.2 — Task Dispatcher avec fsnotify 2026-04-14 20:30:08 +00:00
go.sum feat(dispatcher): Phase 2.2 — Task Dispatcher avec fsnotify 2026-04-14 20:30:08 +00:00
LICENSE feat: initial project structure 2026-04-14 13:29:24 +00:00
README.md feat: initial project structure 2026-04-14 13:29:24 +00:00
VERSION.md feat(dispatcher): enforce depends_on with .blocked marker (Phase 2/G2) 2026-04-16 20:30:17 +00:00
WORK_IN_PROGRESS.md feat(symlinks): add shared-state symlink manager (A1) 2026-04-16 18:55:32 +00:00

claude-failover

Go daemon for Claude Code multi-account session orchestration with automatic quota-based failover.

Overview

claude-failover orchestrates a pool of Claude Code sessions running under multiple Anthropic accounts. When the active account reaches its quota threshold (5-hour usage window or weekly cap), the daemon transparently fails over the workload to a backup account without losing in-flight session state.

It is the runtime glue behind the SecuAAS agent pool (ccl-0..ccl-9, ccl-auto-11..ccl-auto-20) and is engineered to hold sessions warm across account swaps by sharing the ~/.claude/projects/ transcript tree via symlinks.

Architecture (goroutines)

The daemon is a single Go binary composed of cooperating goroutines:

  • dispatcher — reads .agent-queue/inbox/*.md across registered projects and assigns tasks to idle sessions.
  • quota-monitor — polls each configured Anthropic account's usage window and triggers a failover when the active account crosses its threshold.
  • session-watcher — tracks tmux session liveness (ccl-*), heartbeats, and .agent-queue/status.json transitions (idle / working).
  • checkpoint — periodically snapshots session context (current task, last tool call, working dir) so an interrupted session can resume on a different account.
  • janitor — cleans stale .dispatched markers, archives old done/ tasks, prunes expired checkpoints.
  • notifier — pushes state changes (failover fired, session degraded, task failed) to Telegram / MCP dashboard / log aggregator.
  • account-switcher — performs the actual swap: stop sessions on account A, rehome symlinks, relaunch sessions on account B, replay last checkpoint. Serialized via a single mutex so only one swap can happen at a time.

All goroutines communicate through typed channels plus a shared state struct behind a sync.RWMutex. The daemon exposes an HTTP control plane for the MCP server to query status and force-trigger operations.

Relationship to SecuAAS agent-orchestrator

This project extracts the session-management and failover logic that currently lives in dev-management/agent-orchestrator/ (shell scripts: launch-agent.sh, graceful-switch.sh, watchdog.sh, checkpoint-daemon.sh, start-dedicated-agents.sh) and reimplements it as a single Go service. See the orchestrator docs for the operational context this daemon is designed to replace.

Repository layout

cmd/claude-failover/     Main entrypoint
docs/                    Architecture, configuration, analysis notes
scripts/                 Setup helpers (shared-projects symlink, etc.)
config.example.yaml      Annotated example config

Status

Pre-alpha. Design and scaffolding only — no working binary yet.

License

MIT — see LICENSE.