feat: initial project structure
commit
cf4957010f
10 changed files with 621 additions and 0 deletions
125
docs/architecture.md
Normal file
@@ -0,0 +1,125 @@
# Architecture

`claude-failover` is a single Go binary structured as a set of cooperating
goroutines. Each goroutine owns a narrow responsibility and communicates
through typed channels and a shared `State` struct guarded by a
`sync.RWMutex`. A single-writer discipline is enforced: only the
**account-switcher** may mutate the active-account field.

## Goroutines

### dispatcher

Watches `.agent-queue/inbox/` for every registered project (inotify on
Linux) and pairs each incoming task with an idle session from the pool.
It respects:

- per-project priority
- agent capability tags declared in the task frontmatter
- the `needs_claude_code: true` bypass flag
- dispatcher-level cooldowns to avoid flooding a freshly-launched session

On successful assignment it renames `<task>.md` to `<task>.md.dispatched`
and writes a pointer into the target session's tmux prompt.

### quota-monitor

Polls Anthropic usage counters for every configured account. Sources:

1. Claude Code's local telemetry files under `~/.claude/statsig/` and
   `~/.claude/projects/*.jsonl` (message timestamps).
2. Optionally, a reverse-engineered `/api/quota` endpoint, if available.

It computes two sliding windows (5 hours and 1 week) and emits a
`swap-requested` event once the thresholds in the config are crossed.

### session-watcher

Keeps a table of tmux sessions (`ccl-*`). For each one it tracks:

- process liveness (via `tmux has-session`)
- heartbeat timestamp from `.agent-queue/status.json`
- current `state` field (idle / working / stalled)

Stalled sessions (heartbeat older than N minutes while `state=working`)
raise an alert on the notifier channel and become candidates for a
forced restart.

### checkpoint

Every `checkpoint.interval`, serializes per-session context:

- current task id
- last recorded tool call (name + truncated args)
- cwd as reported by the session
- the last N lines of the session's scrollback

Files are written atomically (`*.tmp` + rename) to
`checkpoint.dir/<session>/<timestamp>.json` and pruned to
`checkpoint.keep` entries.

### janitor

Periodic housekeeping:

- removes stale `.md.dispatched` markers whose source task is gone
- archives `done/` older than a configurable horizon
- prunes expired checkpoints
- rotates the daemon's own log file when it exceeds a size threshold

### notifier

Fan-out of typed events (`SwapFired`, `SessionStalled`, `TaskFailed`,
`QuotaWarning`) to configured sinks:

- Telegram bot (alerts channel)
- MCP control-plane push
- stdout / structured log aggregator

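One way to picture the fan-out is a loop that hands every event to every
sink. The event names below come from the text; the `Event` struct, the
`Sink` interface, `fanOut`, and `memorySink` are hypothetical and only
illustrate the pattern.

```go
package main

// EventKind enumerates the event types named in the text.
type EventKind string

const (
	SwapFired      EventKind = "SwapFired"
	SessionStalled EventKind = "SessionStalled"
	TaskFailed     EventKind = "TaskFailed"
	QuotaWarning   EventKind = "QuotaWarning"
)

// Event is a hypothetical envelope for one notification.
type Event struct {
	Kind    EventKind
	Message string
}

// Sink is a hypothetical interface that a Telegram, MCP, or log sink
// would implement.
type Sink interface {
	Send(Event) error
}

// memorySink records events in memory; it stands in for a real sink.
type memorySink struct{ got []Event }

func (m *memorySink) Send(ev Event) error {
	m.got = append(m.got, ev)
	return nil
}

// fanOut delivers every event from in to every sink until in is closed.
// A failing sink does not stop delivery to the others; errors are
// collected and returned.
func fanOut(in <-chan Event, sinks []Sink) []error {
	var errs []error
	for ev := range in {
		for _, s := range sinks {
			if err := s.Send(ev); err != nil {
				errs = append(errs, err)
			}
		}
	}
	return errs
}
```

This version delivers to sinks sequentially; a slow sink would delay the
others, so a real implementation might give each sink its own goroutine.
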
### account-switcher

Serializes all account swaps behind a single mutex. Swap protocol:

1. mark active account as `draining`
2. tell each session to flush its current tool call and checkpoint
3. stop tmux sessions in reverse launch order
4. repoint the `~/.claude` symlink (or equivalent per-session HOME) to
   the target account's home directory
5. relaunch sessions; replay the latest checkpoint so each session
   reopens the same project and task pointer
6. mark the new account `active`, start the cooldown timer on the old one

See [`session-switch-analysis.md`](./session-switch-analysis.md) for why
the shared-symlink approach is required (Claude Code bug #16103).

## Shared state

```go
type State struct {
	mu sync.RWMutex // guards every field below

	ActiveAccount string // mutated only by the account-switcher
	Accounts      map[string]*AccountState
	Sessions      map[string]*SessionState
	LastSwap      time.Time
	PendingSwap   bool
}
```

Readers take `RLock`; the account-switcher takes `Lock` for the duration
of a swap. All other writers go through a single-writer channel owned by
the switcher, which guarantees swap atomicity.

## HTTP control plane

The daemon exposes a small HTTP server (`mcp_http.listen`) consumed by
the SecuAAS MCP gateway. Routes:

| Method | Path                | Purpose                          |
|--------|---------------------|----------------------------------|
| GET    | `/status`           | Full state snapshot              |
| GET    | `/accounts`         | Account usage + limits           |
| GET    | `/sessions`         | Session table                    |
| POST   | `/trigger/swap`     | Force failover (requires bearer) |
| POST   | `/trigger/dispatch` | Force inbox scan                 |

All routes require the bearer token from `mcp_http.bearer_token_env`.

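A bearer check applied to every route might be written as a small
`net/http` middleware. The sketch is illustrative only: `authorized` and
`requireBearer` are hypothetical names, and it assumes the token arrives in
a standard `Authorization: Bearer <token>` header.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// authorized reports whether header carries exactly the expected token.
// An empty expected token never matches, so a missing configuration
// value cannot open the API by accident.
func authorized(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := header[len(prefix):]
	// Constant-time comparison avoids leaking the token through timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// requireBearer wraps next and answers 401 unless the request is authorized.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Wrapping the whole mux once, rather than each handler, keeps a newly added
route from being exposed without the check.
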