docs: enrich architecture with Go implementation details
This commit is contained in:
parent
481e276d7d
commit
65c25c8955
1 changed files with 413 additions and 80 deletions
|
|
@ -6,93 +6,180 @@ through typed channels and a shared `State` struct guarded by a
|
|||
`sync.RWMutex`. A single-writer discipline is enforced: only the
|
||||
**account-switcher** may mutate the active-account field.
|
||||
|
||||
## Goroutines
|
||||
> This document enriches the high-level design with concrete Go
|
||||
> implementation notes distilled from the current shell-based
|
||||
> `agent-orchestrator` (see `graceful-switch.sh`, `launch-agent.sh`,
|
||||
> `setup-tmux.sh`, `start-dedicated-agents.sh`, `watchdog.sh`,
|
||||
> `checkpoint-daemon.sh`) and the companion TUI written in Go
|
||||
> (`agent-orchestrator/tui/*.go`). The daemon is the consolidation of
|
||||
> those scripts into a single, testable process.
|
||||
|
||||
### dispatcher
|
||||
---
|
||||
|
||||
Watches `.agent-queue/inbox/` for every registered project (inotify on
|
||||
Linux) and pairs each incoming task with an idle session from the pool.
|
||||
It respects:
|
||||
## 1. Process model
|
||||
|
||||
- per-project priority
|
||||
- agent capability tags declared in the task frontmatter
|
||||
- the `needs_claude_code: true` bypass flag
|
||||
- dispatcher-level cooldowns to avoid flooding a freshly-launched session
|
||||
- Single static binary: `claude-failover` (package main in `cmd/claude-failover`).
|
||||
- One OS process, many goroutines. No child daemons.
|
||||
- Runs under systemd (`User=ubuntu`, `Restart=always`).
|
||||
- Persists state at `/var/lib/claude-failover/state.json` (atomic write).
|
||||
- Listens locally on `127.0.0.1:8765` for MCP HTTP control.
|
||||
- Single YAML config (`/etc/claude-failover/config.yaml`), see
|
||||
[`config.example.yaml`](../config.example.yaml).
|
||||
|
||||
On successful assignment it renames `<task>.md` to `<task>.md.dispatched`
|
||||
and writes a pointer into the target session's tmux prompt.
|
||||
---
|
||||
|
||||
### quota-monitor
|
||||
## 2. Goroutines
|
||||
|
||||
Polls Anthropic usage counters for every configured account. Sources:
|
||||
Each goroutine is started from `main` with a `context.Context` derived
|
||||
from the root context. All goroutines obey cancellation and drain their
|
||||
pending work before returning.
|
||||
|
||||
1. Claude Code's local telemetry files under `~/.claude/statsig/` and
|
||||
`~/.claude/projects/*.jsonl` (message timestamps).
|
||||
2. Optional: a reverse-engineered `/api/quota` endpoint if available.
|
||||
### 2.1 dispatcher
|
||||
|
||||
It computes two sliding windows (5h, 1 week) and emits a `swap-requested`
|
||||
event once thresholds in the config are crossed.
|
||||
- Watches every registered project's `.agent-queue/inbox/` directory via
|
||||
[`fsnotify`](https://pkg.go.dev/github.com/fsnotify/fsnotify).
|
||||
- For each new `*.md` (ignoring `*.dispatched` and `*.dispatch-meta`),
|
||||
parses YAML frontmatter (title, priority, tags, `needs_claude_code`,
|
||||
`capabilities`).
|
||||
- Selects an idle session from the pool based on:
|
||||
- per-project priority,
|
||||
- agent capability tags,
|
||||
- `needs_claude_code: true` bypass (direct to Claude Code, skipping
|
||||
the GPU triage path inherited from the current dispatcher),
|
||||
- dispatcher-level cooldowns to avoid flooding a freshly-launched
|
||||
session.
|
||||
- On successful assignment, renames `<task>.md` to `<task>.md.dispatched`
|
||||
atomically and writes a pointer into the target session's tmux prompt
|
||||
via `TmuxAdapter.SendKeys`.
|
||||
- Emits a `TaskDispatched` event on the notifier channel.
|
||||
|
||||
### session-watcher
|
||||
Migration note: Phase 1 wraps the existing `smart-dispatcher.sh`
|
||||
behind `exec.Command` and only takes over the fsnotify layer; Phase 2
|
||||
re-implements dispatch selection natively in Go.
|
||||
|
||||
Keeps a table of tmux sessions (`ccl-*`). For each one it tracks:
|
||||
### 2.2 quota-monitor
|
||||
|
||||
- process liveness (via `tmux has-session`)
|
||||
- heartbeat timestamp from `.agent-queue/status.json`
|
||||
- current `state` field (idle / working / stalled)
|
||||
- Polls every account's usage counters on a configurable interval
|
||||
(`quota.poll_interval`, default `30s`).
|
||||
- Two data sources, merged:
|
||||
1. **Tmux pane scrape** - runs `tmux capture-pane -p -t <session>`
|
||||
and parses the "Claude usage" footer lines emitted by Claude Code.
|
||||
Implementation reuses the tmux wrapper from
|
||||
[`claude-squad/session/tmux`](https://github.com/smtg-ai/claude-squad/tree/main/session/tmux).
|
||||
2. **Local telemetry** - scans `~/.claude/statsig/` and
|
||||
`~/.claude/projects/*.jsonl` for message timestamps, computes
|
||||
sliding windows (5h, 1 week).
|
||||
- Also reads `SessionStart` / `SessionEnd` / `PreToolUse` hook events
|
||||
written by Claude Code into `~/.claude/hooks-audit.log` (opt-in via
|
||||
`settings.json`) for precise counters.
|
||||
- Emits `QuotaWarning` and `SwapRequested` events when thresholds in
|
||||
the config are crossed.
|
||||
|
||||
Stalled sessions (heartbeat older than N minutes while `state=working`)
|
||||
raise an alert on the notifier channel and become candidates for a
|
||||
forced restart.
|
||||
### 2.3 session-watcher
|
||||
|
||||
### checkpoint
|
||||
- Maintains an in-memory table keyed by tmux session name (`ccl-*`).
|
||||
- For each session tracks:
|
||||
- process liveness via `tmux has-session -t <name>` (exit code),
|
||||
- heartbeat timestamp parsed from the project's
|
||||
`.agent-queue/status.json`,
|
||||
- current `state` field (`idle` / `working` / `shell`),
|
||||
- last assigned task id and deadline.
|
||||
- Sessions whose heartbeat is older than `watchdog.stall_after` while
|
||||
`state == "working"` raise a `SessionStalled` alert and become
|
||||
candidates for a forced restart (via `account-switcher` or
|
||||
`launch-agent` re-run).
|
||||
- Replaces `watchdog.sh` entirely once Phase 2 ships.
|
||||
|
||||
Every `checkpoint.interval`, serializes per-session context:
|
||||
### 2.4 checkpoint
|
||||
|
||||
- current task id
|
||||
- last recorded tool call (name + truncated args)
|
||||
- cwd as reported by the session
|
||||
- the last N lines of the session's scrollback
|
||||
- On `checkpoint.interval` (default `30m`, a.k.a. the Anthropic 5h
|
||||
window / 10 checkpoints), serializes per-session context:
|
||||
- current task id,
|
||||
- last recorded tool call (name + truncated args),
|
||||
- cwd as reported by the session,
|
||||
- the last N lines of the session's scrollback
|
||||
(`tmux capture-pane -pS -<N>`).
|
||||
- Persists to SQLite at `checkpoint.db` (default
|
||||
`/var/lib/claude-failover/checkpoints.db`) with retention pruning
|
||||
based on the 30-minute rolling window.
|
||||
- Schema: `checkpoints(id INTEGER PK, session TEXT, ts INTEGER,
|
||||
task_id TEXT, cwd TEXT, tool_calls JSON, scrollback TEXT)`.
|
||||
- Indexed on `(session, ts DESC)`; old rows pruned by `janitor`.
|
||||
- Also exports a JSON snapshot per session to
|
||||
`checkpoint.dir/<session>/<timestamp>.json` for out-of-band tooling.
|
||||
|
||||
Files are written atomically (`*.tmp` + rename) to
|
||||
`checkpoint.dir/<session>/<timestamp>.json` and pruned to
|
||||
`checkpoint.keep` entries.
|
||||
### 2.5 janitor
|
||||
|
||||
### janitor
|
||||
Periodic housekeeping (runs every `janitor.interval`, default `5m`):
|
||||
|
||||
Periodic housekeeping:
|
||||
- removes stale `.md.dispatched` markers whose source task is gone,
|
||||
- archives `done/` entries older than `janitor.done_horizon`,
|
||||
- prunes checkpoint rows beyond `checkpoint.keep`,
|
||||
- rotates the daemon's own log file when it exceeds
|
||||
`log.max_size_mb`.
|
||||
|
||||
- removes stale `.md.dispatched` markers whose source task is gone
|
||||
- archives `done/` older than a configurable horizon
|
||||
- prunes expired checkpoints
|
||||
- rotates the daemon's own log file when it exceeds a size threshold
|
||||
### 2.6 notifier
|
||||
|
||||
### notifier
|
||||
- Fan-out of typed events (`SwapFired`, `SessionStalled`, `TaskFailed`,
|
||||
`QuotaWarning`, `TaskDispatched`) to configured sinks:
|
||||
- Telegram bot (alerts channel),
|
||||
- MCP control-plane push (HTTP callback),
|
||||
- structured log aggregator (stdout JSON).
|
||||
- Each sink is a goroutine with its own bounded buffered channel.
|
||||
Back-pressure is handled by dropping low-priority events
|
||||
(`TaskDispatched`) and logging a `NotifierSaturated` warning.
|
||||
|
||||
Fan-out of typed events (`SwapFired`, `SessionStalled`, `TaskFailed`,
|
||||
`QuotaWarning`) to configured sinks:
|
||||
### 2.7 account-switcher (state machine)
|
||||
|
||||
- Telegram bot (alerts channel)
|
||||
- MCP control-plane push
|
||||
- stdout / structured log aggregator
|
||||
The switcher is the only goroutine allowed to mutate `State.ActiveAccount`.
|
||||
It consumes `SwapRequested` events from a single-consumer channel and
|
||||
transitions through an atomic state machine:
|
||||
|
||||
### account-switcher
|
||||
```
|
||||
+--------+ SwapRequested +--------+
|
||||
| normal | ---------------> | saving |
|
||||
+--------+ +--------+
|
||||
^ |
|
||||
| v (checkpoint all sessions)
|
||||
| +-----------+
|
||||
| | switching |
|
||||
| +-----------+
|
||||
| |
|
||||
| v (relaunch tmux, flip HOME)
|
||||
| ResumeComplete +----------+
|
||||
+---------------------- | resuming |
|
||||
+----------+
|
||||
```
|
||||
|
||||
Serializes all account swaps behind a single mutex. Swap protocol:
|
||||
- `normal -> saving`: halt dispatcher intake, request an immediate
|
||||
checkpoint for every session, wait for ack with a deadline.
|
||||
- `saving -> switching`: kill tmux sessions in reverse launch order,
|
||||
repoint the `~/.claude` symlink (or per-session `HOME`) to the target
|
||||
account's home directory.
|
||||
- `switching -> resuming`: relaunch sessions via
|
||||
`TmuxAdapter.NewSession`; each session replays its most recent
|
||||
checkpoint (task pointer + cwd + truncated scrollback) so Claude Code
|
||||
reopens at the same task.
|
||||
- `resuming -> normal`: unblock dispatcher, start cooldown timer on
|
||||
the old account, emit `SwapFired`.
|
||||
|
||||
1. mark active account as `draining`
|
||||
2. tell each session to flush its current tool call and checkpoint
|
||||
3. stop tmux sessions in reverse launch order
|
||||
4. repoint the `~/.claude` symlink (or equivalent per-session HOME) to
|
||||
the target account's home directory
|
||||
5. relaunch sessions; replay the latest checkpoint so each session
|
||||
reopens the same project and task pointer
|
||||
6. mark the new account `active`, start the cooldown timer on the old one
|
||||
Guarantees:
|
||||
|
||||
See [`session-switch-analysis.md`](./session-switch-analysis.md) for why
|
||||
the shared-symlink approach is required (Claude Code bug #16103).
|
||||
- Only one swap in flight at a time (the channel has capacity 1, extra
|
||||
requests coalesce).
|
||||
- Swap is cancellable via `context.Context`; on cancel the switcher
|
||||
rolls back to the pre-swap `ActiveAccount`.
|
||||
- All FS writes go through `FSAdapter` and are `*.tmp + rename` to keep
|
||||
`state.json` recoverable.
|
||||
|
||||
## Shared state
|
||||
See [`session-switch-analysis.md`](./session-switch-analysis.md) for
|
||||
why the shared-symlink approach is required (Claude Code bug #16103)
|
||||
and how resuming interactive sessions differs from resuming autonomous
|
||||
ones.
|
||||
|
||||
---
|
||||
|
||||
## 3. Shared state
|
||||
|
||||
```go
|
||||
type State struct {
|
||||
|
|
@ -100,26 +187,272 @@ type State struct {
|
|||
ActiveAccount string
|
||||
Accounts map[string]*AccountState
|
||||
Sessions map[string]*SessionState
|
||||
Pool PoolState
|
||||
Dispatcher DispatcherState
|
||||
LastSwap time.Time
|
||||
PendingSwap bool
|
||||
StartedAt time.Time
|
||||
}
|
||||
|
||||
type AccountState struct {
|
||||
Name string
|
||||
Home string
|
||||
Priority int
|
||||
Usage5h float64 // 0.0 - 1.0
|
||||
UsageWeek float64
|
||||
CooldownEnd time.Time
|
||||
LastSwapIn time.Time
|
||||
LastSwapOut time.Time
|
||||
}
|
||||
|
||||
type SessionState struct {
|
||||
Name string
|
||||
Kind string // "autonomous" | "dedicated" | "interactive" | "orchestrator"
|
||||
Project string
|
||||
State string // "idle" | "working" | "shell" | "stalled"
|
||||
Task string
|
||||
AssignedAt time.Time
|
||||
Heartbeat time.Time
|
||||
}
|
||||
```
|
||||
|
||||
Readers take `RLock`; the account-switcher takes `Lock` for the duration
|
||||
of a swap. All other writers go through a single-writer channel owned by
|
||||
the switcher, which guarantees swap atomicity.
|
||||
- Readers take `RLock`; the account-switcher takes `Lock` for the
|
||||
duration of a swap.
|
||||
- All other writers enqueue on a single-writer channel
|
||||
(`stateOps chan func(*State)`) consumed by a dedicated goroutine.
|
||||
This mirrors the actor pattern and eliminates lock-ordering issues.
|
||||
- Every mutation triggers a debounced flush of `state.json`
|
||||
(`persist.debounce`, default `1s`) written atomically via
|
||||
`FSAdapter`.
|
||||
|
||||
## HTTP control plane
|
||||
---
|
||||
|
||||
The daemon exposes a small HTTP server (`mcp_http.listen`) consumed by
|
||||
the SecuAAS MCP gateway. Routes:
|
||||
## 4. Event system
|
||||
|
||||
Internal events are plain Go structs on typed channels. No reflection.
|
||||
|
||||
| Channel | Producer(s) | Consumer | Buffer |
|
||||
|-------------------|-----------------------|-------------------|--------|
|
||||
| `swapReq` | quota-monitor, HTTP | account-switcher | 1 |
|
||||
| `sessionEvt` | session-watcher | notifier, state | 32 |
|
||||
| `taskEvt` | dispatcher | notifier, state | 64 |
|
||||
| `checkpointEvt` | checkpoint | state | 8 |
|
||||
| `notifierSink[i]` | notifier | sink i | 128 |
|
||||
|
||||
Flow for a typical swap:
|
||||
|
||||
```
|
||||
quota-monitor --SwapRequested--> swapReq --> account-switcher
|
||||
| |
|
||||
| +--> state (lock)
|
||||
| +--> TmuxAdapter (kill/launch)
|
||||
| +--> FSAdapter (symlink flip)
|
||||
| +--> sessionEvt (SwapFired)
|
||||
v
|
||||
notifier --> telegram/mcp/log
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Adapters (mockable interfaces)
|
||||
|
||||
### Tmux
|
||||
|
||||
Thin wrapper around `exec.Command("tmux", ...)`, mockable via interface:
|
||||
|
||||
```go
|
||||
type TmuxAdapter interface {
|
||||
HasSession(name string) (bool, error)
|
||||
NewSession(name, cwd string, env []string) error
|
||||
KillSession(name string) error
|
||||
SendKeys(name string, keys string) error
|
||||
CapturePane(name string, lines int) (string, error)
|
||||
ListSessions() ([]string, error)
|
||||
}
|
||||
```
|
||||
|
||||
- Default implementation: `ExecTmux` shelling out to the `tmux` binary.
|
||||
- Test implementation: `FakeTmux` backed by an in-memory map.
|
||||
- Inspired by
|
||||
[`smtg-ai/claude-squad/session/tmux`](https://github.com/smtg-ai/claude-squad/tree/main/session/tmux),
|
||||
which already battle-tests pane capture and send-keys escaping.
|
||||
|
||||
### Filesystem & Claude CLI
|
||||
|
||||
```go
|
||||
type FSAdapter interface {
|
||||
ReadFile(path string) ([]byte, error)
|
||||
WriteFileAtomic(path string, data []byte, perm os.FileMode) error
|
||||
Rename(oldp, newp string) error
|
||||
Stat(path string) (os.FileInfo, error)
|
||||
Watch(path string) (<-chan fsnotify.Event, error)
|
||||
}
|
||||
|
||||
type ClaudeCLIAdapter interface {
|
||||
ResumeSession(sessionID, cwd string) error
|
||||
InjectPrompt(sessionID, prompt string) error
|
||||
Quota() (Usage, error)
|
||||
}
|
||||
```
|
||||
|
||||
All three interfaces have `Fake*` implementations under
|
||||
`internal/testutil/` so every goroutine can be unit-tested without a
|
||||
real tmux, filesystem, or Claude CLI.
|
||||
|
||||
---
|
||||
|
||||
## 6. HTTP control plane (MCP)
|
||||
|
||||
Bound to `127.0.0.1:8765` (config `mcp_http.listen`). Never exposed
|
||||
publicly - the SecuAAS MCP gateway proxies from the host.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|-----------------------|--------------------------------|
|
||||
| GET | `/status` | Full state snapshot |
|
||||
| GET | `/accounts` | Account usage + limits |
|
||||
| GET | `/sessions` | Session table |
|
||||
| POST | `/trigger/swap` | Force failover (requires bearer) |
|
||||
| POST | `/trigger/dispatch` | Force inbox scan |
|
||||
|--------|-------------------------------|----------------------------------|
|
||||
| GET | `/api/v1/session_status` | Full state snapshot |
|
||||
| GET | `/api/v1/accounts` | Account usage + limits |
|
||||
| GET | `/api/v1/sessions` | Session table |
|
||||
| POST | `/api/v1/trigger_dispatch` | Force inbox scan |
|
||||
| POST | `/api/v1/trigger/swap` | Force failover (bearer required) |
|
||||
| GET | `/healthz` | Liveness probe |
|
||||
|
||||
All routes require the bearer token from `mcp_http.bearer_token_env`.
|
||||
- All mutating routes require the bearer token from
|
||||
`mcp_http.bearer_token_env`.
|
||||
- Read routes are unauthenticated but only reachable on loopback.
|
||||
- Handlers are thin: they publish on the same internal channels used
|
||||
by goroutines, then wait on a reply channel with a timeout.
|
||||
|
||||
---
|
||||
|
||||
## 7. Graceful shutdown
|
||||
|
||||
- `main` installs a `signal.Notify` for `SIGTERM`, `SIGINT`, `SIGHUP`.
|
||||
- On signal, the root `context.Context` is cancelled.
|
||||
- Goroutines drain in a defined order:
|
||||
1. HTTP server (`Shutdown(ctx)` with 5s deadline),
|
||||
2. dispatcher (stops accepting new tasks, finishes in-flight assignment),
|
||||
3. quota-monitor (cancels poll),
|
||||
4. session-watcher (flushes last state),
|
||||
5. checkpoint (forces one final checkpoint for every working session),
|
||||
6. account-switcher (refuses new swaps, completes an in-flight swap
|
||||
up to `shutdown.swap_deadline`),
|
||||
7. notifier (flushes bounded buffers),
|
||||
8. janitor (exits immediately).
|
||||
- `state.json` is flushed one last time before exit.
|
||||
- Exit code `0` on clean drain, `1` on deadline timeout (systemd will
|
||||
restart).
|
||||
|
||||
---
|
||||
|
||||
## 8. Configuration
|
||||
|
||||
Single YAML file. See [`config.example.yaml`](../config.example.yaml)
|
||||
for the authoritative schema. Key sections:
|
||||
|
||||
- `accounts[]` - ordered list of Anthropic accounts (name, home,
|
||||
limits, priority).
|
||||
- `pool` - dedicated sessions, autonomous pool prefix and min/max,
|
||||
shared projects dir.
|
||||
- `quota` - poll interval, 5h/weekly thresholds.
|
||||
- `checkpoint` - interval, dir, db, keep count.
|
||||
- `janitor` - interval, horizons.
|
||||
- `mcp_http` - listen addr, bearer token env var.
|
||||
- `notifier` - telegram, mcp callback, log sinks.
|
||||
|
||||
The config is loaded once at startup. `SIGHUP` triggers a re-parse;
|
||||
only safe fields (thresholds, intervals, notifier sinks) are
|
||||
hot-reloaded. Structural changes (accounts, pool) require a restart.
|
||||
|
||||
---
|
||||
|
||||
## 9. Migration plan
|
||||
|
||||
The existing orchestrator is a set of bash scripts under
|
||||
`agent-orchestrator/` (`graceful-switch.sh`, `launch-agent.sh`,
|
||||
`setup-tmux.sh`, `start-dedicated-agents.sh`, `watchdog.sh`,
|
||||
`checkpoint-daemon.sh`, `smart-dispatcher.sh`). Cutover is staged to
|
||||
minimize risk.
|
||||
|
||||
### Phase 1 - Daemon wrapper (weeks 1-2)
|
||||
|
||||
- Build `claude-failover` as a supervisor that `exec.Command`s the
|
||||
existing `.sh` scripts.
|
||||
- Goroutines are skeletons that call scripts:
|
||||
- dispatcher -> `smart-dispatcher.sh`,
|
||||
- account-switcher -> `graceful-switch.sh <target>`,
|
||||
- session-watcher -> parses `watchdog.sh --json` output,
|
||||
- checkpoint -> calls `checkpoint-daemon.sh --once`.
|
||||
- Shared state, event system, HTTP control plane, and config are
|
||||
already native Go.
|
||||
- Systemd unit replaces the existing per-script cron/tmux launchers.
|
||||
|
||||
### Phase 2 - Native rewrite (weeks 3-6)
|
||||
|
||||
- Replace each shell wrapper with a native Go implementation one
|
||||
goroutine at a time, in this order: session-watcher -> checkpoint ->
|
||||
dispatcher -> account-switcher.
|
||||
- Shell scripts remain as a fallback until each native path has run a
|
||||
full week without regression.
|
||||
|
||||
### Phase 3 - Removal (week 7+)
|
||||
|
||||
- Delete `agent-orchestrator/*.sh` once Phase 2 is green.
|
||||
- Keep the TUI (`agent-orchestrator/tui/`) as a read-only client of
|
||||
the daemon's HTTP API.
|
||||
|
||||
---
|
||||
|
||||
## 10. Testing strategy
|
||||
|
||||
- Adapters (`TmuxAdapter`, `FSAdapter`, `ClaudeCLIAdapter`) are
|
||||
interfaces with in-memory fakes under `internal/testutil/`.
|
||||
- Unit tests cover every goroutine in isolation:
|
||||
- dispatcher: feed fake fsnotify events, assert tmux `SendKeys`
|
||||
calls.
|
||||
- quota-monitor: feed canned pane captures, assert `SwapRequested`.
|
||||
- account-switcher: drive the state machine through a full swap,
|
||||
including cancellation and rollback.
|
||||
- Integration tests boot the full daemon with fake adapters and a
|
||||
`httptest.Server` on the MCP port. A golden-file test compares
|
||||
`state.json` after a scripted sequence.
|
||||
- Race detector mandatory in CI (`go test -race ./...`).
|
||||
- An `e2e/` suite runs against a real tmux + real Claude Code CLI in a
|
||||
container, executed nightly.
|
||||
|
||||
---
|
||||
|
||||
## 11. Systemd unit
|
||||
|
||||
```ini
|
||||
# /etc/systemd/system/claude-failover.service
|
||||
[Unit]
|
||||
Description=Claude Code failover & orchestration daemon
|
||||
After=network-online.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
User=ubuntu
|
||||
Group=ubuntu
|
||||
WorkingDirectory=/home/ubuntu
|
||||
Environment=CLAUDE_FAILOVER_CONFIG=/etc/claude-failover/config.yaml
|
||||
ExecStart=/usr/local/bin/claude-failover --config ${CLAUDE_FAILOVER_CONFIG}
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
Restart=always
|
||||
RestartSec=3
|
||||
TimeoutStopSec=30
|
||||
LimitNOFILE=65536
|
||||
# State & logs
|
||||
StateDirectory=claude-failover
|
||||
LogsDirectory=claude-failover
|
||||
# Hardening (tmux needs TTY access, keep minimal)
|
||||
NoNewPrivileges=true
|
||||
ProtectSystem=strict
|
||||
ReadWritePaths=/var/lib/claude-failover /var/log/claude-failover /home/ubuntu/.claude /home/ubuntu/projects
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
`Type=notify` requires the daemon to call `sd_notify(READY=1)` once all
|
||||
goroutines have registered and the HTTP server is listening. Use
|
||||
[`github.com/coreos/go-systemd/v22/daemon`](https://pkg.go.dev/github.com/coreos/go-systemd/v22/daemon).
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue