Compare commits
6 commits
4cbdcf143a
...
47ab86eef9
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
47ab86eef9 | ||
|
|
bde4642931 | ||
|
|
20063b1939 | ||
|
|
8eaf0bbd35 | ||
|
|
e16e3526a0 | ||
|
|
91091d7abf |
8 changed files with 1017 additions and 8 deletions
159
VERSION.md
159
VERSION.md
|
|
@ -1,4 +1,161 @@
|
|||
# Version actuelle : 0.3.4
|
||||
# Version actuelle : 0.3.8
|
||||
|
||||
## [0.3.8] - 2026-04-16
|
||||
**Type:** Patch — Bug #1 (A3 flip+ensure inconsistency) + Bug #10 (requiredShared contract test)
|
||||
|
||||
### Corrigé — Bug #1 (CRITIQUE)
|
||||
- `AccountSwitcher.executeSwitch` ne continue plus silencieusement quand
|
||||
`symlinks.EnsureForAccount` échoue après le flip : il **roll-back** le lien
|
||||
`~/.claude` vers le home du compte précédent et **n'appelle pas**
|
||||
`SetActiveAccount`. Évite l'état incohérent où le daemon déclare le compte
|
||||
cible actif alors que ses shared symlinks sont divergents → transcripts
|
||||
dupliqués silencieusement, resume cassé.
|
||||
- Si le rollback réussit : swap annulé, état filesystem = état pré-swap,
|
||||
erreur explicite retournée par `executeSwitchE`.
|
||||
- Si ensure ET rollback échouent : `partialSwap` atomique sticky set,
|
||||
`ErrPartialSwap` retourné, tout futur swap est refusé tant que le
|
||||
daemon n'est pas redémarré par l'opérateur.
|
||||
- Nouvelle méthode publique `AccountSwitcher.IsPartialSwap() bool` pour
|
||||
que health-checks et watchdog exposent l'état dégradé.
|
||||
|
||||
### Ajouté — Tests Bug #1
|
||||
- `TestFlipEnsureFailureTriggersRollback` : plant un lien divergent sur
|
||||
le home cible → ensure échoue → rollback réussit → `ActiveAccount` reste
|
||||
compte1 → `~/.claude` pointe sur previousHome → `IsPartialSwap` = false.
|
||||
- `TestFlipEnsureAndRollbackFailure` : force les deux flips à échouer
|
||||
(homeDir = fichier régulier) → `ErrPartialSwap` retourné, flag sticky
|
||||
set, swap suivant refusé.
|
||||
|
||||
### Ajouté — Bug #10
|
||||
- `TestRequiredSharedIsCoherent` (`internal/symlinks/shared_test.go`) :
|
||||
valide le contrat de la constante package-level `RequiredShared` jamais
|
||||
exercée auparavant (tous les autres tests utilisent un override scoped
|
||||
en `t.TempDir()`). Vérifie sans toucher au filesystem :
|
||||
- exactement 3 entrées (`session-env`, `file-history`, `projects`)
|
||||
- targets absolus
|
||||
- `filepath.Dir(target)` identique pour les 3 entrées (invariant
|
||||
"3 liens sous un même shared root" sur lequel repose `EnsureForAccount`).
|
||||
|
||||
### Rationale
|
||||
- Continuer après un ensure échoué revient à valider que le compte cible
|
||||
est "sain" alors que les shared symlinks sont absents ou divergents.
|
||||
Conséquence en prod : premier `claude --resume` écrit dans
|
||||
`~/.claude/projects/` (privé) → transcripts dupliqués, undo
|
||||
désynchronisé, failover complètement cassé sans log d'alerte.
|
||||
- Le rollback garantit qu'un compte cible mal configuré ne peut PAS
|
||||
dégrader le state du daemon : on retourne à l'état pré-swap et on
|
||||
signale l'erreur à l'appelant.
|
||||
- `ErrPartialSwap` + `IsPartialSwap()` documente un état où l'intervention
|
||||
humaine est obligatoire — préférable à un retry automatique qui
|
||||
empirerait la divergence.
|
||||
|
||||
### Tests
|
||||
- ✅ `go test ./...` : tous les packages PASS
|
||||
- ✅ `go test -race ./...` : PASS, aucun data race
|
||||
- ✅ `go vet ./...` : clean
|
||||
- ✅ `go build ./...` : clean
|
||||
|
||||
### Fichiers modifiés
|
||||
- `internal/switcher/account_switcher.go` (+rollback + IsPartialSwap + ErrPartialSwap)
|
||||
- `internal/switcher/account_switcher_test.go` (2 nouveaux tests, 1 test obsolète remplacé)
|
||||
- `internal/symlinks/shared_test.go` (+TestRequiredSharedIsCoherent)
|
||||
|
||||
## [0.3.7] - 2026-04-16
|
||||
**Type:** Patch — Phase 1 / Chantier A3 : wire EnsureForAccount post-flip
|
||||
|
||||
### Ajouté
|
||||
- `AccountSwitcher.executeSwitch` appelle désormais
|
||||
`symlinks.EnsureForAccount(target.Home, ...)` **juste après** le flip
|
||||
du lien principal `~/.claude`. Garantit que les 3 liens partagés
|
||||
(`session-env`, `file-history`, `projects`) existent et pointent aux
|
||||
bons targets sur le compte cible, même si celui-ci vient juste
|
||||
d'être provisionné.
|
||||
- `AccountSwitcher.sharedSymlinks` : override test-only (accepte une
|
||||
liste `[]symlinks.SharedSymlink`). Défaut = `symlinks.RequiredShared`.
|
||||
Les tests peuvent scoper la réconciliation dans un `t.TempDir()` pour
|
||||
ne jamais toucher `/home/ubuntu/.claude-*-shared`.
|
||||
- 2 tests unitaires :
|
||||
- `TestFlipReconcilesSharedSymlinksOnTargetHome` : target home vide →
|
||||
les 3 liens sont créés après le flip et pointent aux targets canoniques.
|
||||
- `TestFlipEnsureSymlinksFailureDoesNotAbortSwap` : lien divergent
|
||||
planté à la main → `EnsureForAccount` renvoie une erreur, logguée
|
||||
en WARN, mais le swap complète quand même (best-effort post-flip).
|
||||
|
||||
### Rationale
|
||||
- Sans cet appel, un compte cible fraîchement provisionné n'aurait
|
||||
pas encore ses 3 liens ; au premier `claude --resume`, Claude Code
|
||||
écrirait dans `~/.claude/projects/` (privé) au lieu de
|
||||
`/home/ubuntu/.claude-projects-shared` → transcripts dupliqués,
|
||||
undo désynchronisé, resume silencieusement cassé.
|
||||
- L'ensure est **best-effort** : une erreur est logguée en WARN mais
|
||||
NE bloque PAS le flip. Si on abortait ici, on laisserait le daemon
|
||||
dans un état incohérent (symlink déjà flippé mais `SetActiveAccount`
|
||||
pas appelé).
|
||||
- L'opérateur voit le WARN dans les logs et peut corriger la
|
||||
divergence manuellement (ex: lien pointant sur le mauvais target).
|
||||
|
||||
### Tests
|
||||
- ✅ `go test ./...` : tous les packages PASS (incluant
|
||||
`internal/switcher` et `internal/symlinks`).
|
||||
- ✅ `go test -race ./internal/switcher/...` : PASS.
|
||||
- ✅ `go vet ./...` : clean.
|
||||
|
||||
### Fichiers modifiés
|
||||
- `internal/switcher/account_switcher.go`
|
||||
- `internal/switcher/account_switcher_test.go`
|
||||
|
||||
## [0.3.6] - 2026-04-16
|
||||
**Type:** Patch — Phase 1 / Chantier A2 : validation des symlinks au startup
|
||||
|
||||
### Ajouté
|
||||
- `Manager.ValidateSharedSymlinks()` : nouvelle méthode dans
|
||||
`internal/lifecycle` qui agrège les `Home` de tous les comptes
|
||||
configurés et délègue à `symlinks.ValidateAll`. Échoue dur si un
|
||||
compte n'a pas de `home` défini ou si un lien est absent/divergent.
|
||||
- `cmd/claude-failover/main.go` appelle cette validation **avant**
|
||||
`EnsureAllSessions()` : un état partagé cassé ne laissera plus le
|
||||
daemon démarrer et divergér silencieusement.
|
||||
|
||||
### Rationale
|
||||
- Un opérateur qui copie la config sur une nouvelle VM ne peut plus
|
||||
oublier les liens — le daemon refuse de démarrer jusqu'à ce qu'ils
|
||||
soient corrects.
|
||||
- Pas d'auto-heal sur divergence : on préfère un message d'erreur
|
||||
explicite à un `rm -f` silencieux qui détruirait l'autre compte.
|
||||
|
||||
### Tests
|
||||
- ✅ `go test ./...` : tous les packages PASS (incluant
|
||||
`internal/lifecycle` et `internal/symlinks`).
|
||||
|
||||
### Fichiers modifiés
|
||||
- `cmd/claude-failover/main.go` (+9)
|
||||
- `internal/lifecycle/manager.go` (+31)
|
||||
|
||||
## [0.3.5] - 2026-04-16
|
||||
**Type:** Patch — Phase 1 / Chantier A1 : package `internal/symlinks`
|
||||
|
||||
### Ajouté
|
||||
- `internal/symlinks/shared.go` : `EnsureForAccount` + `ValidateAll` qui
|
||||
encodent en code la convention des 3 symlinks partagés par compte
|
||||
(`session-env`, `file-history`, `projects`). Jusqu'à aujourd'hui ces
|
||||
liens étaient maintenus à la main et leur absence silencieuse cassait
|
||||
le failover (JSONL dupliqués, undo désynchronisé).
|
||||
- Tests unitaires couvrant : création missing, idempotence, divergence
|
||||
(refus d'auto-correction pour éviter la perte de données), fichier
|
||||
régulier à la place du lien, home vide, agrégation d'erreurs multi-comptes.
|
||||
|
||||
### Rationale
|
||||
- Un déploiement sur une nouvelle VM ne peut plus omettre les liens.
|
||||
- Divergent link → erreur explicite, jamais de correction silencieuse.
|
||||
- Préparation des tâches A2 (ValidateAll au startup) et A3 (EnsureForAccount
|
||||
post-flipSymlink dans le switcher).
|
||||
|
||||
### Tests
|
||||
- ✅ `go test ./internal/symlinks/...` : 9/9 PASS
|
||||
|
||||
### Fichiers ajoutés
|
||||
- `internal/symlinks/shared.go`
|
||||
- `internal/symlinks/shared_test.go`
|
||||
|
||||
## [0.3.4] - 2026-04-16
|
||||
**Type:** Patch — Dispatcher ne route JAMAIS vers les sessions dédiées
|
||||
|
|
|
|||
|
|
@ -1,13 +1,26 @@
|
|||
# Travaux en Cours - claude-failover
|
||||
|
||||
## Dernière mise à jour
|
||||
2026-04-15 19:30:00
|
||||
2026-04-16 19:00:00
|
||||
|
||||
## Version Actuelle
|
||||
0.3.0
|
||||
0.3.5 (en cours de progression vers 0.4.0)
|
||||
|
||||
## Demande Actuelle
|
||||
Aucune — v0.2.3 shippée, service stable.
|
||||
**Phase 1 / Chantier A — Failover robuste** (spec dans `ccl-platform/phases/phase1/A-failover.md`).
|
||||
Rendre le failover compte1 ↔ compte2 déterministe en intégrant dans le code les fixes manuels
|
||||
(symlinks partagés), en ajoutant un registre UUID fiable, et en durcissant tmux send-keys.
|
||||
|
||||
Branche : `feat/phase1-A-failover-robust`.
|
||||
|
||||
## Sous-tâches Chantier A
|
||||
- [x] A1 — `internal/symlinks/shared.go` (+ tests) — v0.3.5
|
||||
- [ ] A2 — `lifecycle/manager.go` : `ValidateAll` au startup
|
||||
- [ ] A3 — `switcher/account_switcher.go` : `EnsureForAccount` post-flip
|
||||
- [ ] A4 — `internal/registry/uuid_registry.go` (+ tests)
|
||||
- [ ] A5 — `internal/tmux/send.go` avec retry exponentiel (+ tests)
|
||||
- [ ] A6 — Capture UUID 200 → 500 lignes
|
||||
- [ ] A7 — `scripts/test-failover.sh` dans ccl-platform + scripts associés
|
||||
|
||||
## Étapes Complétées
|
||||
- [x] v0.2.1 — Cooldown post-swap + log forensique (trigger_session, pattern, snippet)
|
||||
|
|
|
|||
|
|
@ -51,6 +51,15 @@ func main() {
|
|||
// Initialise tmux client and lifecycle manager.
|
||||
tmuxClient := tmux.NewExecClient()
|
||||
lm := lifecycle.New(tmuxClient, s, cfg)
|
||||
|
||||
// Validate (and self-heal) the shared-state symlinks BEFORE spawning
|
||||
// any sessions. A divergent link would silently fork transcripts
|
||||
// between accounts and make failover destructive, so we fail fast here
|
||||
// rather than after work is in flight.
|
||||
if err := lm.ValidateSharedSymlinks(); err != nil {
|
||||
log.Fatalf("shared symlinks validation failed: %v", err)
|
||||
}
|
||||
|
||||
lm.EnsureAllSessions()
|
||||
|
||||
// Block until SIGINT or SIGTERM.
|
||||
|
|
|
|||
|
|
@ -4,11 +4,13 @@ package lifecycle
|
|||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"log"
|
||||
"time"
|
||||
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/config"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/state"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/symlinks"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/tmux"
|
||||
)
|
||||
|
||||
|
|
@ -47,6 +49,35 @@ func (m *Manager) Run(ctx context.Context) {
|
|||
}
|
||||
}
|
||||
|
||||
// ValidateSharedSymlinks verifies that every configured account home has
|
||||
// the three shared-state symlinks (session-env, file-history, projects)
|
||||
// in place and pointing at the canonical shared targets.
|
||||
//
|
||||
// Called once at daemon startup BEFORE sessions are recreated. A missing
|
||||
// or divergent link would silently fork the state tree between the two
|
||||
// accounts, breaking failover. We fail fast so the operator fixes it
|
||||
// before any work is in flight.
|
||||
//
|
||||
// EnsureForAccount creates missing links but refuses to touch divergent
|
||||
// ones — see internal/symlinks for the rationale.
|
||||
func (m *Manager) ValidateSharedSymlinks() error {
|
||||
if len(m.config.Accounts) == 0 {
|
||||
return fmt.Errorf("[lifecycle] no accounts configured — cannot validate shared symlinks")
|
||||
}
|
||||
homes := make([]string, 0, len(m.config.Accounts))
|
||||
for _, acc := range m.config.Accounts {
|
||||
if acc.Home == "" {
|
||||
return fmt.Errorf("[lifecycle] account %q has empty home — refusing to continue", acc.Name)
|
||||
}
|
||||
homes = append(homes, acc.Home)
|
||||
}
|
||||
if err := symlinks.ValidateAll(homes, symlinks.RequiredShared); err != nil {
|
||||
return fmt.Errorf("shared symlinks invalid, refusing to start: %w", err)
|
||||
}
|
||||
m.logger.Printf("[lifecycle] shared symlinks OK for %d account(s)", len(homes))
|
||||
return nil
|
||||
}
|
||||
|
||||
// EnsureAllSessions creates all configured sessions that are not yet present in tmux.
|
||||
// It is intended to be called once at daemon startup before Run is launched.
|
||||
func (m *Manager) EnsureAllSessions() {
|
||||
|
|
|
|||
|
|
@ -4,6 +4,7 @@ package switcher
|
|||
|
||||
import (
|
||||
"context"
|
||||
"errors"
|
||||
"fmt"
|
||||
"log"
|
||||
"os"
|
||||
|
|
@ -11,15 +12,27 @@ import (
|
|||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/config"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/notify"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/quota"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/state"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/symlinks"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/tmux"
|
||||
)
|
||||
|
||||
// ErrPartialSwap is returned (and wrapped) when the switcher flipped
|
||||
// ~/.claude to the target home, EnsureForAccount failed on the target,
|
||||
// and the rollback flip back to the previous home ALSO failed. The daemon
|
||||
// is in a documented degraded state: the active-account setter was NOT
|
||||
// called, but the filesystem symlink may point at an account whose shared
|
||||
// state is inconsistent. Operator intervention is required. Callers can
|
||||
// interrogate AccountSwitcher.IsPartialSwap() to expose the flag to
|
||||
// health-checks / watchdogs.
|
||||
var ErrPartialSwap = errors.New("switcher: partial swap — flip succeeded but ensure + rollback both failed")
|
||||
|
||||
// SwitchState represents the current phase of a failover operation.
|
||||
type SwitchState string
|
||||
|
||||
|
|
@ -52,6 +65,19 @@ type AccountSwitcher struct {
|
|||
// homeDir is the directory containing the .claude symlink. Overridable for tests.
|
||||
// When empty, os.UserHomeDir() is used.
|
||||
homeDir string
|
||||
// sharedSymlinks is the list of shared-state links reconciled on the
|
||||
// target account home after every flip. Overridable for tests so the
|
||||
// suite never touches the operator's real /home/ubuntu/.claude-*
|
||||
// shared directories. When nil, symlinks.RequiredShared is used.
|
||||
sharedSymlinks []symlinks.SharedSymlink
|
||||
// partialSwap is set to 1 when a flip+ensure+rollback sequence left the
|
||||
// daemon in an inconsistent state (symlink possibly flipped, but active
|
||||
// account NOT updated, and rollback flip ALSO failed). Health-checks /
|
||||
// watchdogs read this flag via IsPartialSwap(). It is sticky: once set,
|
||||
// it stays set until the operator restarts the daemon after fixing the
|
||||
// filesystem state. We use atomic access so watchdog goroutines can read
|
||||
// it without blocking the switcher.
|
||||
partialSwap atomic.Bool
|
||||
}
|
||||
|
||||
// New creates an AccountSwitcher.
|
||||
|
|
@ -88,8 +114,30 @@ func (a *AccountSwitcher) Run(ctx context.Context) {
|
|||
|
||||
// executeSwitch performs the full failover sequence.
|
||||
func (a *AccountSwitcher) executeSwitch(req quota.SwitchRequest) {
|
||||
if err := a.executeSwitchE(req); err != nil {
|
||||
// executeSwitchE already logs the detail; we swallow the error here
|
||||
// because the public Run loop has no return channel. The partialSwap
|
||||
// flag (if set) remains visible via IsPartialSwap().
|
||||
a.logger.Printf("[switcher] SWAP aborted: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// executeSwitchE runs the swap and returns an error describing any abort or
|
||||
// partial-swap condition. Split out from executeSwitch so tests can assert
|
||||
// on the error value without routing through a channel.
|
||||
func (a *AccountSwitcher) executeSwitchE(req quota.SwitchRequest) error {
|
||||
a.logger.Printf("[switcher] SWAP initiated from=%q reset=%q", req.From, req.ResetTime)
|
||||
|
||||
// Refuse to proceed if a previous swap left the daemon in an
|
||||
// inconsistent state. The operator must intervene (fix the filesystem,
|
||||
// restart the daemon) before any further failover can be attempted —
|
||||
// otherwise we'd stack symlink flips on top of a broken state.
|
||||
if a.partialSwap.Load() {
|
||||
err := fmt.Errorf("refusing swap: daemon is in partial-swap degraded state (operator intervention required)")
|
||||
a.logger.Printf("[switcher] %v", err)
|
||||
return err
|
||||
}
|
||||
|
||||
// 1. SAVING — capture resume UUIDs from all working sessions plus
|
||||
// every dedicated session unconditionally (dedicated sessions are
|
||||
// user-driven and may not be tracked as "working" in state, but their
|
||||
|
|
@ -104,12 +152,52 @@ func (a *AccountSwitcher) executeSwitch(req quota.SwitchRequest) {
|
|||
if target == nil {
|
||||
a.logger.Printf("[switcher] no alternate account found for %q — aborting swap", req.From)
|
||||
a.currentState = StateNormal
|
||||
return
|
||||
return nil
|
||||
}
|
||||
previous := a.findAccountByName(req.From)
|
||||
|
||||
if err := a.flipSymlink(target.Home); err != nil {
|
||||
a.logger.Printf("[switcher] flipSymlink error: %v", err)
|
||||
}
|
||||
// Ensure the target account home exposes the three shared-state
|
||||
// symlinks (session-env, file-history, projects). If this fails we
|
||||
// MUST NOT proceed with SetActiveAccount — the daemon would otherwise
|
||||
// declare the target "active" while its shared state is divergent,
|
||||
// silently writing transcripts into private /projects directories and
|
||||
// breaking `claude --resume` across sessions. Instead we attempt to
|
||||
// roll back the ~/.claude flip to the previous account. If the
|
||||
// rollback also fails, the daemon is in a documented degraded state
|
||||
// (ErrPartialSwap) and the operator must intervene.
|
||||
if err := symlinks.EnsureForAccount(target.Home, a.requiredShared()); err != nil {
|
||||
a.logger.Printf("[switcher] ensure shared symlinks for %q failed: %v — attempting rollback", target.Home, err)
|
||||
if previous == nil || previous.Home == "" {
|
||||
// No known previous home to roll back to — set the degraded
|
||||
// flag and bail out. This is equivalent to a rollback failure
|
||||
// because the filesystem is pointed at a broken target.
|
||||
a.partialSwap.Store(true)
|
||||
a.currentState = StateNormal
|
||||
return fmt.Errorf("%w: ensure failed (%v) and no previous account home is known for rollback", ErrPartialSwap, err)
|
||||
}
|
||||
if rbErr := a.flipSymlink(previous.Home); rbErr != nil {
|
||||
// Both the ensure AND the rollback failed. The daemon is now
|
||||
// in a documented inconsistent state: ~/.claude may point at
|
||||
// target whose shared-state is divergent, but SetActiveAccount
|
||||
// has NOT been called so state.ActiveAccount is still the
|
||||
// previous account. No further failover can be attempted
|
||||
// until the operator intervenes.
|
||||
a.partialSwap.Store(true)
|
||||
a.logger.Printf("[switcher] CRITICAL partial swap: ensure=%v rollback=%v — daemon in degraded state, operator intervention required", err, rbErr)
|
||||
a.currentState = StateNormal
|
||||
return fmt.Errorf("%w: ensure=%v rollback=%v", ErrPartialSwap, err, rbErr)
|
||||
}
|
||||
// Rollback succeeded — symlink is back on the previous account,
|
||||
// SetActiveAccount was NEVER called, state is consistent with
|
||||
// "no swap happened". Return an explicit error so the caller
|
||||
// knows the swap was cancelled.
|
||||
a.logger.Printf("[switcher] rollback successful: ~/.claude → %s (swap cancelled)", previous.Home)
|
||||
a.currentState = StateNormal
|
||||
return fmt.Errorf("swap cancelled: ensure shared symlinks failed on target %q: %w", target.Home, err)
|
||||
}
|
||||
a.killAllPoolSessions()
|
||||
a.recreatePoolSessions()
|
||||
a.relaunchDedicatedSessions(target.Home)
|
||||
|
|
@ -135,6 +223,31 @@ func (a *AccountSwitcher) executeSwitch(req quota.SwitchRequest) {
|
|||
}
|
||||
|
||||
a.currentState = StateNormal
|
||||
return nil
|
||||
}
|
||||
|
||||
// IsPartialSwap reports whether the switcher is in a degraded state after a
|
||||
// flip+ensure+rollback sequence all failed. Health-checks and watchdogs use
|
||||
// this signal to surface an operator-actionable alert. The flag is sticky
|
||||
// for the lifetime of the process: once set, it remains set until the daemon
|
||||
// is restarted (after the operator has fixed the filesystem).
|
||||
func (a *AccountSwitcher) IsPartialSwap() bool {
|
||||
return a.partialSwap.Load()
|
||||
}
|
||||
|
||||
// findAccountByName returns the account config entry matching name, or nil.
|
||||
// Unlike findTargetAccount (which returns the first NON-matching account),
|
||||
// this is used by the rollback path to recover the previous home.
|
||||
func (a *AccountSwitcher) findAccountByName(name string) *config.AccountConfig {
|
||||
if name == "" {
|
||||
return nil
|
||||
}
|
||||
for i := range a.config.Accounts {
|
||||
if a.config.Accounts[i].Name == name {
|
||||
return &a.config.Accounts[i]
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// saveDedicatedUUIDs captures the resume UUID for every configured dedicated
|
||||
|
|
@ -225,6 +338,16 @@ func (a *AccountSwitcher) saveAllSessions() {
|
|||
})
|
||||
}
|
||||
|
||||
// requiredShared returns the shared-symlink list used when reconciling the
|
||||
// target account home after a flip. Tests may set a.sharedSymlinks to a
|
||||
// tmpdir-scoped list so they never touch /home/ubuntu/.claude-*-shared.
|
||||
func (a *AccountSwitcher) requiredShared() []symlinks.SharedSymlink {
|
||||
if a.sharedSymlinks != nil {
|
||||
return a.sharedSymlinks
|
||||
}
|
||||
return symlinks.RequiredShared
|
||||
}
|
||||
|
||||
// resolveHomeDir returns the configured homeDir (test override) or the real
|
||||
// user home. Tests MUST set a.homeDir to a tmpdir to avoid clobbering the
|
||||
// production ~/.claude symlink.
|
||||
|
|
|
|||
|
|
@ -1,6 +1,9 @@
|
|||
package switcher
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
|
@ -8,8 +11,19 @@ import (
|
|||
"forge.secuaas.ovh/olivier/claude-failover/internal/config"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/quota"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/state"
|
||||
"forge.secuaas.ovh/olivier/claude-failover/internal/symlinks"
|
||||
)
|
||||
|
||||
// tmpShared returns a SharedSymlink list whose targets live entirely under
|
||||
// tmpDir, so switcher tests never touch /home/ubuntu/.claude-*-shared.
|
||||
func tmpShared(tmpDir string) []symlinks.SharedSymlink {
|
||||
return []symlinks.SharedSymlink{
|
||||
{Target: filepath.Join(tmpDir, "session-env-shared"), Name: "session-env"},
|
||||
{Target: filepath.Join(tmpDir, "file-history-shared"), Name: "file-history"},
|
||||
{Target: filepath.Join(tmpDir, "projects-shared"), Name: "projects"},
|
||||
}
|
||||
}
|
||||
|
||||
// mockTmux for switcher tests.
|
||||
type mockTmux struct {
|
||||
sessions map[string]bool
|
||||
|
|
@ -143,6 +157,9 @@ func TestKillAndRecreatePoolSessions(t *testing.T) {
|
|||
// touches the real ~/.claude (regression: a reboot used to leave Claude
|
||||
// Code unusable because the test had repointed ~/.claude to /tmp/...).
|
||||
a.homeDir = t.TempDir()
|
||||
// Scope shared-symlink targets to a tmpdir so the post-flip ensure
|
||||
// pass does not write inside /home/ubuntu/.claude-*-shared.
|
||||
a.sharedSymlinks = tmpShared(t.TempDir())
|
||||
a.executeSwitch(quota.SwitchRequest{From: "compte1"})
|
||||
|
||||
// Active account must have changed.
|
||||
|
|
@ -186,10 +203,12 @@ func TestDedicatedRelaunchAfterSwap(t *testing.T) {
|
|||
s := state.New("")
|
||||
s.SetActiveAccount("compte1")
|
||||
|
||||
home1 := filepath.Join(t.TempDir(), "claude-1-xxxx")
|
||||
home2 := filepath.Join(t.TempDir(), "claude-2-xxxx")
|
||||
cfg := &config.Config{
|
||||
Accounts: []config.AccountConfig{
|
||||
{Name: "compte1", Home: "/tmp/claude-1-xxxx"},
|
||||
{Name: "compte2", Home: "/tmp/claude-2-xxxx"},
|
||||
{Name: "compte1", Home: home1},
|
||||
{Name: "compte2", Home: home2},
|
||||
},
|
||||
Pool: config.PoolConfig{
|
||||
Dedicated: []config.DedicatedSession{{Name: "dedicated-1", Project: "/tmp"}},
|
||||
|
|
@ -199,6 +218,7 @@ func TestDedicatedRelaunchAfterSwap(t *testing.T) {
|
|||
|
||||
a := New(tc, s, cfg, make(chan quota.SwitchRequest), nil)
|
||||
a.homeDir = t.TempDir()
|
||||
a.sharedSymlinks = tmpShared(t.TempDir())
|
||||
a.executeSwitch(quota.SwitchRequest{From: "compte1"})
|
||||
|
||||
// The relaunch must send a resume command on the dedicated session,
|
||||
|
|
@ -213,10 +233,225 @@ func TestDedicatedRelaunchAfterSwap(t *testing.T) {
|
|||
if relaunch == "" {
|
||||
t.Fatalf("expected dedicated-1 relaunch send-keys; got %v", tc.sendKeyCalls)
|
||||
}
|
||||
if !strings.Contains(relaunch, "CLAUDE_CONFIG_DIR=/tmp/claude-2-xxxx") {
|
||||
if !strings.Contains(relaunch, "CLAUDE_CONFIG_DIR="+home2) {
|
||||
t.Errorf("relaunch should set CLAUDE_CONFIG_DIR to target home; got %q", relaunch)
|
||||
}
|
||||
if !strings.Contains(relaunch, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee") {
|
||||
t.Errorf("relaunch should include captured UUID; got %q", relaunch)
|
||||
}
|
||||
}
|
||||
|
||||
// TestFlipReconcilesSharedSymlinksOnTargetHome verifies A3: after the main
|
||||
// ~/.claude flip, the switcher reconciles the three shared-state symlinks
|
||||
// (session-env / file-history / projects) on the TARGET account home.
|
||||
// Scenario: the target home has NO links yet — a freshly-provisioned account
|
||||
// that has never been flipped into. Post-switch, all three links must exist
|
||||
// inside the target home and point at the canonical shared targets.
|
||||
func TestFlipReconcilesSharedSymlinksOnTargetHome(t *testing.T) {
|
||||
tc := newMockTmux()
|
||||
|
||||
s := state.New("")
|
||||
s.SetActiveAccount("compte1")
|
||||
|
||||
// Target home starts empty: EnsureForAccount will mkdir + create links.
|
||||
targetHome := filepath.Join(t.TempDir(), "claude-compte2")
|
||||
cfg := &config.Config{
|
||||
Accounts: []config.AccountConfig{
|
||||
{Name: "compte1", Home: filepath.Join(t.TempDir(), "claude-compte1")},
|
||||
{Name: "compte2", Home: targetHome},
|
||||
},
|
||||
Pool: config.PoolConfig{
|
||||
Autonomous: config.AutonomousConfig{Prefix: "ccl-auto-", Min: 0, Max: 0},
|
||||
},
|
||||
}
|
||||
|
||||
a := New(tc, s, cfg, make(chan quota.SwitchRequest), nil)
|
||||
a.homeDir = t.TempDir()
|
||||
shared := tmpShared(t.TempDir())
|
||||
a.sharedSymlinks = shared
|
||||
|
||||
// Pre-assert: no link exists in targetHome.
|
||||
for _, sl := range shared {
|
||||
if _, err := os.Lstat(filepath.Join(targetHome, sl.Name)); !os.IsNotExist(err) {
|
||||
t.Fatalf("pre-condition: %q should not exist yet (err=%v)", sl.Name, err)
|
||||
}
|
||||
}
|
||||
|
||||
a.executeSwitch(quota.SwitchRequest{From: "compte1"})
|
||||
|
||||
// Post-assert: every required link exists and points at the canonical
|
||||
// target under the tmpdir-scoped shared root.
|
||||
for _, sl := range shared {
|
||||
linkPath := filepath.Join(targetHome, sl.Name)
|
||||
info, err := os.Lstat(linkPath)
|
||||
if err != nil {
|
||||
t.Errorf("expected link at %s after flip: %v", linkPath, err)
|
||||
continue
|
||||
}
|
||||
if info.Mode()&os.ModeSymlink == 0 {
|
||||
t.Errorf("%s exists but is not a symlink", linkPath)
|
||||
continue
|
||||
}
|
||||
got, err := os.Readlink(linkPath)
|
||||
if err != nil {
|
||||
t.Errorf("readlink %s: %v", linkPath, err)
|
||||
continue
|
||||
}
|
||||
if got != sl.Target {
|
||||
t.Errorf("link %s points to %q, want %q", linkPath, got, sl.Target)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestFlipEnsureFailureTriggersRollback verifies the fix for the A3 bug
|
||||
// (flip+ensure inconsistency): if EnsureForAccount fails on the target home
|
||||
// after the ~/.claude flip, the switcher MUST NOT mark the target account
|
||||
// active. It must instead roll back the ~/.claude symlink to the previous
|
||||
// account's home, leaving the daemon in the pre-swap state so subsequent
|
||||
// session work keeps writing to the known-good shared state.
|
||||
//
|
||||
// Old (buggy) behaviour: ensure error was WARN-only, SetActiveAccount still
|
||||
// happened, dedicated sessions were relaunched against a target whose
|
||||
// /projects, /session-env, /file-history were missing or divergent →
|
||||
// transcripts duplicated silently, resume broke, undo history diverged.
|
||||
func TestFlipEnsureFailureTriggersRollback(t *testing.T) {
|
||||
tc := newMockTmux()
|
||||
|
||||
s := state.New("")
|
||||
s.SetActiveAccount("compte1")
|
||||
|
||||
previousHome := filepath.Join(t.TempDir(), "claude-compte1")
|
||||
targetHome := filepath.Join(t.TempDir(), "claude-compte2")
|
||||
if err := os.MkdirAll(previousHome, 0700); err != nil {
|
||||
t.Fatalf("mkdir previous home: %v", err)
|
||||
}
|
||||
if err := os.MkdirAll(targetHome, 0700); err != nil {
|
||||
t.Fatalf("mkdir target home: %v", err)
|
||||
}
|
||||
// Plant a divergent link at <targetHome>/session-env. The symlinks
|
||||
// package refuses to auto-correct this (data-loss safeguard) and will
|
||||
// return an error, which must now trigger a rollback.
|
||||
bogus := filepath.Join(t.TempDir(), "somewhere-else")
|
||||
if err := os.MkdirAll(bogus, 0700); err != nil {
|
||||
t.Fatalf("mkdir bogus: %v", err)
|
||||
}
|
||||
if err := os.Symlink(bogus, filepath.Join(targetHome, "session-env")); err != nil {
|
||||
t.Fatalf("plant divergent link: %v", err)
|
||||
}
|
||||
|
||||
cfg := &config.Config{
|
||||
Accounts: []config.AccountConfig{
|
||||
{Name: "compte1", Home: previousHome},
|
||||
{Name: "compte2", Home: targetHome},
|
||||
},
|
||||
Pool: config.PoolConfig{
|
||||
Autonomous: config.AutonomousConfig{Prefix: "ccl-auto-", Min: 0, Max: 0},
|
||||
},
|
||||
}
|
||||
|
||||
a := New(tc, s, cfg, make(chan quota.SwitchRequest), nil)
|
||||
homeDir := t.TempDir()
|
||||
a.homeDir = homeDir
|
||||
a.sharedSymlinks = tmpShared(t.TempDir())
|
||||
|
||||
err := a.executeSwitchE(quota.SwitchRequest{From: "compte1"})
|
||||
if err == nil {
|
||||
t.Fatalf("executeSwitchE: expected cancellation error, got nil")
|
||||
}
|
||||
// The public symmetric swap-cancelled error must mention ensure and
|
||||
// wrap the underlying symlinks package message. ErrPartialSwap must
|
||||
// NOT be set (rollback succeeded → recoverable condition).
|
||||
if errors.Is(err, ErrPartialSwap) {
|
||||
t.Errorf("did not expect ErrPartialSwap; rollback succeeded; got %v", err)
|
||||
}
|
||||
if a.IsPartialSwap() {
|
||||
t.Errorf("IsPartialSwap should be false when rollback succeeds")
|
||||
}
|
||||
|
||||
// Active account must remain the previous one — SetActiveAccount must
|
||||
// NOT have been called.
|
||||
if got := s.ActiveAccount(); got != "compte1" {
|
||||
t.Errorf("active account should stay compte1 after rollback; got %q", got)
|
||||
}
|
||||
|
||||
// ~/.claude must now point at the previous home (rollback target).
|
||||
link, rlErr := os.Readlink(filepath.Join(homeDir, ".claude"))
|
||||
if rlErr != nil {
|
||||
t.Fatalf("readlink ~/.claude: %v", rlErr)
|
||||
}
|
||||
if link != previousHome {
|
||||
t.Errorf("~/.claude should point at previous home %q after rollback; got %q", previousHome, link)
|
||||
}
|
||||
}
|
||||
|
||||
// TestFlipEnsureAndRollbackFailure verifies that when BOTH EnsureForAccount
|
||||
// AND the rollback flip fail, the switcher sets the sticky partial-swap
|
||||
// flag and returns ErrPartialSwap. The daemon is then in a documented
|
||||
// degraded state where any further swap is refused until the operator
|
||||
// restarts it.
|
||||
func TestFlipEnsureAndRollbackFailure(t *testing.T) {
|
||||
tc := newMockTmux()
|
||||
|
||||
s := state.New("")
|
||||
s.SetActiveAccount("compte1")
|
||||
|
||||
previousHome := filepath.Join(t.TempDir(), "claude-compte1")
|
||||
targetHome := filepath.Join(t.TempDir(), "claude-compte2")
|
||||
if err := os.MkdirAll(previousHome, 0700); err != nil {
|
||||
t.Fatalf("mkdir previous home: %v", err)
|
||||
}
|
||||
if err := os.MkdirAll(targetHome, 0700); err != nil {
|
||||
t.Fatalf("mkdir target home: %v", err)
|
||||
}
|
||||
// Plant the divergent link that will cause EnsureForAccount to fail.
|
||||
bogus := filepath.Join(t.TempDir(), "somewhere-else")
|
||||
if err := os.MkdirAll(bogus, 0700); err != nil {
|
||||
t.Fatalf("mkdir bogus: %v", err)
|
||||
}
|
||||
if err := os.Symlink(bogus, filepath.Join(targetHome, "session-env")); err != nil {
|
||||
t.Fatalf("plant divergent link: %v", err)
|
||||
}
|
||||
|
||||
cfg := &config.Config{
|
||||
Accounts: []config.AccountConfig{
|
||||
{Name: "compte1", Home: previousHome},
|
||||
{Name: "compte2", Home: targetHome},
|
||||
},
|
||||
Pool: config.PoolConfig{
|
||||
Autonomous: config.AutonomousConfig{Prefix: "ccl-auto-", Min: 0, Max: 0},
|
||||
},
|
||||
}
|
||||
|
||||
a := New(tc, s, cfg, make(chan quota.SwitchRequest), nil)
|
||||
|
||||
// Force the rollback flip to fail: point homeDir at a file that cannot
|
||||
// host a .claude symlink. We use a regular file; the flipSymlink
|
||||
// implementation does os.Remove() then os.Symlink() under homeDir,
|
||||
// which fails when homeDir is itself a file (ENOTDIR).
|
||||
badHomeFile := filepath.Join(t.TempDir(), "not-a-dir")
|
||||
if err := os.WriteFile(badHomeFile, []byte("block"), 0600); err != nil {
|
||||
t.Fatalf("write bad home: %v", err)
|
||||
}
|
||||
a.homeDir = badHomeFile
|
||||
a.sharedSymlinks = tmpShared(t.TempDir())
|
||||
|
||||
err := a.executeSwitchE(quota.SwitchRequest{From: "compte1"})
|
||||
if err == nil {
|
||||
t.Fatalf("expected ErrPartialSwap, got nil")
|
||||
}
|
||||
if !errors.Is(err, ErrPartialSwap) {
|
||||
t.Errorf("expected ErrPartialSwap, got %v", err)
|
||||
}
|
||||
if !a.IsPartialSwap() {
|
||||
t.Errorf("IsPartialSwap should be true when both ensure AND rollback fail")
|
||||
}
|
||||
// SetActiveAccount must still not have been called.
|
||||
if got := s.ActiveAccount(); got != "compte1" {
|
||||
t.Errorf("active account must stay compte1 in partial-swap; got %q", got)
|
||||
}
|
||||
|
||||
// A subsequent swap attempt must be refused while the flag is set.
|
||||
if err2 := a.executeSwitchE(quota.SwitchRequest{From: "compte1"}); err2 == nil {
|
||||
t.Errorf("expected subsequent swap to be refused in degraded state")
|
||||
}
|
||||
}
|
||||
|
|
|
|||
165
internal/symlinks/shared.go
Normal file
165
internal/symlinks/shared.go
Normal file
|
|
@ -0,0 +1,165 @@
|
|||
// Package symlinks manages the shared-state symlinks that every Claude
|
||||
// account home must expose, so that account failover does not create state
|
||||
// divergence (duplicated JSONL transcripts, broken undo history, drifted
|
||||
// session env).
|
||||
//
|
||||
// Rationale
|
||||
//
|
||||
// Claude Code stores three directories whose content MUST be identical
|
||||
// across the two configured accounts for failover to be a no-op:
|
||||
//
|
||||
// - projects/ — session JSONL transcripts (used by `claude --resume`)
|
||||
// - session-env/ — per-session environment and working-dir metadata
|
||||
// - file-history/ — undo/redo history persistence
|
||||
//
|
||||
// If account A writes under `~/.claude-compte1/projects/...` while account
|
||||
// B later runs under `~/.claude-compte2/projects/...`, resume fails
|
||||
// silently with "session not found" and the operator loses every in-flight
|
||||
// conversation.
|
||||
//
|
||||
// Historically we fixed this by creating symlinks manually on the
|
||||
// operator's VM. Any fresh deployment that forgets those links silently
|
||||
// reintroduces the bug. This package encodes the convention in code:
|
||||
// EnsureForAccount creates missing links, ValidateAll fails fast at
|
||||
// startup when an account home is misconfigured.
|
||||
package symlinks
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"fmt"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// DefaultSharedRoot is the directory under which the three shared targets
|
||||
// live. All SharedSymlink.Target values default to a subdirectory of this
|
||||
// root so tests can override the root without rewriting the shared list.
|
||||
const DefaultSharedRoot = "/home/ubuntu"
|
||||
|
||||
// SharedSymlink describes one required link inside a Claude account home.
|
||||
//
|
||||
// Target is the absolute path on disk that holds the real shared
|
||||
// directory. Name is the basename of the link that must exist inside each
|
||||
// account home (e.g. `session-env`, `file-history`, `projects`).
|
||||
type SharedSymlink struct {
|
||||
Target string
|
||||
Name string
|
||||
}
|
||||
|
||||
// RequiredShared lists the three symlinks every Claude account home must
|
||||
// expose. The list is package-level so integration tests can read it, but
|
||||
// callers SHOULD prefer the EnsureForAccount / ValidateAll entry points
|
||||
// that accept an override list for isolation.
|
||||
var RequiredShared = []SharedSymlink{
|
||||
{Target: "/home/ubuntu/.claude-session-env-shared", Name: "session-env"},
|
||||
{Target: "/home/ubuntu/.claude-file-history-shared", Name: "file-history"},
|
||||
{Target: "/home/ubuntu/.claude-projects-shared", Name: "projects"},
|
||||
}
|
||||
|
||||
// EnsureForAccount verifies (and creates if missing) every required shared
|
||||
// symlink for a single account home. Behaviour:
|
||||
//
|
||||
// - If accountHome does not exist, it is created (mode 0700).
|
||||
// - If Target (the shared destination) does not exist, it is created
|
||||
// (mode 0700). Both accounts pointing at a non-existent target would
|
||||
// produce two separate state trees on first write.
|
||||
// - If the link is absent, it is created.
|
||||
// - If the link is present and points at Target, nothing happens.
|
||||
// - If the link is present but points elsewhere, an error is returned.
|
||||
// We REFUSE to auto-correct a divergent link because fixing it blindly
|
||||
// could delete user data: the "wrong" target may contain the only copy
|
||||
// of the session transcripts.
|
||||
// - If a regular file or directory exists where the link should be,
|
||||
// an error is returned for the same reason.
|
||||
func EnsureForAccount(accountHome string, required []SharedSymlink) error {
|
||||
if accountHome == "" {
|
||||
return errors.New("symlinks: accountHome is empty")
|
||||
}
|
||||
|
||||
if err := os.MkdirAll(accountHome, 0700); err != nil {
|
||||
return fmt.Errorf("symlinks: create account home %q: %w", accountHome, err)
|
||||
}
|
||||
|
||||
for _, sl := range required {
|
||||
if err := ensureTarget(sl.Target); err != nil {
|
||||
return err
|
||||
}
|
||||
if err := ensureLink(accountHome, sl); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ValidateAll runs EnsureForAccount on every account home. It aggregates
|
||||
// all errors and returns a single error with every failure inlined, so the
|
||||
// operator sees the full picture at startup rather than fixing one link,
|
||||
// restarting, hitting the next one, repeat.
|
||||
func ValidateAll(accountHomes []string, required []SharedSymlink) error {
|
||||
if len(accountHomes) == 0 {
|
||||
return errors.New("symlinks: no account homes provided")
|
||||
}
|
||||
var errs []string
|
||||
for _, home := range accountHomes {
|
||||
if err := EnsureForAccount(home, required); err != nil {
|
||||
errs = append(errs, err.Error())
|
||||
}
|
||||
}
|
||||
if len(errs) > 0 {
|
||||
return fmt.Errorf("symlinks: validation failed for %d account home(s): %s",
|
||||
len(errs), strings.Join(errs, "; "))
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ensureTarget creates Target as an empty directory when absent.
|
||||
// An existing file (non-directory, non-symlink) at Target is an operator
|
||||
// error we cannot resolve automatically.
|
||||
func ensureTarget(target string) error {
|
||||
info, err := os.Stat(target)
|
||||
if err != nil {
|
||||
if !os.IsNotExist(err) {
|
||||
return fmt.Errorf("symlinks: stat shared target %q: %w", target, err)
|
||||
}
|
||||
if mkErr := os.MkdirAll(target, 0700); mkErr != nil {
|
||||
return fmt.Errorf("symlinks: create shared target %q: %w", target, mkErr)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
if !info.IsDir() {
|
||||
return fmt.Errorf("symlinks: shared target %q is not a directory", target)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ensureLink reconciles one link entry inside accountHome.
|
||||
func ensureLink(accountHome string, sl SharedSymlink) error {
|
||||
linkPath := filepath.Join(accountHome, sl.Name)
|
||||
|
||||
info, err := os.Lstat(linkPath)
|
||||
if err != nil {
|
||||
if os.IsNotExist(err) {
|
||||
if linkErr := os.Symlink(sl.Target, linkPath); linkErr != nil {
|
||||
return fmt.Errorf("symlinks: create %q → %q: %w", linkPath, sl.Target, linkErr)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
return fmt.Errorf("symlinks: lstat %q: %w", linkPath, err)
|
||||
}
|
||||
|
||||
// Path exists — must be a symlink pointing at Target.
|
||||
if info.Mode()&os.ModeSymlink == 0 {
|
||||
return fmt.Errorf("symlinks: %q exists but is not a symlink (expected → %q)",
|
||||
linkPath, sl.Target)
|
||||
}
|
||||
currentTarget, err := os.Readlink(linkPath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("symlinks: readlink %q: %w", linkPath, err)
|
||||
}
|
||||
if currentTarget != sl.Target {
|
||||
return fmt.Errorf("symlinks: divergent link at %q: points to %q, expected %q (refusing to auto-correct to avoid data loss)",
|
||||
linkPath, currentTarget, sl.Target)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
276
internal/symlinks/shared_test.go
Normal file
276
internal/symlinks/shared_test.go
Normal file
|
|
@ -0,0 +1,276 @@
|
|||
package symlinks
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// testRequired returns a SharedSymlink list whose Targets live entirely
|
||||
// under tmpDir, so the tests never touch the operator's real home.
|
||||
func testRequired(tmpDir string) []SharedSymlink {
|
||||
return []SharedSymlink{
|
||||
{Target: filepath.Join(tmpDir, "session-env-shared"), Name: "session-env"},
|
||||
{Target: filepath.Join(tmpDir, "file-history-shared"), Name: "file-history"},
|
||||
{Target: filepath.Join(tmpDir, "projects-shared"), Name: "projects"},
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureForAccount_missingCreatesLinksAndTargets(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
home := filepath.Join(tmp, "account1")
|
||||
req := testRequired(tmp)
|
||||
|
||||
if err := EnsureForAccount(home, req); err != nil {
|
||||
t.Fatalf("EnsureForAccount: %v", err)
|
||||
}
|
||||
|
||||
for _, sl := range req {
|
||||
linkPath := filepath.Join(home, sl.Name)
|
||||
info, err := os.Lstat(linkPath)
|
||||
if err != nil {
|
||||
t.Errorf("expected link at %s: %v", linkPath, err)
|
||||
continue
|
||||
}
|
||||
if info.Mode()&os.ModeSymlink == 0 {
|
||||
t.Errorf("%s exists but is not a symlink", linkPath)
|
||||
}
|
||||
got, err := os.Readlink(linkPath)
|
||||
if err != nil {
|
||||
t.Errorf("readlink %s: %v", linkPath, err)
|
||||
continue
|
||||
}
|
||||
if got != sl.Target {
|
||||
t.Errorf("link %s points to %q, want %q", linkPath, got, sl.Target)
|
||||
}
|
||||
// Target directory must exist too.
|
||||
if st, err := os.Stat(sl.Target); err != nil || !st.IsDir() {
|
||||
t.Errorf("target %s should be a directory, err=%v", sl.Target, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureForAccount_idempotent(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
home := filepath.Join(tmp, "account1")
|
||||
req := testRequired(tmp)
|
||||
|
||||
if err := EnsureForAccount(home, req); err != nil {
|
||||
t.Fatalf("first pass: %v", err)
|
||||
}
|
||||
if err := EnsureForAccount(home, req); err != nil {
|
||||
t.Fatalf("second pass should be a no-op, got: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureForAccount_divergentLinkReturnsError(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
home := filepath.Join(tmp, "account1")
|
||||
req := testRequired(tmp)
|
||||
|
||||
// Pre-create a wrong symlink for "projects".
|
||||
if err := os.MkdirAll(home, 0700); err != nil {
|
||||
t.Fatalf("mkdir home: %v", err)
|
||||
}
|
||||
wrongTarget := filepath.Join(tmp, "someone-elses-dir")
|
||||
if err := os.MkdirAll(wrongTarget, 0700); err != nil {
|
||||
t.Fatalf("mkdir wrong target: %v", err)
|
||||
}
|
||||
linkPath := filepath.Join(home, "projects")
|
||||
if err := os.Symlink(wrongTarget, linkPath); err != nil {
|
||||
t.Fatalf("seed wrong symlink: %v", err)
|
||||
}
|
||||
|
||||
err := EnsureForAccount(home, req)
|
||||
if err == nil {
|
||||
t.Fatal("expected error for divergent link, got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "divergent") {
|
||||
t.Errorf("error should mention 'divergent': %v", err)
|
||||
}
|
||||
|
||||
// The wrong symlink MUST be preserved (no auto-correction).
|
||||
got, err := os.Readlink(linkPath)
|
||||
if err != nil {
|
||||
t.Fatalf("readlink after error: %v", err)
|
||||
}
|
||||
if got != wrongTarget {
|
||||
t.Errorf("divergent link was mutated: now %q, want preserved %q", got, wrongTarget)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureForAccount_regularFileInsteadOfLinkFails(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
home := filepath.Join(tmp, "account1")
|
||||
req := testRequired(tmp)
|
||||
|
||||
if err := os.MkdirAll(home, 0700); err != nil {
|
||||
t.Fatalf("mkdir home: %v", err)
|
||||
}
|
||||
// Create a regular file at the session-env path.
|
||||
bogus := filepath.Join(home, "session-env")
|
||||
if err := os.WriteFile(bogus, []byte("oops"), 0600); err != nil {
|
||||
t.Fatalf("seed regular file: %v", err)
|
||||
}
|
||||
|
||||
err := EnsureForAccount(home, req)
|
||||
if err == nil {
|
||||
t.Fatal("expected error for regular-file-at-link-path, got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "not a symlink") {
|
||||
t.Errorf("error should mention 'not a symlink': %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureForAccount_emptyHomeReturnsError(t *testing.T) {
|
||||
if err := EnsureForAccount("", nil); err == nil {
|
||||
t.Fatal("expected error for empty home, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestValidateAll_multipleAccountsAllOK(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
req := testRequired(tmp)
|
||||
homes := []string{
|
||||
filepath.Join(tmp, "a"),
|
||||
filepath.Join(tmp, "b"),
|
||||
}
|
||||
if err := ValidateAll(homes, req); err != nil {
|
||||
t.Fatalf("ValidateAll: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestValidateAll_aggregatesErrors(t *testing.T) {
|
||||
tmp := t.TempDir()
|
||||
req := testRequired(tmp)
|
||||
homes := []string{
|
||||
filepath.Join(tmp, "a"),
|
||||
filepath.Join(tmp, "b"),
|
||||
}
|
||||
|
||||
// Pre-seed account `a` with a divergent link so ValidateAll must
|
||||
// surface that error while still processing account `b`.
|
||||
if err := os.MkdirAll(homes[0], 0700); err != nil {
|
||||
t.Fatalf("mkdir a: %v", err)
|
||||
}
|
||||
wrongTarget := filepath.Join(tmp, "bad")
|
||||
if err := os.MkdirAll(wrongTarget, 0700); err != nil {
|
||||
t.Fatalf("mkdir bad: %v", err)
|
||||
}
|
||||
if err := os.Symlink(wrongTarget, filepath.Join(homes[0], "projects")); err != nil {
|
||||
t.Fatalf("seed wrong link: %v", err)
|
||||
}
|
||||
|
||||
err := ValidateAll(homes, req)
|
||||
if err == nil {
|
||||
t.Fatal("expected aggregated error, got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "divergent") {
|
||||
t.Errorf("should surface divergent: %v", err)
|
||||
}
|
||||
|
||||
// Account `b` must have been configured successfully even though `a`
|
||||
// failed. Otherwise the operator cannot see the full state at once.
|
||||
for _, sl := range req {
|
||||
if _, err := os.Lstat(filepath.Join(homes[1], sl.Name)); err != nil {
|
||||
t.Errorf("account b link %s should have been created despite a's failure: %v", sl.Name, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestValidateAll_emptyListReturnsError(t *testing.T) {
|
||||
if err := ValidateAll(nil, nil); err == nil {
|
||||
t.Fatal("expected error for empty account list")
|
||||
}
|
||||
}
|
||||
|
||||
// TestRequiredShared_defaultsAreReasonable pins the default SharedSymlink
|
||||
// list so an accidental edit that breaks production is caught.
|
||||
func TestRequiredShared_defaultsAreReasonable(t *testing.T) {
|
||||
want := map[string]string{
|
||||
"session-env": "/home/ubuntu/.claude-session-env-shared",
|
||||
"file-history": "/home/ubuntu/.claude-file-history-shared",
|
||||
"projects": "/home/ubuntu/.claude-projects-shared",
|
||||
}
|
||||
if len(RequiredShared) != len(want) {
|
||||
t.Fatalf("RequiredShared has %d entries, want %d", len(RequiredShared), len(want))
|
||||
}
|
||||
for _, sl := range RequiredShared {
|
||||
if got, ok := want[sl.Name]; !ok {
|
||||
t.Errorf("unexpected RequiredShared entry %q", sl.Name)
|
||||
} else if got != sl.Target {
|
||||
t.Errorf("RequiredShared %q target = %q, want %q", sl.Name, sl.Target, got)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRequiredSharedIsCoherent validates the contract of the package-level
|
||||
// RequiredShared constant that the switcher and lifecycle manager consume
|
||||
// in production. The rest of the suite exercises EnsureForAccount /
|
||||
// ValidateAll with tmpdir-scoped override lists, so the actual prod
|
||||
// constant (pointing under /home/ubuntu/.claude-*-shared) is never touched
|
||||
// by those tests — a regression that shrinks or renames RequiredShared
|
||||
// would pass every other test but silently break failover on real VMs
|
||||
// (missing a link → writes to private state → transcripts duplicated).
|
||||
//
|
||||
// The test is filesystem-free: it only asserts the shape of the constant.
|
||||
//
|
||||
// 1. Exactly three entries, one per Name required by the A-failover design.
|
||||
// 2. Every Target is absolute.
|
||||
// 3. All three Targets share the same parent directory — there is no
|
||||
// mode of operation where one shared dir lives elsewhere than the
|
||||
// others. `filepath.Dir(target)` must be identical across entries.
|
||||
//
|
||||
// This encodes the "3 links under one shared root" invariant that
|
||||
// EnsureForAccount relies on. Any future change to RequiredShared that
|
||||
// breaks this invariant should force the author to update the switcher
|
||||
// contract explicitly.
|
||||
func TestRequiredSharedIsCoherent(t *testing.T) {
|
||||
expectedNames := map[string]bool{
|
||||
"session-env": false,
|
||||
"file-history": false,
|
||||
"projects": false,
|
||||
}
|
||||
if len(RequiredShared) != len(expectedNames) {
|
||||
t.Fatalf("RequiredShared must contain exactly %d entries (session-env, file-history, projects); got %d: %+v",
|
||||
len(expectedNames), len(RequiredShared), RequiredShared)
|
||||
}
|
||||
|
||||
var sharedParent string
|
||||
for i, sl := range RequiredShared {
|
||||
if _, ok := expectedNames[sl.Name]; !ok {
|
||||
t.Errorf("RequiredShared[%d]: unexpected Name %q (allowed: session-env / file-history / projects)", i, sl.Name)
|
||||
continue
|
||||
}
|
||||
if expectedNames[sl.Name] {
|
||||
t.Errorf("RequiredShared[%d]: duplicate Name %q", i, sl.Name)
|
||||
}
|
||||
expectedNames[sl.Name] = true
|
||||
|
||||
if sl.Target == "" {
|
||||
t.Errorf("RequiredShared[%d] (%q): empty Target", i, sl.Name)
|
||||
continue
|
||||
}
|
||||
if !filepath.IsAbs(sl.Target) {
|
||||
t.Errorf("RequiredShared[%d] (%q): Target %q must be absolute", i, sl.Name, sl.Target)
|
||||
continue
|
||||
}
|
||||
|
||||
parent := filepath.Dir(sl.Target)
|
||||
if i == 0 {
|
||||
sharedParent = parent
|
||||
continue
|
||||
}
|
||||
if parent != sharedParent {
|
||||
t.Errorf("RequiredShared[%d] (%q): parent dir %q diverges from %q — all shared targets must live under the same root",
|
||||
i, sl.Name, parent, sharedParent)
|
||||
}
|
||||
}
|
||||
|
||||
for name, seen := range expectedNames {
|
||||
if !seen {
|
||||
t.Errorf("RequiredShared is missing the required %q entry", name)
|
||||
}
|
||||
}
|
||||
}
|
||||
Loading…
Add table
Add a link
Reference in a new issue