claude-failover/internal/router/router.go


feat(phase2-E): multi-provider routing via secutools delegation

Adds optional delegation of agent-queue tasks to the SecuAAS secutools AI platform (GPU / Gemini / Claude API) instead of dispatching to a local Claude Code tmux session. Per-task opt-in via YAML frontmatter fields preferred_ai, allow_delegation, complexity_hint; absence keeps the Phase 1 behaviour exactly (zero breaking change).

Go side:
- internal/secutools: HTTP client with exponential-backoff retries (SubmitJob/GetJob/WaitForResult), DecideProvider map adapter for CLI use, table tests.
- internal/router: struct-typed Decide() with strict precedence (needs_claude_code > preferred_ai=claude-code > allow_delegation=false > preferred_ai > fail-safe local on unknown).
- internal/delegation: Manager submits jobs, writes .md.delegated markers for on-restart recovery, runs a periodic reaper that moves completed jobs into done/ with provider/cost footer and failed jobs into failed/.
- internal/dispatcher: WithDelegation() opt-in, routeTask hook before findFreeSession, skips .md.delegated in assignNextTask.
- internal/api: /api/delegated/status (active jobs + counters), /watchdog/status extended with delegation counters.
- cmd/ccl-delegate: small CLI exposing submit/get/result/decide so the bash dispatcher can call the same contract without duplicating logic.
- cmd/claude-failover: delegation wired opt-in via SECUTOOLS_API_KEY.

Tests:
- 29+ new unit tests across router, secutools, delegation, dispatcher, api packages. go test -race -count=1 clean.
- tests/phase2-E-integration.sh: bash end-to-end against a Python stdlib mock HTTP server, exercising the dev-management scripts.

Forward-compat with watchdog (Phase 1 B1 already ignores state=delegated_to_secutools) so delegated tasks aren't flagged stale.
2026-04-17 02:17:19 +00:00
// Package router decides whether a task should run on a local Claude Code
// session (current Phase 1 behaviour) or be delegated to the centralized
// SecuAAS secutools AI platform (Phase 2 chantier E).
//
// The decision is driven entirely by the task's YAML frontmatter; no
// network call is performed in Decide.
package router

import "strings"

// Provider is the destination chosen for a task.
type Provider string

const (
	// ProviderClaudeCode means dispatch to a local ccl-auto tmux session
	// running Claude Code. This is the Phase 1 behaviour.
	ProviderClaudeCode Provider = "claude-code"
	// ProviderAuto means delegate to secutools and let its smart_triage
	// router pick the actual backend (GPU > Claude > Gemini fallback chain).
	ProviderAuto Provider = "auto"
	// ProviderGPU pins delegation to the in-cluster vLLM GPU pool.
	ProviderGPU Provider = "gpu"
	// ProviderGemini pins delegation to Google Gemini via secutools.
	ProviderGemini Provider = "gemini"
	// ProviderClaudeAPI pins delegation to the Anthropic API (NOT Claude
	// Code locally: this means stateless API calls billed to the secutools
	// account).
	ProviderClaudeAPI Provider = "claude-api"
)

// IsDelegated reports whether p means "submit to secutools" (i.e. not the
// local Claude Code path).
func (p Provider) IsDelegated() bool {
	switch p {
	case ProviderAuto, ProviderGPU, ProviderGemini, ProviderClaudeAPI:
		return true
	default:
		return false
	}
}

// Decision is the output of Decide: which provider to use, plus a short
// human-readable reason for logging and the /api/delegated/status endpoint.
type Decision struct {
	Provider Provider
	Reason   string
}

// Task is the slice of frontmatter fields the router cares about. It is
// intentionally narrower than the dispatcher's full TaskFrontmatter so that
// router tests don't need to import the dispatcher.
type Task struct {
	PreferredAI     string // auto | claude-code | gpu | gemini | claude-api (case-insensitive)
	AllowDelegation bool   // default false → backward-compatible (Claude Code)
	NeedsClaudeCode bool   // legacy bypass: forces ProviderClaudeCode
	ComplexityHint  string // low | medium | high (informational only for now)
}

// Decide returns the routing decision for the given task.
//
// Order of precedence:
//  1. needs_claude_code: true → ProviderClaudeCode (legacy bypass, never delegate)
//  2. preferred_ai: claude-code → ProviderClaudeCode (explicit)
//  3. allow_delegation: false → ProviderClaudeCode (default safety net)
//  4. preferred_ai names a delegated provider (gpu, gemini, claude-api) → that provider
//  5. preferred_ai is auto or empty → ProviderAuto (let secutools smart_triage decide)
//  6. any other preferred_ai value → ProviderClaudeCode (fail-safe on typos)
func Decide(t Task) Decision {
	if t.NeedsClaudeCode {
		return Decision{ProviderClaudeCode, "needs_claude_code=true"}
	}
	// Normalize once so matching is case-insensitive and ignores stray spaces.
	pref := strings.ToLower(strings.TrimSpace(t.PreferredAI))
	if pref == string(ProviderClaudeCode) {
		return Decision{ProviderClaudeCode, "preferred_ai=claude-code"}
	}
	if !t.AllowDelegation {
		return Decision{ProviderClaudeCode, "allow_delegation=false (default)"}
	}
	switch Provider(pref) {
	case ProviderGPU, ProviderGemini, ProviderClaudeAPI:
		return Decision{Provider(pref), "preferred_ai=" + pref}
	case ProviderAuto, "":
		return Decision{ProviderAuto, "preferred_ai=auto (smart_triage)"}
	default:
		// Unknown value: fail safe to local Claude Code so a typo doesn't
		// silently route real work to GPU.
		return Decision{ProviderClaudeCode, "unknown preferred_ai=" + pref + " → fail-safe"}
	}
}
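
The precedence implemented by Decide can be exercised with a small standalone sketch. It is illustrative only: the types are condensed copies so the snippet compiles on its own, and the lowercase decide helper, the example type, and examples() are hypothetical names introduced here, not part of package router.

```go
package main

import (
	"fmt"
	"strings"
)

// Condensed copies of the router types, so this sketch builds by itself.
type Provider string

const (
	ProviderClaudeCode Provider = "claude-code"
	ProviderAuto       Provider = "auto"
	ProviderGPU        Provider = "gpu"
	ProviderGemini     Provider = "gemini"
	ProviderClaudeAPI  Provider = "claude-api"
)

type Task struct {
	PreferredAI     string
	AllowDelegation bool
	NeedsClaudeCode bool
}

// decide (hypothetical helper) mirrors the precedence of Decide but
// returns only the provider, without the human-readable reason.
func decide(t Task) Provider {
	if t.NeedsClaudeCode {
		return ProviderClaudeCode
	}
	pref := strings.ToLower(strings.TrimSpace(t.PreferredAI))
	if pref == string(ProviderClaudeCode) || !t.AllowDelegation {
		return ProviderClaudeCode
	}
	switch Provider(pref) {
	case ProviderGPU, ProviderGemini, ProviderClaudeAPI:
		return Provider(pref)
	case ProviderAuto, "":
		return ProviderAuto
	default:
		return ProviderClaudeCode // unknown value: fail safe to local
	}
}

// example (hypothetical) pairs a task with the provider we expect for it.
type example struct {
	name string
	task Task
	want Provider
}

func examples() []example {
	return []example{
		{"needs_claude_code wins", Task{PreferredAI: "gpu", AllowDelegation: true, NeedsClaudeCode: true}, ProviderClaudeCode},
		{"no opt-in stays local", Task{PreferredAI: "gemini"}, ProviderClaudeCode},
		{"explicit gpu, normalized", Task{PreferredAI: " GPU ", AllowDelegation: true}, ProviderGPU},
		{"empty preference", Task{AllowDelegation: true}, ProviderAuto},
		{"typo fails safe", Task{PreferredAI: "gemni", AllowDelegation: true}, ProviderClaudeCode},
	}
}

func main() {
	for _, e := range examples() {
		got := decide(e.task)
		fmt.Printf("%-26s -> %-11s (matches expectation: %t)\n", e.name, got, got == e.want)
	}
}
```

The rows are ordered the same way as the precedence rules, so each one isolates a single rule: a task only reaches the preferred_ai switch once it has opted in with allow_delegation and has not asked for Claude Code explicitly.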