CI/CD for Prompt and Model Releases
Shadow → canary → full, with statistical gates the whole way.
Problem
Prompts, model versions, retrieval indices, and tool schemas are all release artifacts. We need a release pipeline that gates on offline evals, then validates online via shadow and canary, with automatic rollback.
Goals
- ✓ Block merges that regress safety or headline metrics with statistical significance.
- ✓ Run shadow comparisons before any user traffic shifts.
- ✓ Canary at 1% → 5% → 25% → 100% with quality and error gates at each step.
- ✓ Automatic rollback and full audit trail for every release.
Non-goals
- · Index ingestion pipeline.
Architecture
Walkthrough
1. Offline gates
Each PR runs the full eval set and compares the results statistically against the production baseline. Three gates apply: zero safety regressions, no slice down by more than 2σ, and the headline metric not down at p<0.05. If any gate fails, the PR is blocked. Gates can be force-overridden with a recorded justification; overrides are counted in the monthly release-quality KPIs.
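The three gates above can be sketched as a single check over per-example score deltas (candidate minus baseline on the same eval items). This is a minimal illustration, not the pipeline's actual implementation: the function names, the normal approximation for the p-value, and the reading of "2σ" as two standard errors of the slice's mean delta are all assumptions.

```python
from dataclasses import dataclass, field
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(deltas: list[float]) -> float:
    """Standard error of the mean delta; needs at least two eval items."""
    return stdev(deltas) / sqrt(len(deltas))


def regression_p_value(deltas: list[float]) -> float:
    """One-sided p-value for 'mean delta < 0' under a normal approximation."""
    se = standard_error(deltas)
    if se == 0:
        # Every delta is identical, so the sign alone decides.
        return 0.0 if deltas[0] < 0 else 1.0
    return NormalDist().cdf(mean(deltas) / se)


@dataclass
class GateResult:
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def offline_gates(
    headline_deltas: list[float],
    slice_deltas: dict[str, list[float]],
    safety_regressions: int,
    alpha: float = 0.05,        # headline gate: p < 0.05
    slice_sigmas: float = 2.0,  # slice gate: down by more than 2 sigma (assumed: standard errors)
) -> GateResult:
    result = GateResult()
    # Gate 1: zero safety regressions (cases that passed on baseline, fail on candidate).
    if safety_regressions > 0:
        result.failures.append(f"safety: {safety_regressions} regressed case(s)")
    # Gate 2: no slice whose mean delta is more than slice_sigmas below zero.
    for name, deltas in slice_deltas.items():
        if mean(deltas) < -slice_sigmas * standard_error(deltas):
            result.failures.append(f"slice:{name}")
    # Gate 3: headline metric must not be significantly down.
    if regression_p_value(headline_deltas) < alpha:
        result.failures.append("headline")
    return result
```

A CI job could call `offline_gates(...)` and fail the check when `passed` is false, attaching `failures` to the PR. A force-override would bypass the result but still record it alongside the justification.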
2. Shadow validation
After the offline gates pass, the artifact deploys in shadow mode: a copy of production traffic is sent to both the old and new versions, and both responses are logged and judged, but only the old response is returned to users. Shadow runs for at least 1 hour or 10k requests. To proceed, the online judge mean must not regress at p<0.01.
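The mirroring step can be illustrated with a small asyncio sketch. It assumes both versions are exposed as async callables taking a request string, and that an in-memory list stands in for the shadow log; `handle_with_shadow` and the log field names are hypothetical. The key property is that the candidate can never change or fail the user's response.

```python
import asyncio
from typing import Any, Callable, Coroutine

# Hypothetical handler shape: an async function from request text to response text.
Handler = Callable[[str], Coroutine[Any, Any, str]]


async def _log_pair(request: str, prod_response: str,
                    cand_task: "asyncio.Task[str]", log: list[dict]) -> None:
    """Wait for the candidate and record both responses for later judging."""
    cand_response, cand_error = None, None
    try:
        cand_response = await cand_task
    except Exception as exc:
        # Candidate failures are recorded, never surfaced to the user.
        cand_error = repr(exc)
    log.append({
        "request": request,
        "production": prod_response,
        "candidate": cand_response,
        "candidate_error": cand_error,
    })


async def handle_with_shadow(request: str, production: Handler, candidate: Handler,
                             log: list[dict], background: set) -> str:
    """Serve from production; run the candidate on the same request for logging only."""
    cand_task = asyncio.create_task(candidate(request))
    try:
        prod_response = await production(request)
    except Exception:
        cand_task.cancel()  # no production response to compare against
        raise
    # Log in the background so the user does not wait on the candidate.
    task = asyncio.create_task(_log_pair(request, prod_response, cand_task, log))
    background.add(task)  # keep a reference so the task is not garbage collected
    task.add_done_callback(background.discard)
    return prod_response
```

The logged pairs are what the online judge would score; the p<0.01 comparison then runs over those judged pairs once the minimum shadow window has been met.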
3. Canary
The canary controller routes 1% → 5% → 25% → 100% of traffic, holding each step for ≥30 minutes. At every step we check error rate, p95 latency, online quality score, refusal rate, and a curated 'safety canary' set sampled live. Any threshold breach pins traffic and pages.
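The step-and-hold logic can be sketched as a small state machine. The threshold values, the `Snapshot` fields, and the `page` callback below are hypothetical placeholders, and "pins traffic" is read here as freezing at the current percentage; a rollback to the previous version would be an equally plausible reading.

```python
from dataclasses import dataclass, field
from typing import Callable

STEPS = (1, 5, 25, 100)  # traffic percentages from the rollout plan
MIN_HOLD_S = 30 * 60     # each step is held for at least 30 minutes


@dataclass
class Thresholds:
    # Hypothetical limits; real values would come from the release config.
    max_error_rate: float = 0.01
    max_latency_p95_ms: float = 2000.0
    min_quality: float = 0.80
    max_refusal_rate: float = 0.05
    min_safety_pass_rate: float = 1.0


@dataclass
class Snapshot:
    error_rate: float
    latency_p95_ms: float
    quality: float
    refusal_rate: float
    safety_pass_rate: float


def breaches(s: Snapshot, t: Thresholds) -> list[str]:
    """Names of every check that is outside its threshold."""
    checks = {
        "error_rate": s.error_rate > t.max_error_rate,
        "latency_p95": s.latency_p95_ms > t.max_latency_p95_ms,
        "quality": s.quality < t.min_quality,
        "refusal_rate": s.refusal_rate > t.max_refusal_rate,
        "safety_canary": s.safety_pass_rate < t.min_safety_pass_rate,
    }
    return [name for name, breached in checks.items() if breached]


@dataclass
class CanaryController:
    thresholds: Thresholds
    page: Callable[[str], None] = print  # stand-in for a real paging hook
    step_index: int = 0
    step_started_at: float = 0.0
    pinned: bool = False
    decisions: list = field(default_factory=list)

    @property
    def traffic_percent(self) -> int:
        return STEPS[self.step_index]

    def tick(self, now: float, snapshot: Snapshot) -> str:
        """Evaluate one monitoring interval; returns the decision taken."""
        if self.pinned:
            return "pinned"
        failed = breaches(snapshot, self.thresholds)
        if failed:
            self.pinned = True  # freeze at the current step (assumed meaning of "pin")
            self.page(f"canary breach at {self.traffic_percent}%: {failed}")
            decision = "pinned"
        elif now - self.step_started_at < MIN_HOLD_S:
            decision = "hold"
        elif self.step_index == len(STEPS) - 1:
            decision = "complete"
        else:
            self.step_index += 1
            self.step_started_at = now
            decision = "advance"
        self.decisions.append((now, self.traffic_percent, decision))
        return decision
```

The `decisions` list is the kind of per-step record that would be written to the release registry.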
4. Audit and registry
Every artifact (prompt SHA, model version, tool schema hash) lands in the registry with offline eval report, shadow scores, canary decisions, and final state. Compliance and post-mortems work directly off this registry; nothing is reconstructed from logs.
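One possible shape for a registry entry, shown as a sketch: the field names mirror the items listed above, while `ReleaseRecord`, `release_id`, and the use of SHA-256 for hashing are assumptions rather than the registry's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def content_hash(text: str) -> str:
    """SHA-256 hex digest of an artifact's text (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ReleaseRecord:
    prompt_sha: str
    model_version: str
    tool_schema_hash: str
    offline_eval_report: dict
    shadow_scores: dict = field(default_factory=dict)
    canary_decisions: list = field(default_factory=list)
    final_state: str = "pending"  # e.g. "released" or "rolled_back" once known

    def release_id(self) -> str:
        """Deterministic ID derived from the three artifact identifiers."""
        key = f"{self.prompt_sha}:{self.model_version}:{self.tool_schema_hash}"
        return content_hash(key)[:16]

    def to_json(self) -> str:
        """Stable serialization suitable for an append-only audit store."""
        return json.dumps(asdict(self), sort_keys=True)
```

Deriving the ID from the artifact hashes means the same prompt, model, and tool schema always map to the same entry, which keeps audit lookups independent of deploy timestamps.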
Tradeoffs
Metrics to track
- Time from PR merge to 100% rollout (p50/p95)
- Override rate per team per month
- Auto-rollback fire rate (and time-to-mitigate)
- Shadow-detected regressions per quarter
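As an illustration of how the first two metrics might be computed from registry data, the sketch below uses a nearest-rank percentile and a per-(team, month) override ratio. The record fields (`team`, `month`, `overridden`) and the nearest-rank method are assumptions chosen for simplicity.

```python
from collections import Counter


def percentile(values: list[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = (q * len(ordered) + 99) // 100  # integer ceil(q * n / 100)
    return ordered[max(rank, 1) - 1]


def override_rate(releases: list[dict]) -> dict:
    """Fraction of releases with a forced gate override, per (team, month)."""
    total, overridden = Counter(), Counter()
    for release in releases:
        key = (release["team"], release["month"])
        total[key] += 1
        if release["overridden"]:
            overridden[key] += 1
    return {key: overridden[key] / total[key] for key in total}
```

For example, with lead times of 10, 20, ..., 100 minutes, `percentile(lead_times, 50)` returns 50 and `percentile(lead_times, 95)` returns 100.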