Post-Deploy Health Monitor
How CodeSpar monitors deployments and auto-rolls back — 30-second checks, Sentry integration, rollback decision engine, and autonomy-level behavior.
After every deployment, CodeSpar runs a health monitor that checks your application for 5 minutes. If something goes wrong, the system decides whether to notify, suggest a rollback, or auto-rollback — depending on your autonomy level.
How It Works
- Deploy completes — the Deploy Agent triggers the health monitor
- 30-second check interval — the monitor runs every 30 seconds for 5 minutes (10 checks total)
- Two data sources — audit trail error rate + Sentry API
- Decision engine — evaluates 6 rules after each check
- Action — notify, suggest rollback, or auto-rollback based on autonomy level
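The schedule above can be sketched as a simple loop. This is a minimal illustration, not CodeSpar's implementation. `run_monitor` and its `run_check` callback are hypothetical names, and stopping early after an auto-rollback is an assumption. `sleep` is injectable so the loop can be exercised without waiting five minutes.

```python
import time
from typing import Callable, List


def check_offsets_ms(interval_ms: int = 30_000, duration_ms: int = 300_000) -> List[int]:
    """Offsets in ms after the deploy at which each check runs (10 with the defaults)."""
    return [interval_ms * i for i in range(1, duration_ms // interval_ms + 1)]


def run_monitor(run_check: Callable[[int], str],
                interval_ms: int = 30_000,
                duration_ms: int = 300_000,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run one check per interval and collect the action each check returns.

    `run_check` is a hypothetical callback that receives the 1-based check number
    and returns an action name such as "continue" or "auto_rollback".
    """
    actions: List[str] = []
    for check_no in range(1, len(check_offsets_ms(interval_ms, duration_ms)) + 1):
        sleep(interval_ms / 1000)  # wait one interval before each check
        action = run_check(check_no)
        actions.append(action)
        if action == "auto_rollback":  # assumption: stop once the deploy is rolled back
            break
    return actions
```

With the defaults, `check_offsets_ms()` yields ten offsets from 30,000 ms to 300,000 ms, matching the 5-minute window.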
What It Checks
Audit Trail Error Rate
The monitor queries the audit log for error events in the post-deploy window and compares the resulting error rate against the pre-deploy baseline:
| Metric | How It's Calculated |
|---|---|
| Pre-deploy baseline | Average error rate over the 30 minutes before deploy |
| Post-deploy rate | Error rate in each 30-second check window |
| Spike threshold | Post-deploy rate > 2x baseline (configurable) |
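One way to compute the spike ratio from raw audit events is sketched below, assuming the error rate is measured as error events per second and that event timestamps are available as epoch seconds. The function names and the `min_baseline` floor (which guards against a pre-deploy window with zero errors) are illustrative assumptions, not part of CodeSpar.

```python
from typing import List

BASELINE_WINDOW_S = 30 * 60  # pre-deploy baseline window: 30 minutes
CHECK_WINDOW_S = 30          # post-deploy check window: 30 seconds


def error_rate(error_times: List[float], start: float, end: float) -> float:
    """Error events per second with start <= t < end (times in epoch seconds)."""
    count = sum(1 for t in error_times if start <= t < end)
    return count / (end - start)


def spike_ratio(error_times: List[float], deploy_at: float, check_end: float,
                min_baseline: float = 1e-6) -> float:
    """Ratio of the latest check window's error rate to the pre-deploy baseline."""
    baseline = error_rate(error_times, deploy_at - BASELINE_WINDOW_S, deploy_at)
    current = error_rate(error_times, check_end - CHECK_WINDOW_S, check_end)
    # Assumption: floor the baseline so a quiet pre-deploy period cannot divide by zero.
    return current / max(baseline, min_baseline)
```

For example, 18 errors in the 30 minutes before a deploy (0.01/s) followed by 3 errors in the first 30-second window (0.1/s) gives a ratio of 10, well above the 2x spike threshold.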
Sentry API
If Sentry is configured, the monitor also fetches recent issues:
| Metric | What It Checks |
|---|---|
| New issues | Issues first seen after the deploy timestamp |
| Regression issues | Previously resolved issues that reappeared |
| Event volume | Total error events in the post-deploy window |
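A sketch of how fetched issues could be bucketed into the three metrics above follows. The `Issue` dataclass is a hypothetical, simplified record, not Sentry's API response schema, and the call that fetches issues from Sentry is omitted. `fatal` and `error` are Sentry's two highest issue levels, which are the levels Rule 2 below looks for.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

HIGH_SEVERITY = {"fatal", "error"}  # levels that trigger Rule 2


@dataclass
class Issue:
    # Hypothetical, simplified issue record; not Sentry's API response schema.
    issue_id: str
    level: str                # Sentry level: fatal, error, warning, info, or debug
    first_seen: datetime
    regressed: bool           # was resolved, then seen again
    events_since_deploy: int


@dataclass
class SentrySummary:
    new_issues: List[Issue]
    new_high_severity: List[Issue]
    regressions: List[Issue]
    event_volume: int


def summarize(issues: List[Issue], deploy_at: datetime) -> SentrySummary:
    """Bucket issues into the metrics from the table above."""
    new = [i for i in issues if i.first_seen > deploy_at]  # first seen after deploy
    return SentrySummary(
        new_issues=new,
        new_high_severity=[i for i in new if i.level in HIGH_SEVERITY],
        regressions=[i for i in issues if i.regressed and i.first_seen <= deploy_at],
        event_volume=sum(i.events_since_deploy for i in issues),
    )
```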
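To enable the Sentry integration, set the SENTRY_API_TOKEN, SENTRY_ORG_SLUG, and SENTRY_PROJECT_SLUG environment variables described under Configuration.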
Rollback Decision Engine
The decision engine evaluates 6 rules after each health check. Rules are evaluated in order — the first match determines the action.
Rule 1: Critical Error Spike
Condition: Post-deploy error rate > 5x baseline
Action: Immediate auto-rollback (at L4+), urgent alert with a rollback suggestion (at L3), or urgent notification (at L1-L2)
This catches catastrophic failures like a broken database connection or a missing environment variable that causes every request to fail.
Rule 2: New Sentry Errors (High Severity)
Condition: New Sentry issues with fatal or error level first seen after deploy
Action: Auto-rollback (at L4+), rollback suggestion (at L3), or notification (at L1-L2)
New errors that did not exist before the deploy are a strong signal of a regression.
Rule 3: Sentry Regression
Condition: Previously resolved Sentry issues reappeared after deploy
Action: Rollback recommended (at L3+) or notification (at L1-L2)
A regression means something that was fixed has broken again — often a reverted fix or a conflicting change.
Rule 4: Sustained Error Rate Increase
Condition: Post-deploy error rate > 2x baseline for 3+ consecutive checks (90 seconds)
Action: Rollback suggested (at L3+) or notification (at L1-L2)
An increase that persists across several consecutive checks, rather than appearing as a single transient spike, indicates a real problem.
Rule 5: Transient Spike (Single Check)
Condition: Post-deploy error rate > 2x baseline for only 1 check, then returns to normal
Action: Log and continue monitoring
Brief spikes are common during deploy transitions (e.g., cold starts, cache warming). The engine waits for the next check before acting.
Rule 6: Baseline Comparison (Healthy)
Condition: Post-deploy error rate within 1.5x baseline, no new Sentry issues
Action: Continue monitoring. After 10 healthy checks, mark deploy as confirmed.
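The six rules can be expressed as a first-match function. This is a sketch under stated assumptions: `evaluate`, `CheckInput`, and `Thresholds` are hypothetical names. The rules define a transient spike as a single check, so treating any above-threshold streak shorter than the sustained count as transient is an assumption. Ratios between 1.5x and 2x, or new low-severity issues, match none of the documented rules, so the sketch returns `"unclassified"` for them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CheckInput:
    error_ratio: float      # post-deploy error rate divided by the pre-deploy baseline
    new_high_severity: int  # new fatal/error Sentry issues since the deploy
    new_issues: int         # new Sentry issues of any level since the deploy
    regressions: int        # previously resolved Sentry issues that reappeared


@dataclass(frozen=True)
class Thresholds:
    critical: float = 5.0      # Rule 1
    spike: float = 2.0         # Rules 4 and 5
    sustained_checks: int = 3  # Rule 4
    healthy: float = 1.5       # Rule 6


def evaluate(check: CheckInput, streak: int,
             t: Thresholds = Thresholds()) -> Tuple[str, int]:
    """Return (matched rule, updated streak); the first matching rule wins.

    `streak` is the number of consecutive prior checks above the spike threshold.
    """
    above = check.error_ratio > t.spike
    streak = streak + 1 if above else 0
    if check.error_ratio > t.critical:                 # Rule 1
        return "critical_spike", streak
    if check.new_high_severity > 0:                    # Rule 2
        return "new_sentry_errors", streak
    if check.regressions > 0:                          # Rule 3
        return "sentry_regression", streak
    if above and streak >= t.sustained_checks:         # Rule 4
        return "sustained_increase", streak
    if above:                                          # Rule 5 (assumed: any shorter streak)
        return "transient_spike", streak
    if check.error_ratio <= t.healthy and check.new_issues == 0:  # Rule 6
        return "healthy", streak
    return "unclassified", streak
```

The caller passes the returned streak into the next call, so three consecutive checks at 2.5x produce `transient_spike`, `transient_spike`, then `sustained_increase`.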
Rule Summary
| Rule | Condition | L1-L2 | L3 | L4+ |
|---|---|---|---|---|
| Critical spike (5x) | Error rate > 5x baseline | Notify (urgent) | Suggest rollback | Auto-rollback |
| New Sentry errors | Fatal/error after deploy | Notify | Suggest rollback | Auto-rollback |
| Sentry regression | Resolved issue reappeared | Notify | Suggest rollback | Auto-rollback |
| Sustained increase (2x) | > 2x baseline for 90s | Notify | Suggest rollback | Suggest rollback |
| Transient spike | > 2x baseline, 1 check only | Log | Log | Log |
| Healthy | Within 1.5x baseline | Continue | Continue | Continue |
Autonomy Levels and Rollback Behavior
The health monitor's actions depend on your project's autonomy level:
L1-L2: Notify Only
The monitor sends alerts to connected channels but never takes action automatically.
L3: Suggest Rollback
The monitor sends an alert with a one-click rollback suggestion. The team must approve.
L4+: Auto-Rollback
For critical spikes and new Sentry errors, the monitor automatically triggers a rollback and notifies the team afterward. Sustained error rate increases still produce a rollback suggestion that the team must approve.
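The Rule Summary table above can be encoded as a lookup from rule and autonomy level to action. The action names and `action_for` are hypothetical; the mapping simply transcribes the table's L1-L2, L3, and L4+ columns.

```python
# Columns follow the Rule Summary table: (L1-L2, L3, L4+).
ACTIONS = {
    "critical_spike":     ("notify_urgent", "suggest_rollback", "auto_rollback"),
    "new_sentry_errors":  ("notify",        "suggest_rollback", "auto_rollback"),
    "sentry_regression":  ("notify",        "suggest_rollback", "auto_rollback"),
    "sustained_increase": ("notify",        "suggest_rollback", "suggest_rollback"),
    "transient_spike":    ("log",           "log",              "log"),
    "healthy":            ("continue",      "continue",         "continue"),
}


def action_for(rule: str, autonomy_level: int) -> str:
    """Map a matched rule and a numeric autonomy level (1 means L1) to an action."""
    if autonomy_level < 1:
        raise ValueError("autonomy levels start at L1")
    column = 0 if autonomy_level <= 2 else (1 if autonomy_level == 3 else 2)
    return ACTIONS[rule][column]
```

Keeping the policy in a table rather than in branching code makes it easy to compare against the documented matrix.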
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| HEALTH_MONITOR_ENABLED | true | Enable post-deploy health monitoring |
| HEALTH_MONITOR_INTERVAL_MS | 30000 | Check interval in milliseconds |
| HEALTH_MONITOR_DURATION_MS | 300000 | Total monitoring window (5 min) |
| HEALTH_MONITOR_SPIKE_THRESHOLD | 2.0 | Error rate multiplier to trigger sustained alert |
| HEALTH_MONITOR_CRITICAL_THRESHOLD | 5.0 | Error rate multiplier for immediate action |
| HEALTH_MONITOR_SUSTAINED_CHECKS | 3 | Consecutive checks above threshold before sustained alert |
| HEALTH_MONITOR_BASELINE_WINDOW_MS | 1800000 | Pre-deploy baseline window (30 min) |
| SENTRY_API_TOKEN | "" | Sentry API token for issue fetching |
| SENTRY_ORG_SLUG | "" | Sentry organization slug |
| SENTRY_PROJECT_SLUG | "" | Sentry project slug |
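A sketch of loading these variables with the documented defaults follows. `MonitorConfig` and `load_config` are hypothetical names, and the `overrides` dict is only a stand-in for a per-project override, whose real format is not shown on this page. The derived properties reproduce numbers used elsewhere on this page: 10 checks per window and a 90-second sustained window.

```python
import os
from dataclasses import dataclass, replace
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class MonitorConfig:
    # Defaults mirror the environment variable table above.
    enabled: bool = True
    interval_ms: int = 30_000
    duration_ms: int = 300_000
    spike_threshold: float = 2.0
    critical_threshold: float = 5.0
    sustained_checks: int = 3
    baseline_window_ms: int = 1_800_000

    @property
    def total_checks(self) -> int:
        return self.duration_ms // self.interval_ms  # 10 with the defaults

    @property
    def sustained_window_ms(self) -> int:
        return self.sustained_checks * self.interval_ms  # 90000 with the defaults


def load_config(env: Mapping[str, str] = os.environ,
                overrides: Optional[Dict[str, Any]] = None) -> MonitorConfig:
    """Read HEALTH_MONITOR_* variables, then apply hypothetical per-project overrides."""
    d = MonitorConfig()
    cfg = MonitorConfig(
        enabled=env.get("HEALTH_MONITOR_ENABLED", "true").strip().lower() == "true",
        interval_ms=int(env.get("HEALTH_MONITOR_INTERVAL_MS", d.interval_ms)),
        duration_ms=int(env.get("HEALTH_MONITOR_DURATION_MS", d.duration_ms)),
        spike_threshold=float(env.get("HEALTH_MONITOR_SPIKE_THRESHOLD", d.spike_threshold)),
        critical_threshold=float(env.get("HEALTH_MONITOR_CRITICAL_THRESHOLD", d.critical_threshold)),
        sustained_checks=int(env.get("HEALTH_MONITOR_SUSTAINED_CHECKS", d.sustained_checks)),
        baseline_window_ms=int(env.get("HEALTH_MONITOR_BASELINE_WINDOW_MS", d.baseline_window_ms)),
    )
    # replace() raises TypeError for unknown field names, which catches typos.
    return replace(cfg, **(overrides or {}))
```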
Per-Project Override
You can override thresholds per project in the project configuration.
Channel Notifications
The health monitor sends notifications to all channels connected to the project. The notification format adapts to each channel.
What Teams Receive
| Event | Notification |
|---|---|
| Monitor started | Brief message: "Health monitor active for v2.4.1 — watching for 5 min" |
| Transient spike | No notification (logged internally) |
| Sustained increase | Alert with error rate, Sentry issues, suggested action |
| Critical spike | Urgent alert with rollback suggestion or auto-rollback confirmation |
| Deploy confirmed healthy | Success message: "v2.4.1 confirmed healthy after 5 min (10/10 checks passed)" |
| Auto-rollback executed | Rollback confirmation with reason, previous version, and health status |
Example: Deploy Confirmed Healthy
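A minimal sketch of producing the confirmation message shown in the notification table, assuming each check result is recorded as a rule name and that confirmation requires every check to be healthy. `confirmation_message` is a hypothetical helper, not a CodeSpar API.

```python
from typing import List, Optional


def confirmation_message(version: str, check_results: List[str],
                         duration_ms: int = 300_000) -> Optional[str]:
    """Return the success message if every check was healthy, otherwise None."""
    passed = sum(1 for r in check_results if r == "healthy")
    total = len(check_results)
    if total == 0 or passed < total:  # assumption: any unhealthy check blocks confirmation
        return None
    minutes = duration_ms // 60_000
    return f"{version} confirmed healthy after {minutes} min ({passed}/{total} checks passed)"
```

Ten healthy results for `v2.4.1` yield `v2.4.1 confirmed healthy after 5 min (10/10 checks passed)`.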
Troubleshooting
Health Monitor Not Running
- Verify `HEALTH_MONITOR_ENABLED` is `true`
- Check that the deploy was triggered through CodeSpar (manual deploys do not trigger the monitor unless configured)
- Check server logs for `health-monitor` entries
False Positives (Rollback on Healthy Deploy)
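- Raise `HEALTH_MONITOR_SPIKE_THRESHOLD` (e.g., from `2.0` to `3.0`) so that smaller increases no longer count toward a sustained alert
- Increase `HEALTH_MONITOR_SUSTAINED_CHECKS` to require more consecutive checks above the threshold
- Raise `HEALTH_MONITOR_CRITICAL_THRESHOLD` if the rollback was triggered by a critical spike
- Check whether your baseline window includes anomalous data (e.g., a previous incident)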
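To gauge the effect of a threshold change before applying it, you could replay error ratios recorded from a past deploy. This hypothetical helper reports the check at which a sustained alert (Rule 4) would fire; how you obtain the recorded ratios is up to you.

```python
from typing import List


def sustained_alert_at(ratios: List[float], spike_threshold: float = 2.0,
                       sustained_checks: int = 3) -> int:
    """Return the 1-based check number where a sustained alert would fire, or 0 if never."""
    streak = 0
    for check_no, ratio in enumerate(ratios, start=1):
        streak = streak + 1 if ratio > spike_threshold else 0  # reset on any normal check
        if streak >= sustained_checks:
            return check_no
    return 0
```

For ratios `[1.2, 2.4, 2.6, 2.3, 1.1]`, the defaults fire on check 4, while a spike threshold of `3.0` or a sustained count of `4` never fires.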
Sentry Issues Not Detected
- Verify `SENTRY_API_TOKEN` has the `project:read` and `event:read` scopes
- Confirm `SENTRY_ORG_SLUG` and `SENTRY_PROJECT_SLUG` match your Sentry project
- Check that Sentry is receiving events from your application (Sentry dashboard > Issues)
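A quick local check of the settings named in these troubleshooting steps can be written as a hypothetical helper. It treats an all-empty Sentry configuration as "Sentry not configured" (an assumption) and flags only partial setups. It cannot verify token scopes; check those in Sentry itself.

```python
from typing import List, Mapping

SENTRY_VARS = ["SENTRY_API_TOKEN", "SENTRY_ORG_SLUG", "SENTRY_PROJECT_SLUG"]


def config_problems(env: Mapping[str, str]) -> List[str]:
    """List likely misconfigurations; an empty list means nothing obvious was found."""
    problems: List[str] = []
    # The documented default is true, so an unset variable is fine.
    if env.get("HEALTH_MONITOR_ENABLED", "true").strip().lower() != "true":
        problems.append("HEALTH_MONITOR_ENABLED is not 'true'")
    missing = [name for name in SENTRY_VARS if not env.get(name)]
    if missing and len(missing) < len(SENTRY_VARS):  # some, but not all, Sentry vars set
        problems.append("partial Sentry config, missing: " + ", ".join(missing))
    return problems
```

Calling `config_problems(os.environ)` on the server gives a first pass before digging into logs.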
Next Steps
- Deploy Pipeline -- set up the full deploy workflow
- PagerDuty Integration -- page on-call when deploys fail
- Webhook Monitoring -- monitor CI builds via webhooks
- Approval System -- configure rollback approval rules