Rollbacks¶
Before You Read¶
Rollbacks should be fast. If you're in the middle of an incident, skip straight to Rollback Options. For production outages, also open the Production Outage Runbook.
When to Roll Back¶
Roll back immediately if:
- A deployment causes error rate to spike above baseline
- A deployment causes latency P99 to degrade significantly
- Health checks start failing after a deployment
- The application is returning 5xx errors for a newly deployed service

Do not roll back if:
- The issue existed before the deployment (verify with Grafana)
- The deployment is not yet live (sync is in progress; wait or abort sync first)
Rollback Options¶
There are three approaches, in order of preference:
| Option | Speed | Complexity | When to Use |
|---|---|---|---|
| ArgoCD Rollback | Fast | Low | Deployed version is in ArgoCD history |
| Git Revert | Medium | Low | Can open and merge a PR quickly |
| Helm Rollback | Fast | Medium | ArgoCD rollback isn't available |
Option 1: ArgoCD Rollback¶
The cleanest option. ArgoCD keeps a history of recent syncs.
Via ArgoCD UI¶
- Open https://argocd.{env}.orofi.xyz
- Find the application (e.g., microservice-identity)
- Click History tab
- Find the last known-good deployment
- Click Rollback → confirm
Via ArgoCD CLI¶
# List recent sync history
argocd app history microservice-identity
# Rollback to a specific revision (get revision number from history)
argocd app rollback microservice-identity {revision-number}
# Example — rollback to revision 42
argocd app rollback microservice-identity 42
Auto-sync conflict
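If auto-sync is enabled, ArgoCD may refuse the rollback or immediately re-sync to the Git state afterwards, which re-applies the version you just rolled back from. Disable auto-sync on the application before rolling back.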
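One way to toggle auto-sync from the CLI, as a minimal sketch. It assumes the `argocd` CLI is installed and logged in to the right environment, and uses the `--sync-policy` flag of `argocd app set`. The `run` wrapper, the `DRY_RUN` variable, and the function names are hypothetical conveniences for previewing the command and are not part of any existing tooling.

```shell
# Assumption: the argocd CLI is installed and already logged in.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Switch the application to manual sync before rolling back.
disable_auto_sync() {
  run argocd app set "$1" --sync-policy none
}

# Restore automated sync once Git reflects the known-good state.
enable_auto_sync() {
  run argocd app set "$1" --sync-policy automated
}

# Example (prints the command only):
#   DRY_RUN=1 disable_auto_sync microservice-identity
```

If the application's sync policy is itself defined in Git (for example through an app-of-apps setup), that definition may override a change made this way. Options such as `--auto-prune` or `--self-heal` may also need to be re-applied when auto-sync is turned back on.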
Option 2: Git Revert¶
This is the safest long-term approach — it creates an audit trail and ensures the rolled-back state is committed to Git.
# In infrastructure-configuration repo
git log --oneline -- projects/orofi/{env}/{service}/helm/values.yaml
# Identify the commit that introduced the bad version (e.g., abc123)
git revert --no-commit abc123 # or, if it is the latest commit: git revert --no-commit HEAD
# Edit the revert commit message to be descriptive
git commit -m "revert: rollback microservice-identity to v1.2.2 due to elevated error rate"
# Open a PR — get it merged quickly
After merge, ArgoCD auto-syncs with the reverted values.
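To see what the revert does before touching the real repository, the flow can be rehearsed in a throwaway repo. This is only a sketch: the file contents, commit messages, and `other-service.yaml` are hypothetical stand-ins for the infrastructure-configuration layout. It shows that reverting one specific commit restores the old image tag and leaves later, unrelated commits in place.

```shell
set -eu

# Throwaway repository standing in for infrastructure-configuration.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Known-good deployment (hypothetical values.yaml content).
echo "tag: v1.2.2" > values.yaml
git add values.yaml
git commit -q -m "deploy microservice-identity v1.2.2"

# The bad deployment; remember its commit hash.
echo "tag: v1.2.3" > values.yaml
git commit -q -am "deploy microservice-identity v1.2.3"
bad="$(git rev-parse HEAD)"

# An unrelated change merged afterwards by another team.
echo "replicas: 3" > other-service.yaml
git add other-service.yaml
git commit -q -m "scale other-service"

# Revert only the bad commit, then commit with a descriptive message.
git revert --no-commit "$bad"
git commit -q -m "revert: rollback microservice-identity to v1.2.2 due to elevated error rate"

cat values.yaml
cat other-service.yaml
```

Reverting by commit hash rather than resetting the branch keeps the bad deployment and the rollback visible in history, and avoids undoing other services' changes in a shared repository.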
Option 3: Helm Rollback¶
Use this if ArgoCD is unavailable or unhealthy.
# List Helm release history
helm history microservice-identity -n microservice-identity
# Rollback to the previous release
helm rollback microservice-identity -n microservice-identity
# Or rollback to a specific revision number
helm rollback microservice-identity {revision} -n microservice-identity
ArgoCD drift
After a manual Helm rollback, ArgoCD will detect drift between Git and the cluster and may re-apply the bad version on next sync. Disable auto-sync as noted above and then do a Git revert.
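Argo Rollouts Rollback (Canary Services)¶
For services using Argo Rollouts (canary deployment). Aborting shifts traffic back to the stable version, but the bad version remains the desired spec, so follow up with a Git revert: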
# Abort the current rollout (reverts to stable version immediately)
kubectl argo rollouts abort microservice-identity -n microservice-identity
# Check rollout status
kubectl argo rollouts get rollout microservice-identity -n microservice-identity
Verifying the Rollback¶
After any rollback:
# Verify the image tag is the old version
kubectl get deployment microservice-identity -n microservice-identity \
-o jsonpath='{.spec.template.spec.containers[0].image}'
# Watch pods stabilize
kubectl rollout status deployment/microservice-identity -n microservice-identity
# Check error rate in Grafana
# https://grafana.{env}.orofi.xyz
Confirm in Grafana that error rates have returned to baseline before declaring success.
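The image check above can be turned into a pass/fail comparison. The helpers below are a hypothetical sketch, not existing tooling, and assume images are referenced as `name:tag` rather than by `@sha256` digest or without a tag.

```shell
# Extract the tag from an image reference such as registry/name:tag.
# Assumption: the reference ends in :tag (no digest, tag always present).
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Compare the running image's tag with the expected known-good tag.
# Usage: check_rolled_back <expected-tag> <image-reference>
check_rolled_back() {
  actual="$(image_tag "$2")"
  if [ "$actual" = "$1" ]; then
    echo "OK: running $actual"
  else
    echo "MISMATCH: expected $1, running $actual"
    return 1
  fi
}

# Example against the cluster (requires kubectl access):
#   image="$(kubectl get deployment microservice-identity -n microservice-identity \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
#   check_rolled_back v1.2.2 "$image"
```

A matching tag only confirms that the old version is deployed; the Grafana check is still what shows the service has recovered.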
Post-Rollback Actions¶
- Write up what happened in [NEEDS TEAM INPUT: incident tracking system]
- Identify root cause before re-deploying the problematic version
- If the bad version was in staging or production, notify affected stakeholders
- Re-enable ArgoCD auto-sync once the Git revert is merged and the situation is stable; re-enabling it earlier lets ArgoCD re-apply the bad version from Git
See Also¶
- GitOps Workflow — normal deployment flow
- Production Outage Runbook — full incident response
- Reading Logs — diagnose what went wrong