Zero-Downtime Schema Changes in Go Services
Database migrations are easy in small applications because deploys are linear: change the schema, deploy the code, done. In a production system with multiple Go services, background workers, rolling deploys, and long-running jobs, old and new code run against the same database at the same time, so every schema change needs choreography.
The safe pattern is expand, migrate, contract: add the new shape while the old code still works, write to both shapes, backfill existing rows, move reads over gradually, and remove the old shape only after every consumer has moved.
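The ordering is the part of the pattern that tends to get shortcut. One way to make it explicit is to track which phase a rollout is in and refuse to skip ahead. This is a hypothetical sketch: `Phase` and `Advance` are illustrative names, not part of any library, and the phase names simply mirror the steps in this article.

```go
package main

import "fmt"

// Phase is one stage of an expand/migrate/contract rollout.
type Phase int

const (
	Expand Phase = iota
	DualWrite
	Backfill
	SwitchReads
	Contract
)

var phaseNames = [...]string{"expand", "dual-write", "backfill", "switch-reads", "contract"}

func (p Phase) String() string {
	if p < Expand || p > Contract {
		return fmt.Sprintf("Phase(%d)", int(p))
	}
	return phaseNames[p]
}

// Advance permits moving forward exactly one phase at a time, so nothing can
// jump from Expand straight to Contract in a single "heroic" deploy.
func Advance(current, next Phase) error {
	if next != current+1 || next > Contract {
		return fmt.Errorf("cannot move from %s to %s: phases advance one step at a time", current, next)
	}
	return nil
}

func main() {
	fmt.Println(Advance(Expand, DualWrite))   // allowed: prints <nil>
	fmt.Println(Advance(DualWrite, Contract)) // rejected: skips backfill and the read switch
}
```

A real rollout tool would also need to allow stepping back, for example from switch-reads to dual-write if the new reads misbehave; that is left out here to keep the forward rule obvious.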
Step 1: Expand
The expand migration only adds things: a nullable column, a new table, a new index, or a trigger. It should be safe to run while old code is still deployed.
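-- Expand: additive changes only, safe while old code is still deployed.
ALTER TABLE campaigns
    ADD COLUMN budget_currency VARCHAR(3) NULL;
-- PostgreSQL refuses to run CREATE INDEX CONCURRENTLY inside a transaction block,
-- so run this statement outside any transaction your migration tool opens.
CREATE INDEX CONCURRENTLY idx_campaigns_company_currency
    ON campaigns (company_id, budget_currency);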
Avoid migrations that rewrite huge tables during business hours. Even if the database supports online operations, test the migration with realistic data volume before trusting it.
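Parts of the expand rule can be checked before a migration runs. The following is a hypothetical pre-flight check, not a feature of any particular migration tool: it rejects a few obviously non-additive statements and flags `CREATE INDEX CONCURRENTLY`, which PostgreSQL will not run inside a transaction block, so a runner that wraps each migration in a transaction would need to execute flagged steps on their own. `PlanExpand` and `Step` are illustrative names, the split on `;` is naive, and the pattern list is far from complete.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Step is one statement from an expand migration.
type Step struct {
	SQL  string
	NoTx bool // true when the statement cannot run inside a transaction block
}

// destructive catches a few statements that do not belong in an expand migration.
// Illustrative only: it is nowhere near a complete list.
var destructive = regexp.MustCompile(
	`(?i)\b(DROP\s+(COLUMN|TABLE)|RENAME|ALTER\s+COLUMN\s+\S+\s+(TYPE|SET\s+NOT\s+NULL))\b`)

// concurrently matches index operations that PostgreSQL refuses to run in a transaction.
var concurrently = regexp.MustCompile(`(?i)\bINDEX\s+CONCURRENTLY\b`)

// PlanExpand splits a script on ";" (naively: semicolons inside strings or
// function bodies will break it), rejects non-additive statements, and marks
// the ones that must run on their own, outside a transaction.
func PlanExpand(script string) ([]Step, error) {
	var steps []Step
	for _, raw := range strings.Split(script, ";") {
		stmt := strings.Join(strings.Fields(raw), " ") // collapse whitespace
		if stmt == "" {
			continue
		}
		if destructive.MatchString(stmt) {
			return nil, fmt.Errorf("not additive, move it to a contract migration: %q", stmt)
		}
		steps = append(steps, Step{SQL: stmt, NoTx: concurrently.MatchString(stmt)})
	}
	return steps, nil
}

func main() {
	script := `
ALTER TABLE campaigns
    ADD COLUMN budget_currency VARCHAR(3) NULL;
CREATE INDEX CONCURRENTLY idx_campaigns_company_currency
    ON campaigns (company_id, budget_currency);`
	steps, err := PlanExpand(script)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range steps {
		fmt.Printf("no_tx=%-5v %s\n", s.NoTx, s.SQL)
	}
}
```

Pattern matching on SQL text is a safety net, not a replacement for review and a rehearsal against production-sized data.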
Step 2: Dual Write
After the expand migration, the application writes both old and new fields. Reads still come from the old field. This gives you time to verify that the new data is correct without changing user-visible behavior.
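func (s *CampaignService) UpdateBudget(ctx context.Context, input UpdateBudgetInput) error {
	campaign, err := s.repo.FindByID(ctx, input.CampaignID) // FindByID is an assumed repo method
	if err != nil {
		return err
	}
	// Dual write: keep the existing field current and populate the new column in the same save.
	campaign.Budget = input.Amount
	campaign.BudgetCurrency = input.Currency
	return s.repo.Save(ctx, campaign)
}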
Step 3: Backfill
Backfills should be idempotent and chunked. A backfill that tries to update millions of rows in one transaction is an outage waiting for a calendar invite.
- Process rows in small batches.
- Record progress so the job can resume.
- Throttle when database latency increases.
- Emit metrics for rows scanned, rows updated, and errors.
Step 4: Switch Reads
Once dual writes and backfill are stable, switch reads to the new field behind a feature flag or a small deploy. This is the step where application tests matter most because the schema now has two possible truths.
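One way to implement the flag for a single field is sketched below, assuming the old and new shapes can be compared as the same Go type. While the flag is off, every read still returns the old value, but the new value is compared against it and disagreements are counted (often called a shadow read). When the flag is on, the new value wins unless it is still NULL. `ReadSwitch` is a hypothetical helper; `atomic.Bool` and `atomic.Int64` need Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ReadSwitch decides which schema shape serves reads for one field.
// While the flag is off, the old value is returned and the new value is only
// compared against it, so problems show up in counters before users see them.
type ReadSwitch[T comparable] struct {
	useNew     atomic.Bool  // the feature flag; safe to flip at runtime
	Mismatched atomic.Int64 // new value present but different from the old one
	Missing    atomic.Int64 // new value still NULL (backfill incomplete?)
}

// Enable turns reads from the new shape on or off.
func (r *ReadSwitch[T]) Enable(on bool) { r.useNew.Store(on) }

// Read takes both values for one record; newOK is false when the new column is NULL.
func (r *ReadSwitch[T]) Read(oldV, newV T, newOK bool) T {
	switch {
	case !newOK:
		r.Missing.Add(1)
	case newV != oldV:
		r.Mismatched.Add(1)
	}
	// Even with the flag on, fall back to the old value rather than serve a NULL.
	if r.useNew.Load() && newOK {
		return newV
	}
	return oldV
}

func main() {
	var rs ReadSwitch[string]
	fmt.Println(rs.Read("USD", "EUR", true)) // flag off: old value is served
	rs.Enable(true)
	fmt.Println(rs.Read("USD", "EUR", true)) // flag on: new value is served
	fmt.Println(rs.Read("USD", "", false))   // new value NULL: fall back to old
	fmt.Println("mismatched:", rs.Mismatched.Load(), "missing:", rs.Missing.Load())
}
```

Watching `Mismatched` and `Missing` settle at zero before flipping the flag turns "stable" into something measurable, and turning the flag back off is the rollback.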
Step 5: Contract
Only after all services, workers, scripts, and dashboards use the new schema should you remove the old field. The contract migration is the dangerous one. Give it time. Search the codebase. Check query logs. Make sure no old binary is still running.
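The "no old binary is still running" check can be automated if you have a list of what is deployed. The sketch assumes a hypothetical inventory that reports each running instance's service name and a monotonically increasing build number, plus a hand-maintained map from service to the first build that no longer touches the old column. Any instance older than that build, or from a service missing from the map, blocks the contract migration. It does not cover ad-hoc scripts or dashboards; query logs remain the check for those.

```go
package main

import (
	"fmt"
	"sort"
)

// Deployment is one running instance as reported by some inventory source
// (hypothetical: a service registry, orchestrator labels, a deploy log).
type Deployment struct {
	Service string
	Version int // build number, assumed to increase monotonically
}

// ContractBlockers lists every running deployment older than the first version
// of its service that no longer uses the old column. Services missing from
// minSafe count as blockers: unknown consumers are the ones that break.
func ContractBlockers(running []Deployment, minSafe map[string]int) []string {
	var blockers []string
	for _, d := range running {
		safe, known := minSafe[d.Service]
		switch {
		case !known:
			blockers = append(blockers, fmt.Sprintf("%s v%d: no known safe version", d.Service, d.Version))
		case d.Version < safe:
			blockers = append(blockers, fmt.Sprintf("%s v%d: still older than v%d", d.Service, d.Version, safe))
		}
	}
	sort.Strings(blockers)
	return blockers
}

func main() {
	running := []Deployment{
		{"campaign-api", 42}, {"campaign-api", 41}, {"budget-worker", 17}, {"report-cron", 3},
	}
	minSafe := map[string]int{"campaign-api": 42, "budget-worker": 15}
	for _, b := range ContractBlockers(running, minSafe) {
		fmt.Println("blocked:", b)
	}
}
```

Treating unknown services as blockers is the conservative choice: it forces someone to confirm that a forgotten cron job or worker has moved before the old column disappears.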
Zero-downtime migrations are mostly discipline. Each individual step is simple. The safety comes from refusing to combine all steps into one heroic deploy.