Database Migration Strategies for Zero-Downtime Deployments
Database migrations don't have to mean maintenance windows. Learn expand-contract, backfill, and dual-write patterns that keep your systems running while schema changes roll out.
Last updated: 3/11/2026
The migration problem
In traditional deployment models, database migrations trigger maintenance windows. The application stops, schema changes apply, data migrates, and the application restarts. This approach no longer scales in environments requiring 24/7 availability.
The challenge emerges from the fundamental tension between schema changes and running code. New code expects new schema; old code runs on old schema. During migration, both versions coexist, creating a window of incompatibility.
Zero-downtime migrations solve this through careful orchestration: phase deployments, backward-compatible schema changes, and data migration strategies that keep systems operational throughout the process.
The expand-contract pattern
The expand-contract pattern structures schema changes into additive and subtractive phases, ensuring compatibility between old and new code versions.
Phase 1: Expand (additive changes)
Add new columns, tables, or indexes without removing existing structures. New code can use new features while old code continues using existing structures.
```sql
-- Step 1: Add new column with default value
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;

-- Step 2: Create new table for new relationship
CREATE TABLE user_preferences (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    preferences JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Step 3: Create index for new query pattern
CREATE INDEX idx_users_email_verified ON users(email_verified) WHERE email_verified = TRUE;
```
Phase 2: Deploy application code
Deploy new application version that works with both old and new schema. Code conditionally uses new features based on data availability.
```typescript
// Application code handling migration state
class UserService {
  async getUserPreferences(userId: number): Promise<Preferences> {
    // Try new table first
    const newPrefs = await this.db.userPreferences.findUnique({
      where: { userId }
    });
    if (newPrefs) {
      return newPrefs.preferences;
    }

    // Fall back to old structure (user may not exist, hence optional chaining)
    const user = await this.db.users.findUnique({
      where: { id: userId }
    });
    return user?.legacyPreferences ?? {};
  }

  async setEmailVerified(userId: number, verified: boolean): Promise<void> {
    // Write to both old and new structures
    await this.db.users.update({
      where: { id: userId },
      data: { emailVerified: verified }
    });
    await this.db.userPreferences.upsert({
      where: { userId },
      update: { preferences: { verified } },
      create: { userId, preferences: { verified } }
    });
  }
}
```
Phase 3: Backfill data
Migrate existing data to new structure while system remains operational. Use batch processing with error handling and monitoring.
```typescript
class DataBackfillService {
  async backfillUserPreferences(batchSize: number = 1000): Promise<void> {
    let lastId = 0;

    while (true) {
      // Cursor pagination on id: offset pagination would skip rows, because
      // migrated users drop out of the filtered result set between batches.
      const users = await this.db.users.findMany({
        take: batchSize,
        where: {
          id: { gt: lastId },
          userPreferences: { is: null } // only users not yet migrated
        },
        orderBy: { id: 'asc' }
      });
      if (users.length === 0) break;

      for (const user of users) {
        try {
          await this.migrateUserPreferences(user);
          this.metrics.record('backfill_success', { userId: user.id });
        } catch (error) {
          this.metrics.record('backfill_error', {
            userId: user.id,
            error: error.message
          });
          this.logger.error(`Failed to backfill user ${user.id}`, { error });
        }
      }

      lastId = users[users.length - 1].id;
      // Pause between batches to avoid overwhelming the database
      await sleep(100);
    }
  }

  private async migrateUserPreferences(user: any): Promise<void> {
    const legacyPrefs = await this.db.legacyPreferences.findUnique({
      where: { userId: user.id }
    });
    await this.db.userPreferences.create({
      data: {
        userId: user.id,
        preferences: this.transformPreferences(legacyPrefs)
      }
    });
  }
}
```
Phase 4: Contract (subtractive changes)
Remove old columns, tables, and indexes after verifying new code works correctly with new structure.
```sql
-- Step 1: Remove old column
ALTER TABLE users DROP COLUMN legacy_preferences;

-- Step 2: Drop old table (after verification)
DROP TABLE legacy_user_preferences;

-- Step 3: Remove old indexes
DROP INDEX IF EXISTS idx_users_legacy_lookup;
```
Dual-write strategy
For complex migrations requiring structural changes, the dual-write pattern writes to both old and new structures simultaneously.
Implementation pattern
```typescript
class DualWriteService {
  private readonly migrationWindow: Duration = { days: 7 };
  private readonly verifyWindow: Duration = { days: 3 };

  async createUser(data: UserData): Promise<User> {
    // Start transaction for consistency
    const result = await this.db.$transaction(async (tx) => {
      // Write to old structure
      const oldUser = await tx.users.create({
        data: this.mapToOldSchema(data)
      });

      // Write to new structure
      const newUser = await tx.newUsers.create({
        data: this.mapToNewSchema(data)
      });

      // Record mapping for verification
      await tx.userMigrationMapping.create({
        data: {
          oldId: oldUser.id,
          newId: newUser.id,
          migratedAt: new Date()
        }
      });

      return { oldUser, newUser };
    });
    return result.oldUser; // Return old structure during migration
  }

  async updateUser(userId: number, data: Partial<UserData>): Promise<User> {
    // Find both IDs from mapping
    const mapping = await this.db.userMigrationMapping.findUnique({
      where: { oldId: userId }
    });
    if (!mapping) {
      throw new Error('User migration mapping not found');
    }

    // Update both structures
    await this.db.$transaction(async (tx) => {
      await tx.users.update({
        where: { id: mapping.oldId },
        data: this.mapToOldSchema(data)
      });
      await tx.newUsers.update({
        where: { id: mapping.newId },
        data: this.mapToNewSchema(data)
      });
    });
    return this.db.users.findUnique({ where: { id: userId } });
  }
}
```
Verification and cutover
```typescript
class MigrationVerificationService {
  async verifyMigrationIntegrity(): Promise<VerificationReport> {
    const totalRecords = await this.db.userMigrationMapping.count();
    const errors: MigrationError[] = [];

    // Sample check: verify 10% of records, most recently migrated first
    const sampleSize = Math.ceil(totalRecords * 0.1);
    const samples = await this.db.userMigrationMapping.findMany({
      take: sampleSize,
      orderBy: { migratedAt: 'desc' }
    });

    for (const sample of samples) {
      const oldRecord = await this.db.users.findUnique({
        where: { id: sample.oldId }
      });
      const newRecord = await this.db.newUsers.findUnique({
        where: { id: sample.newId }
      });
      const diff = this.compareRecords(oldRecord, newRecord);
      if (!diff.match) {
        errors.push({
          userId: sample.oldId,
          differences: diff.differences,
          severity: this.assessSeverity(diff)
        });
      }
    }

    return {
      totalRecords,
      sampleSize,
      errorCount: errors.length,
      errorRate: errors.length / sampleSize,
      errors: errors.filter(e => e.severity === 'critical'),
      warnings: errors.filter(e => e.severity !== 'critical')
    };
  }

  // Note: `new` is a reserved word, so the parameters get explicit names
  private compareRecords(oldRec: any, newRec: any): { match: boolean; differences: string[] } {
    const differences: string[] = [];

    // Compare all mapped fields (old column name -> new column name)
    const fieldMap = {
      email: 'email',
      name: 'fullName',
      createdAt: 'createdTimestamp'
    };
    for (const [oldField, newField] of Object.entries(fieldMap)) {
      if (oldRec[oldField] !== newRec[newField]) {
        differences.push(`${oldField} (${oldRec[oldField]}) !== ${newField} (${newRec[newField]})`);
      }
    }

    return {
      match: differences.length === 0,
      differences
    };
  }
}
```
Backfill with rate limiting
Large data backfills can overwhelm databases. Rate limiting and batching ensure migration doesn't impact production performance.
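The service below relies on a `TokenBucket` rate limiter that is never defined. A minimal sketch, with the interface assumed from how the service uses it (continuous refill at `refillRate` tokens per second, up to a burst `capacity`), might look like:

```typescript
// Minimal token-bucket sketch: tokens accrue continuously at `refillRate`
// per second up to `capacity`; waitForToken() resolves once one is available.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private opts: { capacity: number; refillRate: number }) {
    this.tokens = opts.capacity; // start with a full burst allowance
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.opts.capacity,
      this.tokens + elapsedSec * this.opts.refillRate
    );
    this.lastRefill = now;
  }

  async waitForToken(): Promise<void> {
    this.refill();
    while (this.tokens < 1) {
      // Sleep roughly until the next token accrues, then re-check
      const waitMs = ((1 - this.tokens) / this.opts.refillRate) * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }
}
```

Starting the bucket full is what allows the initial burst of `burstCapacity` requests before the sustained `requestsPerSecond` rate takes over.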
```typescript
class RateLimitedBackfillService {
  private readonly rateLimiter: TokenBucket;

  constructor(private config: BackfillConfig) {
    // Token bucket: allows bursts up to burstCapacity, then sustained rate
    this.rateLimiter = new TokenBucket({
      capacity: config.burstCapacity || 100,
      refillRate: config.requestsPerSecond || 10
    });
  }

  async backfillLargeDataset(query: any, transformer: (row: any) => any): Promise<void> {
    let cursor: string | null = null;
    let totalProcessed = 0;

    while (true) {
      // Wait for rate limiter
      await this.rateLimiter.waitForToken();

      // Fetch batch
      const batch = await this.db.query(query, {
        cursor,
        limit: this.config.batchSize || 100
      });
      if (batch.length === 0) break;

      // Process batch
      const results = await Promise.allSettled(
        batch.map(row => this.processRow(row, transformer))
      );

      // Track results
      const succeeded = results.filter(r => r.status === 'fulfilled').length;
      const failed = results.filter(r => r.status === 'rejected').length;
      totalProcessed += succeeded;

      this.metrics.record('backfill_batch', {
        processed: succeeded,
        failed,
        total: batch.length
      });

      // Check health before continuing
      if (await this.shouldPauseForHealth()) {
        await this.waitUntilHealthy();
      }

      cursor = batch[batch.length - 1].cursor;
    }
  }

  private async shouldPauseForHealth(): Promise<boolean> {
    const metrics = await this.getDatabaseMetrics();
    return (
      metrics.cpu > this.config.maxCpu ||
      metrics.memory > this.config.maxMemory ||
      metrics.connections > this.config.maxConnections
    );
  }
}
```
Transactional integrity
Migrations must maintain data consistency even when failures occur.
```typescript
class TransactionalMigrationService {
  async migrateWithRollback(
    migrationId: string,
    operations: MigrationOperation[]
  ): Promise<MigrationResult> {
    await this.db.migrationRecord.create({
      data: {
        id: migrationId,
        status: 'IN_PROGRESS',
        startedAt: new Date()
      }
    });

    try {
      for (const operation of operations) {
        await this.executeOperation(operation);
        operation.status = 'COMPLETED'; // track in memory so rollback knows what ran
        await this.recordOperationProgress(migrationId, operation.id, 'COMPLETED');
      }
      await this.db.migrationRecord.update({
        where: { id: migrationId },
        data: {
          status: 'COMPLETED',
          completedAt: new Date()
        }
      });
      return { success: true, migrationId };
    } catch (error) {
      // Rollback completed operations
      await this.rollbackMigration(migrationId, operations);
      await this.db.migrationRecord.update({
        where: { id: migrationId },
        data: {
          status: 'FAILED',
          failedAt: new Date(),
          error: error.message
        }
      });
      throw new MigrationFailedError(`Migration ${migrationId} failed`, { cause: error });
    }
  }

  private async rollbackMigration(
    migrationId: string,
    operations: MigrationOperation[]
  ): Promise<void> {
    // Rollback in reverse order
    const completedOps = operations.filter(o => o.status === 'COMPLETED').reverse();
    for (const operation of completedOps) {
      try {
        await this.executeRollback(operation);
        await this.recordRollbackProgress(migrationId, operation.id);
      } catch (rollbackError) {
        this.logger.error(`Failed to rollback ${operation.id}`, { error: rollbackError });
      }
    }
  }
}
```
Common migration pitfalls
Pitfall 1: Assuming immediate cutover
Problem: Assuming all data backfills instantly and cutover can happen immediately.
Consequences: Cutover fails because backfill hasn't completed or data is inconsistent.
Solution: Monitor backfill progress, verify data integrity, and allow verification window before cutover.
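Backfill progress can be tracked with a simple count of unmigrated rows, for example against the `users` / `user_preferences` schema from the expand phase above:

```sql
-- Rows still awaiting backfill: users with no row in the new table yet
SELECT count(*) AS remaining
FROM users u
LEFT JOIN user_preferences p ON p.user_id = u.id
WHERE p.id IS NULL;
```

Cutover should wait until `remaining` reaches zero and stays there through the verification window.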
Pitfall 2: Forgetting about indexes
Problem: Adding columns without indexes results in poor query performance.
Consequences: Degraded performance after migration, timeouts, cascading failures.
Solution: Add indexes as part of expand phase, monitor query performance.
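In PostgreSQL, the index from the expand phase can be built without blocking writes using `CONCURRENTLY` (it is slower and cannot run inside a transaction block):

```sql
-- Builds the index while allowing concurrent writes;
-- must run outside a transaction block
CREATE INDEX CONCURRENTLY idx_users_email_verified
    ON users(email_verified)
    WHERE email_verified = TRUE;
```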
Pitfall 3: Changing data types in place
Problem: ALTER TABLE that changes data type locks the entire table.
Consequences: Table lock blocks all writes, causing application failures.
Solution: Use expand-contract: add new column, backfill data, migrate reads/writes, then drop old column.
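A sketch of that sequence for widening an INTEGER column to BIGINT (table and column names illustrative, not from the schema above):

```sql
-- Expand: add a wider column next to the old one
ALTER TABLE events ADD COLUMN payload_size_new BIGINT;

-- Backfill in batches while the application dual-writes both columns
UPDATE events SET payload_size_new = payload_size
WHERE id BETWEEN 1 AND 10000 AND payload_size_new IS NULL;

-- Contract: after reads and writes have moved to the new column
ALTER TABLE events DROP COLUMN payload_size;
ALTER TABLE events RENAME COLUMN payload_size_new TO payload_size;
```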
Pitfall 4: Ignoring foreign key constraints
Problem: Adding foreign keys requires lock on referenced table.
Consequences: Lock blocks writes, potential downtime.
Solution: Add the foreign key without validating existing rows first, backfill and repair the data, then validate the constraint.
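In PostgreSQL this is done with `NOT VALID`, which skips scanning existing rows at `ALTER` time; the later `VALIDATE CONSTRAINT` checks old rows without blocking writes:

```sql
-- Add the constraint with only a brief lock; existing rows are not scanned
ALTER TABLE user_preferences
    ADD CONSTRAINT fk_user_preferences_user
    FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Later: validate existing rows while writes continue
ALTER TABLE user_preferences VALIDATE CONSTRAINT fk_user_preferences_user;
```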
Migration checklist
Before executing zero-downtime migration:
Planning phase
- [ ] Document current schema and target schema
- [ ] Identify all application versions accessing the table
- [ ] Map out expand-contract steps
- [ ] Design backfill strategy with rate limiting
- [ ] Plan rollback procedure
- [ ] Set up monitoring and alerting
Expand phase
- [ ] Deploy schema changes (additive only)
- [ ] Verify new structure exists
- [ ] Deploy application code compatible with both schemas
- [ ] Monitor for errors and performance impact
Backfill phase
- [ ] Execute data migration with rate limiting
- [ ] Monitor database health metrics
- [ ] Verify data integrity samples
- [ ] Track progress and ETA
Verification phase
- [ ] Compare old and new data sets
- [ ] Verify application functionality with new schema
- [ ] Load test with new structure
- [ ] Review error logs for migration issues
Contract phase
- [ ] Deploy application using only new schema
- [ ] Monitor for issues with old dependencies
- [ ] Remove old schema (drop columns, tables, indexes)
- [ ] Clean up migration data and metadata
Tooling recommendations
Database migration tools
| Tool | Best for | Trade-offs |
|---|---|---|
| Liquibase | Cross-database compatibility | Verbose XML configuration |
| Flyway | Simple, version-based migrations | Limited rollback support |
| Alembic | Python ecosystems | SQLAlchemy dependency |
| dbmate | Simple, database-agnostic | Fewer advanced features |
| Prisma Migrate | Type-safe migrations | ORM coupling |
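As a concrete example of the version-based style, Flyway discovers plain SQL files by naming convention and tracks applied versions in a history table (paths and connection details illustrative):

```bash
# Flyway applies versioned SQL files in order:
#   db/migration/V1__create_users.sql
#   db/migration/V2__add_email_verified.sql
flyway -url=jdbc:postgresql://localhost:5432/app -user=postgres migrate
```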
Verification tools
```bash
# Use sqldiff (SQLite) to compare two database files
sqldiff old.db new.db

# Use pg_restore --list to inspect the contents of a PostgreSQL backup
pg_restore --list backup.dump

# Use pgbench for load testing (10 clients, 2 threads, 300 seconds)
pgbench -h localhost -p 5432 -U postgres -c 10 -j 2 -T 300 production
```
Conclusion
Zero-downtime database migrations require careful planning, phased execution, and continuous verification. The expand-contract pattern provides a structured approach: add changes, deploy compatible code, migrate data, verify, then remove old structures.
The key insight is treating migrations as a process, not an event. Database changes span deployment phases, not a single maintenance window. By designing for compatibility during migration, you maintain availability while evolving your data model.
Start with smaller migrations on non-critical tables. Build muscle memory with expand-contract, backfill, and verification patterns. As your team gains confidence, apply these techniques to larger, more complex schema changes.
Does your team need help planning and executing zero-downtime database migrations? Talk to Imperialis engineering specialists about migration strategy, implementation support, and production-grade migration tooling for your critical systems.