
Database Sharding Strategies: Horizontal Scaling for Production Systems

When a single database instance can no longer handle the load, sharding becomes the path forward. Understanding sharding patterns, trade-offs, and operational implications prevents architectural dead ends.

3/13/2026 · 9 min read · Cloud

Last updated: 3/13/2026

Introduction: When vertical scaling hits the wall

Every growing application eventually encounters the limits of a single database instance. Vertical scaling—adding CPU, memory, or storage to an existing instance—becomes increasingly expensive and eventually impossible. At that point, horizontal scaling through database sharding becomes necessary.

Sharding splits a large database into smaller, more manageable pieces called shards, distributed across multiple database instances. Each shard holds a subset of the data, and the application routes queries to the appropriate shard.

The decision to shard is significant. It introduces architectural complexity, operational overhead, and new failure modes. But for data-intensive applications with high write throughput or large datasets, sharding is often the only path to sustainable scale.

Understanding sharding fundamentals

Sharding vs replication

Sharding is fundamentally different from replication:

| Aspect | Replication | Sharding |
| --- | --- | --- |
| Purpose | High availability, read scalability | Write scalability, data distribution |
| Data | Full copy on each instance | Partitioned subset on each instance |
| Write capacity | Limited by primary | Scales with shard count |
| Read capacity | Scales with replica count | Scales with shard count |
| Complexity | Low | High |
| Consistency | Strong (with appropriate configuration) | Eventual (across shards) |

Replication copies data for high availability and read scaling. Sharding distributes data for write scaling and data distribution.

When to consider sharding

Indicators that sharding may be necessary:

  • Write throughput exceeds single instance capacity: Database is I/O bound or CPU bound during write-heavy workloads
  • Dataset exceeds storage limits: Single instance cannot store all data (common with SaaS applications storing user-generated content)
  • Performance degrades with data growth: Query latency increases as data size grows
  • Backup and restore become impractical: Full database backups take too long
  • Vertical scaling cost becomes prohibitive: Upgrading to larger instances is no longer cost-effective

"Scale before you need it" doesn't apply to sharding. Shard when you have clear evidence that single-instance scaling is insufficient, not as a premature optimization.

Sharding architectures

Application-managed sharding

The application layer determines which shard should receive each query. This is the most common approach and offers maximum flexibility.

typescript
// Application-level sharding
class ShardedDatabase {
  private shards: Map<number, DatabaseConnection> = new Map();
  private shardCount: number;

  constructor(shardConfigs: ShardConfig[]) {
    this.shardCount = shardConfigs.length;
    shardConfigs.forEach(config => {
      this.shards.set(config.shardId, this.createConnection(config));
    });
  }

  // Hash-based sharding
  getShardId(key: string): number {
    const hash = this.hashFunction(key);
    return hash % this.shardCount;
  }

  async getUser(userId: string): Promise<User> {
    const shardId = this.getShardId(userId);
    const shard = this.shards.get(shardId);
    return shard.query('SELECT * FROM users WHERE id = ?', [userId]);
  }

  async createUser(user: User): Promise<User> {
    const shardId = this.getShardId(user.id);
    const shard = this.shards.get(shardId);
    return shard.insert('users', user);
  }

  // Range-based sharding
  async getOrdersByDateRange(startDate: Date, endDate: Date): Promise<Order[]> {
    const results: Order[] = [];

    // Query only relevant shards
    const relevantShards = this.getShardsForDateRange(startDate, endDate);

    for (const shardId of relevantShards) {
      const shard = this.shards.get(shardId);
      const shardResults = await shard.query(
        'SELECT * FROM orders WHERE created_at BETWEEN ? AND ?',
        [startDate, endDate]
      );
      results.push(...shardResults);
    }

    return results;
  }

  private hashFunction(key: string): number {
    // Simple hash function - in production use crypto or murmurhash
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      const char = key.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }

  private getShardsForDateRange(startDate: Date, endDate: Date): number[] {
    // Implementation depends on the date range sharding strategy
    // Example: shard by month, bucketed as year * 12 + month so that
    // ranges spanning a year boundary are handled correctly
    const startBucket = startDate.getFullYear() * 12 + startDate.getMonth();
    const endBucket = endDate.getFullYear() * 12 + endDate.getMonth();
    const shards: number[] = [];

    for (let bucket = startBucket; bucket <= endBucket; bucket++) {
      shards.push(bucket % this.shardCount);
    }

    return [...new Set(shards)]; // Remove duplicates
  }

  private createConnection(config: ShardConfig): DatabaseConnection {
    // Create database connection based on config
    return new DatabaseConnection(config);
  }
}

interface ShardConfig {
  shardId: number;
  host: string;
  port: number;
  database: string;
  username: string;
  password: string;
}

Advantages:

  • Maximum flexibility in sharding strategy
  • No need for specialized database features
  • Can optimize for specific access patterns

Disadvantages:

  • Application complexity increases
  • All applications must implement sharding logic
  • Cross-shard queries require manual orchestration

Database-managed sharding

Some databases offer built-in sharding capabilities that abstract the complexity from the application layer.

MongoDB Sharded Clusters:

javascript
// MongoDB sharding configuration (shards are typically replica sets)
sh.addShard("rs1/shard1.example.com:27017")
sh.addShard("rs2/shard2.example.com:27017")
sh.addShard("rs3/shard3.example.com:27017")

// Enable sharding on a database
sh.enableSharding("myapp")

// Shard a collection with a hashed shard key
sh.shardCollection("myapp.users", { "user_id": "hashed" })

// Shard a collection with a ranged shard key
sh.shardCollection("myapp.orders", { "created_at": 1 })

PostgreSQL Citus Extension:

sql
-- Create a distributed table (hash distribution is the default)
SELECT create_distributed_table('users', 'user_id');

-- Create a reference table (replicated to all nodes)
SELECT create_reference_table('countries');

-- Create a distributed table co-located with users
-- (same distribution column, so joins on user_id stay shard-local)
SELECT create_distributed_table('orders', 'user_id');
SELECT alter_distributed_table('orders', colocate_with => 'users');

Advantages:

  • Reduced application complexity
  • Database handles query routing
  • Built-in rebalancing and scaling

Disadvantages:

  • Less flexibility in sharding strategy
  • Vendor lock-in to specific database
  • May require specialized expertise

Sharding strategies

Hash-based sharding

Data is distributed based on a hash function applied to a shard key. This provides uniform distribution but makes range queries inefficient.

typescript
class HashBasedSharding {
  private shardCount: number;

  constructor(shardCount: number) {
    this.shardCount = shardCount;
  }

  getShard(key: string): number {
    const hash = this.crc32(key);
    return hash % this.shardCount;
  }

  // CRC32 implementation with an inline lookup table
  // In production, use a well-tested library
  private static crc32Table: number[] = (() => {
    const table: number[] = [];
    for (let n = 0; n < 256; n++) {
      let c = n;
      for (let k = 0; k < 8; k++) {
        c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
      }
      table.push(c >>> 0);
    }
    return table;
  })();

  private crc32(str: string): number {
    let crc = 0 ^ -1;
    for (let i = 0; i < str.length; i++) {
      crc = (crc >>> 8) ^ HashBasedSharding.crc32Table[(crc ^ str.charCodeAt(i)) & 0xff];
    }
    return (crc ^ -1) >>> 0;
  }
}

// Usage
const sharding = new HashBasedSharding(8);
const userShard = sharding.getShard('user_12345'); // deterministic shard in [0, 8)

Characteristics:

  • Uniform data distribution across shards
  • No hotspots (assuming good hash function)
  • Cross-shard queries difficult
  • Re-sharding requires migrating significant data

Best for:

  • User-centric data (users, profiles, settings)
  • High cardinality keys
  • Point queries by shard key
  • Write-heavy workloads
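The re-sharding cost noted above is easy to quantify. With naive `hash % shardCount` routing, changing the shard count remaps most keys. A minimal sketch (the FNV-1a hash and `user_N` key names are illustrative assumptions, not part of the article's examples):

```typescript
// Fraction of keys that change shards when going from 4 to 5 shards
// under naive modulo routing
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function movedFraction(keyCount: number, from: number, to: number): number {
  let moved = 0;
  for (let i = 0; i < keyCount; i++) {
    const h = fnv1a(`user_${i}`);
    if (h % from !== h % to) moved++;
  }
  return moved / keyCount;
}

console.log(movedFraction(100_000, 4, 5).toFixed(2)); // ≈ 0.80: roughly 4 of every 5 keys migrate
```

A key stays put only when `h % 4 === h % 5`, i.e. `h % 20 < 4`, so about 80% of keys must move. This is the motivation for the consistent-hashing approach covered under re-sharding strategies below.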

Range-based sharding

Data is distributed based on ranges of shard key values. This enables efficient range queries but can create hotspots.

typescript
class RangeBasedSharding {
  private ranges: Array<{min: number, max: number, shardId: number}>;

  constructor(ranges: Array<{min: number, max: number, shardId: number}>) {
    // Sort ranges by min value
    this.ranges = ranges.sort((a, b) => a.min - b.min);
  }

  getShard(key: number): number {
    for (const range of this.ranges) {
      if (key >= range.min && key <= range.max) {
        return range.shardId;
      }
    }
    throw new Error(`No shard found for key: ${key}`);
  }

  addShard(min: number, max: number, shardId: number) {
    this.ranges.push({ min, max, shardId });
    this.ranges.sort((a, b) => a.min - b.min);
  }
}

// Usage: Shard users by ID range
const sharding = new RangeBasedSharding([
  { min: 1, max: 1000000, shardId: 0 },
  { min: 1000001, max: 2000000, shardId: 1 },
  { min: 2000001, max: 3000000, shardId: 2 },
  { min: 3000001, max: 4000000, shardId: 3 },
]);

const userShard = sharding.getShard(2500500); // 2

Characteristics:

  • Efficient range queries within shard
  • Natural data clustering
  • Can create hotspots (uneven distribution)
  • Re-sharding splits existing ranges

Best for:

  • Time-series data (logs, metrics, events)
  • Data accessed by ranges (dates, IDs)
  • Analytics workloads
  • Applications with predictable data growth patterns

Directory-based sharding

A lookup service maintains the mapping between keys and shards. This provides maximum flexibility but adds operational complexity.

typescript
class DirectoryBasedSharding {
  private directory: Map<string, number> = new Map();
  private shards: Map<number, DatabaseConnection> = new Map();

  constructor() {
    this.loadDirectory();
  }

  async getShard(key: string): Promise<DatabaseConnection> {
    // Shard ID 0 is falsy, so check for undefined explicitly
    const shardId = this.directory.get(key);
    if (shardId === undefined) {
      throw new Error(`No shard mapping found for key: ${key}`);
    }

    const shard = this.shards.get(shardId);
    if (!shard) {
      throw new Error(`Shard ${shardId} not found`);
    }

    return shard;
  }

  async assignKeyToShard(key: string, shardId: number): Promise<void> {
    this.directory.set(key, shardId);
    await this.saveDirectory();
  }

  async migrateKey(key: string, fromShardId: number, toShardId: number): Promise<void> {
    const fromShard = this.shards.get(fromShardId);
    const toShard = this.shards.get(toShardId);

    // Copy data to new shard
    const data = await fromShard.query('SELECT * FROM data WHERE key = ?', [key]);
    await toShard.insert('data', data);

    // Update directory
    this.directory.set(key, toShardId);
    await this.saveDirectory();

    // Delete from old shard (after confirmation)
    await fromShard.query('DELETE FROM data WHERE key = ?', [key]);
  }

  private async loadDirectory(): Promise<void> {
    // Load directory from storage (Redis, database, etc.)
    const entries = await this.loadFromStorage();
    entries.forEach(([key, shardId]) => {
      this.directory.set(key, shardId);
    });
  }

  private async saveDirectory(): Promise<void> {
    // Save directory to storage
    const entries = Array.from(this.directory.entries());
    await this.saveToStorage(entries);
  }
}

Characteristics:

  • Maximum flexibility in data placement
  • Can implement complex routing rules
  • Adds directory service as dependency
  • Directory becomes performance bottleneck

Best for:

  • Complex routing requirements
  • Multi-tenant applications with custom tenant placement
  • Geographic data distribution
  • A/B testing of sharding strategies

Geographical sharding

Data is distributed based on geographic location, typically for compliance or latency optimization.

typescript
class GeoBasedSharding {
  private regionMapping: Map<string, string> = new Map(); // country -> region
  private shardMapping: Map<string, number> = new Map(); // region -> shard

  constructor() {
    this.initializeRegionMapping();
    this.initializeShardMapping();
  }

  getShard(userId: string, userCountry: string): number {
    const region = this.regionMapping.get(userCountry);
    const shardId = region !== undefined ? this.shardMapping.get(region) : undefined;
    // Fall back to the default shard for unmapped countries or regions
    return shardId ?? this.shardMapping.get('default') ?? 0;
  }

  private initializeRegionMapping(): void {
    this.regionMapping.set('US', 'us-east-1');
    this.regionMapping.set('BR', 'sa-east-1');
    this.regionMapping.set('DE', 'eu-central-1');
    this.regionMapping.set('JP', 'ap-northeast-1');
    // ... more mappings
  }

  private initializeShardMapping(): void {
    this.shardMapping.set('us-east-1', 0);
    this.shardMapping.set('sa-east-1', 1);
    this.shardMapping.set('eu-central-1', 2);
    this.shardMapping.set('ap-northeast-1', 3);
    this.shardMapping.set('default', 0);
  }
}

// Usage
const sharding = new GeoBasedSharding();
const userShard = sharding.getShard('user_123', 'BR'); // 1

Characteristics:

  • Optimizes for latency
  • Supports data residency requirements
  • Requires user location tracking
  • May create uneven shard sizes

Best for:

  • Global applications
  • Compliance requirements (GDPR, data sovereignty)
  • Latency-sensitive applications
  • Multi-region deployments

Cross-shard queries and join operations

Sharding introduces complexity for queries that span multiple shards.
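A recurring pattern here is scatter-gather: fan the query out to every relevant shard, then merge the partial results at the coordinator. A minimal sketch for a cross-shard top-N query (the shard interface below is a simplified stand-in, not the `DatabaseConnection` type used elsewhere in this article):

```typescript
// Scatter-gather top-N across shards: each shard returns its own local
// top N, and the coordinator merges and re-sorts
type Row = { id: string; total: number };

async function topNAcrossShards(
  shards: Array<(n: number) => Promise<Row[]>>,
  n: number
): Promise<Row[]> {
  // Scatter: every shard must over-fetch N rows, because any single
  // shard could hold all of the global winners
  const perShard = await Promise.all(shards.map(query => query(n)));

  // Gather: merge partial results and keep the global top N
  return perShard.flat().sort((a, b) => b.total - a.total).slice(0, n);
}

// Example with two mock shards, each pre-sorted by total
const shardA = async (n: number): Promise<Row[]> =>
  [{ id: 'a1', total: 90 }, { id: 'a2', total: 40 }].slice(0, n);
const shardB = async (n: number): Promise<Row[]> =>
  [{ id: 'b1', total: 70 }, { id: 'b2', total: 60 }].slice(0, n);

topNAcrossShards([shardA, shardB], 2).then(top =>
  console.log(top.map(r => r.id)) // → [ 'a1', 'b1' ]
);
```

The same scatter-gather shape applies to cross-shard counts and aggregations; the cost grows with the number of shards touched, which is why a well-chosen shard key routes most queries to a single shard.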

Distributed transactions

typescript
// Best-effort multi-shard transaction. Note: this is not a true two-phase
// commit; a failure between the commit calls can leave shards inconsistent.
class DistributedTransaction {
  private shards: Map<number, DatabaseConnection>;
  private transactions: Map<number, any> = new Map();

  async execute(query: string, shardIds: number[]): Promise<any[]> {
    // Begin transaction on all participating shards
    for (const shardId of shardIds) {
      const shard = this.shards.get(shardId);
      const tx = await shard.beginTransaction();
      this.transactions.set(shardId, tx);
    }

    try {
      // Execute query on all shards
      const results = await Promise.all(
        shardIds.map(async (shardId) => {
          const tx = this.transactions.get(shardId);
          return tx.execute(query);
        })
      );

      // Commit all transactions
      await Promise.all(
        shardIds.map(async (shardId) => {
          const tx = this.transactions.get(shardId);
          return tx.commit();
        })
      );

      return results;
    } catch (error) {
      // Rollback all transactions
      await Promise.all(
        shardIds.map(async (shardId) => {
          const tx = this.transactions.get(shardId);
          try {
            await tx.rollback();
          } catch (rollbackError) {
            console.error(`Rollback failed on shard ${shardId}:`, rollbackError);
          }
        })
      );
      throw error;
    }
  }
}

Application-side joins

typescript
async function getOrdersWithUserSharded(orderId: string): Promise<any> {
  // Get order from orders shard
  const orderShardId = getShardIdForOrders(orderId);
  const order = await shards.get(orderShardId).query(
    'SELECT * FROM orders WHERE id = ?',
    [orderId]
  );

  // Get user from users shard
  const userShardId = getShardIdForUsers(order.user_id);
  const user = await shards.get(userShardId).query(
    'SELECT * FROM users WHERE id = ?',
    [order.user_id]
  );

  return { ...order, user };
}

async function getRecentOrdersForUser(userId: string, limit: number = 10): Promise<any[]> {
  // Query orders shard
  const orderShardId = getShardIdForOrdersByUser(userId);
  const orders = await shards.get(orderShardId).query(
    'SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT ?',
    [userId, limit]
  );

  return orders;
}

Denormalization strategies

To minimize cross-shard queries, denormalize data:

typescript
// Store redundant data to avoid cross-shard joins
interface Order {
  id: string;
  user_id: string;
  // Denormalized user data
  user_name: string;
  user_email: string;
  total: number;
  created_at: Date;
}

async function createOrderWithDenormalizedUser(orderData: OrderData, userId: string): Promise<Order> {
  // Fetch user from users shard
  const userShardId = getShardIdForUsers(userId);
  const user = await shards.get(userShardId).query(
    'SELECT id, name, email FROM users WHERE id = ?',
    [userId]
  );

  // Create order with denormalized user data
  const orderShardId = getShardIdForOrders(orderData.id);
  const order = await shards.get(orderShardId).insert('orders', {
    id: orderData.id,
    user_id: userId,
    user_name: user.name,
    user_email: user.email,
    total: orderData.total,
    created_at: new Date()
  });

  return order;
}

Re-sharding strategies

Consistent hashing

Consistent hashing minimizes data movement when adding or removing shards:

typescript
class ConsistentHash {
  private ring: Map<number, string> = new Map(); // hash -> shard
  private sortedHashes: number[] = [];
  private virtualNodes: number = 150; // Virtual nodes per physical shard

  addShard(shardId: string): void {
    // Add virtual nodes for load balancing
    for (let i = 0; i < this.virtualNodes; i++) {
      const virtualNodeKey = `${shardId}:${i}`;
      const hash = this.hashFunction(virtualNodeKey);
      this.ring.set(hash, shardId);
      this.sortedHashes.push(hash);
    }

    this.sortedHashes.sort((a, b) => a - b);
  }

  removeShard(shardId: string): void {
    // Remove all virtual nodes for this shard
    for (let i = 0; i < this.virtualNodes; i++) {
      const virtualNodeKey = `${shardId}:${i}`;
      const hash = this.hashFunction(virtualNodeKey);
      this.ring.delete(hash);

      const index = this.sortedHashes.indexOf(hash);
      if (index > -1) {
        this.sortedHashes.splice(index, 1);
      }
    }
  }

  getShard(key: string): string {
    const hash = this.hashFunction(key);

    // Find first node on the ring with hash >= key hash
    const index = this.findFirstNodeGreaterOrEqual(hash);

    // Wrap around if necessary
    const nodeHash = this.sortedHashes[index % this.sortedHashes.length];
    return this.ring.get(nodeHash) || '';
  }

  private findFirstNodeGreaterOrEqual(hash: number): number {
    let left = 0;
    let right = this.sortedHashes.length - 1;

    while (left <= right) {
      const mid = Math.floor((left + right) / 2);

      if (this.sortedHashes[mid] >= hash) {
        if (mid === 0 || this.sortedHashes[mid - 1] < hash) {
          return mid;
        }
        right = mid - 1;
      } else {
        left = mid + 1;
      }
    }

    return 0;
  }

  private hashFunction(key: string): number {
    // Use a good hash function (MD5, SHA-1, or murmurhash)
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      const char = key.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash;
    }
    return Math.abs(hash);
  }
}
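To see the payoff, compare key movement when a shard joins the ring. The compact ring below is an illustrative re-implementation (FNV-1a hash, linear lookup) rather than the class above; with enough virtual nodes, adding a fourth shard relocates only the keys the new shard takes over, roughly a quarter of them, versus ~75% under modulo routing:

```typescript
// Minimal consistent-hash ring to measure key movement when a shard joins
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class Ring {
  private entries: Array<[number, string]> = [];

  constructor(shards: string[], vnodes = 150) {
    for (const s of shards) {
      for (let i = 0; i < vnodes; i++) {
        this.entries.push([fnv1a(`${s}:${i}`), s]);
      }
    }
    this.entries.sort((a, b) => a[0] - b[0]);
  }

  // First ring position at or after the key's hash, wrapping around
  get(key: string): string {
    const h = fnv1a(key);
    const hit = this.entries.find(([eh]) => eh >= h);
    return (hit ?? this.entries[0])[1];
  }
}

function movedWhenAdding(keyCount: number): number {
  const before = new Ring(['s0', 's1', 's2']);
  const after = new Ring(['s0', 's1', 's2', 's3']);
  let moved = 0;
  for (let i = 0; i < keyCount; i++) {
    const k = `user_${i}`;
    if (before.get(k) !== after.get(k)) moved++;
  }
  return moved / keyCount;
}

console.log(movedWhenAdding(20_000).toFixed(2)); // typically ≈ 0.25
```

Only keys whose ring successor becomes one of the new shard's virtual nodes move; everything else keeps its owner, which is exactly the property that makes incremental scale-out tractable.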

Dual-write migration strategy

Gradually migrate data to new shards:

typescript
class DualWriteMigrator {
  private sourceShard: DatabaseConnection;
  private targetShard: DatabaseConnection;
  private migrationComplete: boolean = false;

  async migrateKey(key: string): Promise<void> {
    // 1. Start dual writes
    await this.enableDualWrites(key);

    // 2. Backfill existing data
    await this.backfillData(key);

    // 3. Verify data consistency
    const isConsistent = await this.verifyConsistency(key);

    if (isConsistent) {
      // 4. Switch read target
      await this.switchReadTarget(key);

      // 5. Stop writes to source
      await this.disableSourceWrites(key);
    }
  }

  async enableDualWrites(key: string): Promise<void> {
    // Write to both source and target
    await this.sourceShard.insert('data', { key, value: 'write_enabled' });
    await this.targetShard.insert('data', { key, value: 'write_enabled' });
  }

  async backfillData(key: string): Promise<void> {
    // Copy all existing data
    const data = await this.sourceShard.query('SELECT * FROM data WHERE key = ?', [key]);
    await this.targetShard.insert('data', data);
  }

  async verifyConsistency(key: string): Promise<boolean> {
    const sourceData = await this.sourceShard.query('SELECT * FROM data WHERE key = ?', [key]);
    const targetData = await this.targetShard.query('SELECT * FROM data WHERE key = ?', [key]);

    return JSON.stringify(sourceData) === JSON.stringify(targetData);
  }

  async switchReadTarget(key: string): Promise<void> {
    // Update routing to read from target
    await this.updateRouting(key, 'target');
  }

  async disableSourceWrites(key: string): Promise<void> {
    // Stop writing to source shard
    await this.sourceShard.query('UPDATE data SET write_enabled = false WHERE key = ?', [key]);
  }

  async updateRouting(key: string, target: string): Promise<void> {
    // Update routing configuration
    // This could be in a config store, database, or service discovery
  }
}

Operational considerations

Monitoring and observability

Sharded systems require comprehensive monitoring:

typescript
class ShardedDatabaseMonitor {
  private shards: Map<number, DatabaseConnection>;
  private metrics: Map<number, ShardMetrics> = new Map();

  async collectMetrics(): Promise<ShardedSystemMetrics> {
    const shardMetrics = await Promise.all(
      Array.from(this.shards.entries()).map(async ([shardId, shard]) => {
        const metrics = await this.collectShardMetrics(shardId, shard);
        this.metrics.set(shardId, metrics);
        return { shardId, metrics };
      })
    );

    return {
      shards: shardMetrics,
      system: this.calculateSystemMetrics(shardMetrics)
    };
  }

  private async collectShardMetrics(shardId: number, shard: DatabaseConnection): Promise<ShardMetrics> {
    const [
      connectionCount,
      queryLatency,
      dataSize,
      writeThroughput,
      readThroughput
    ] = await Promise.all([
      shard.getConnectionCount(),
      shard.getAverageQueryLatency(),
      shard.getDataSize(),
      shard.getWriteThroughput(),
      shard.getReadThroughput()
    ]);

    return {
      shardId,
      connectionCount,
      queryLatency,
      dataSize,
      writeThroughput,
      readThroughput,
      healthStatus: this.evaluateHealthStatus({
        connectionCount,
        queryLatency,
        dataSize,
        writeThroughput,
        readThroughput
      })
    };
  }

  private evaluateHealthStatus(metrics: Partial<ShardMetrics>): string {
    if ((metrics.queryLatency ?? 0) > 1000) return 'degraded'; // latency in ms
    if ((metrics.connectionCount ?? 0) > 10000) return 'degraded';
    return 'healthy';
  }

  private calculateSystemMetrics(shardMetrics: Array<{shardId: number, metrics: ShardMetrics}>): SystemMetrics {
    const totalConnections = shardMetrics.reduce((sum, { metrics }) => sum + metrics.connectionCount, 0);
    const avgLatency = shardMetrics.reduce((sum, { metrics }) => sum + metrics.queryLatency, 0) / shardMetrics.length;
    const totalDataSize = shardMetrics.reduce((sum, { metrics }) => sum + metrics.dataSize, 0);

    return {
      totalConnections,
      averageLatency: avgLatency,
      totalDataSize,
      shardCount: shardMetrics.length,
      healthyShards: shardMetrics.filter(({ metrics }) => metrics.healthStatus === 'healthy').length
    };
  }
}

interface ShardMetrics {
  shardId: number;
  connectionCount: number;
  queryLatency: number;
  dataSize: number;
  writeThroughput: number;
  readThroughput: number;
  healthStatus: string;
}

interface SystemMetrics {
  totalConnections: number;
  averageLatency: number;
  totalDataSize: number;
  shardCount: number;
  healthyShards: number;
}

Backup and disaster recovery

Sharding complicates backup and recovery:

typescript
class ShardedBackupStrategy {
  private shards: Map<number, DatabaseConnection>;

  async createConsistentBackup(backupId: string): Promise<void> {
    // 1. Freeze writes across all shards
    await this.freezeAllShards();

    try {
      // 2. Create backup of each shard
      const backups = await Promise.all(
        Array.from(this.shards.entries()).map(async ([shardId, shard]) => {
          return this.createShardBackup(shardId, shard, backupId);
        })
      );

      // 3. Record backup metadata
      await this.recordBackupMetadata(backupId, backups);

      console.log(`Backup ${backupId} completed successfully`);
    } finally {
      // 4. Resume writes
      await this.resumeAllShards();
    }
  }

  private async createShardBackup(shardId: number, shard: DatabaseConnection, backupId: string): Promise<BackupMetadata> {
    const timestamp = Date.now();
    const backupLocation = `backups/${backupId}/shard_${shardId}_${timestamp}.sql`;

    await shard.dump(backupLocation);
    const checksum = await this.calculateChecksum(backupLocation);

    return {
      shardId,
      backupId,
      timestamp,
      location: backupLocation,
      checksum,
      dataSize: await this.getFileSize(backupLocation)
    };
  }

  async restoreFromBackup(backupId: string): Promise<void> {
    const backupMetadata = await this.getBackupMetadata(backupId);

    for (const shardBackup of backupMetadata.shards) {
      await this.restoreShardBackup(shardBackup);
      await this.verifyShardBackup(shardBackup);
    }

    console.log(`Restore from backup ${backupId} completed successfully`);
  }

  private async verifyShardBackup(shardBackup: BackupMetadata): Promise<void> {
    const actualChecksum = await this.calculateChecksum(shardBackup.location);
    if (actualChecksum !== shardBackup.checksum) {
      throw new Error(`Checksum mismatch for shard ${shardBackup.shardId}`);
    }
  }
}

Failure handling and resilience

typescript
class ShardedDatabaseResilience {
  private shards: Map<number, DatabaseConnection>;
  private retryPolicy: RetryPolicy;
  private circuitBreaker: CircuitBreaker;

  async queryWithRetry(query: string, params: any[], shardId: number): Promise<any> {
    return this.retryPolicy.execute(async () => {
      if (this.circuitBreaker.isOpen(shardId)) {
        throw new Error(`Shard ${shardId} circuit breaker is open`);
      }

      try {
        const result = await this.executeQuery(query, params, shardId);
        this.circuitBreaker.recordSuccess(shardId);
        return result;
      } catch (error) {
        this.circuitBreaker.recordFailure(shardId);
        throw error;
      }
    });
  }

  async executeQuery(query: string, params: any[], shardId: number): Promise<any> {
    const shard = this.shards.get(shardId);
    if (!shard) {
      throw new Error(`Shard ${shardId} not found`);
    }

    return await shard.query(query, params);
  }

  async failoverToReplica(shardId: number): Promise<DatabaseConnection> {
    const replicaShardId = this.findAvailableReplica(shardId);

    if (replicaShardId === null) {
      throw new Error(`No available replica for shard ${shardId}`);
    }

    // Redirect traffic to replica
    await this.redirectTraffic(shardId, replicaShardId);

    return this.shards.get(replicaShardId)!;
  }

  private findAvailableReplica(shardId: number): number | null {
    // Implementation to find available replica
    // This could be based on health checks, load, or geographic proximity
    return null;
  }

  private async redirectTraffic(fromShardId: number, toShardId: number): Promise<void> {
    // Update routing to redirect traffic
    console.log(`Redirecting traffic from shard ${fromShardId} to ${toShardId}`);
  }
}

Decision framework

Evaluate sharding necessity

Questions to ask before sharding:

  1. Have you optimized your single-instance database?
     • Indexes properly tuned?
     • Query performance optimized?
     • Caching implemented?
     • Read replicas deployed?
  2. What is your primary scaling constraint?
     • Write throughput → Sharding helps
     • Read throughput → Replication may suffice
     • Storage capacity → Sharding or object storage
  3. What is your data growth trajectory?
     • Predictable growth → Range-based sharding
     • Unpredictable growth → Hash-based sharding
  4. What are your query patterns?
     • Mostly point queries → Hash-based sharding
     • Range queries important → Range-based sharding
     • Complex joins → Consider application-level optimization

Choose sharding strategy

| Strategy | Best for | Trade-offs |
| --- | --- | --- |
| Hash-based | User-centric data, high cardinality | Uniform distribution, difficult range queries |
| Range-based | Time-series, analytics, predictable growth | Efficient range queries, potential hotspots |
| Directory-based | Complex routing, multi-tenant | Maximum flexibility, directory dependency |
| Geographical | Global apps, compliance requirements | Latency optimization, uneven shard sizes |
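The table can be reduced to a first-pass decision helper. The sketch below is a hypothetical encoding of this framework, not a substitute for it: the workload flags and their priority order (most constraining requirement first) are assumptions, and real decisions weigh many more factors.

```typescript
// Hypothetical decision helper mirroring the strategy table above,
// checked from the most to the least constraining requirement
type Workload = {
  dataResidency: boolean;         // compliance / geographic requirements
  customTenantPlacement: boolean; // multi-tenant routing rules
  rangeQueries: boolean;          // time-series or range-heavy access
};

function recommendStrategy(w: Workload): string {
  if (w.dataResidency) return 'geographical';
  if (w.customTenantPlacement) return 'directory-based';
  if (w.rangeQueries) return 'range-based';
  return 'hash-based'; // default: uniform distribution, point queries
}

console.log(recommendStrategy({
  dataResidency: false,
  customTenantPlacement: false,
  rangeQueries: true,
})); // → range-based
```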

Conclusion

Database sharding is a powerful technique for scaling data-intensive applications, but it introduces significant complexity. The decision to shard should be based on clear evidence that single-instance scaling is insufficient, not as a premature optimization.

Successful sharding requires careful planning, ongoing monitoring, and robust operational practices. Start with the simplest sharding strategy that meets your needs, and evolve as requirements change. The goal isn't to achieve architectural purity—it's to build a system that scales predictably while remaining maintainable.

Practical closing question: What is the primary scaling bottleneck in your current database deployment, and what would be the impact of a sharded architecture on your team's operational capabilities?


Planning a database scaling strategy and need expert guidance on sharding architecture and implementation? Talk to Imperialis specialists about designing a sharding strategy that matches your scale, requirements, and team capabilities.
