
Distributed Locking: Beyond Simple Redlock

How to implement robust distributed locks in production with etcd, Consul and advanced patterns.

3/12/2026 · 7 min read

Introduction: The lie of "simple lock" in distributed systems

In distributed systems, locks are not simple. A lock that works in single-process (mutex, semaphore, monitor) fails catastrophically when extended to multiple machines. Networks fail, clocks desynchronize, nodes crash, and messages are delivered out of order.

The Redlock pattern (popularized by Redis) is often implemented incorrectly. In his article "How to do distributed locking", Martin Kleppmann showed that Redlock rests on timing assumptions that real networks and processes violate, so two clients can end up believing they hold the same lock.

For production systems in 2026, the question isn't whether to use distributed locking, but which algorithm and which data store provide the correct guarantees for your use case.

The distributed locking problem

Why local locks don't work

In multi-instance systems, in-memory locks in each process are insufficient:

// PROBLEM: Local lock doesn't protect across instances
class InMemoryLock {
  private locked = false;

  async acquire(): Promise<boolean> {
    if (this.locked) return false;
    this.locked = true;
    return true;
  }

  release(): void {
    this.locked = false;
  }
}

// Instance A: lock.acquire() → true
// Instance B: lock.acquire() → true (BUT SHOULD BE FALSE!)
// Race condition: both instances execute critical code

Mutual exclusion guarantee

A correct distributed lock must guarantee:

  1. Mutual Exclusion: At most one process can hold the lock
  2. Freedom from Deadlock: The lock is eventually released even if the holding process dies
  3. Liveness: If the lock is available, a process should be able to acquire it
  4. Fairness (optional): Requests are granted in the order they were made

Implementation patterns

Pattern 1: Leases with TTL

The simplest approach is to use leases with time-to-live (TTL). If the process dies, the lease expires automatically:

interface Lease {
  acquire(key: string, ttl: number): Promise<LeaseToken>;
  release(token: LeaseToken): Promise<void>;
  renew(token: LeaseToken, ttl: number): Promise<void>;
}

async function withLease<T>(
  lease: Lease,
  key: string,
  fn: () => Promise<T>
): Promise<T> {
  const token = await lease.acquire(key, 30000); // 30 seconds
  const renewInterval = setInterval(() => {
    lease.renew(token, 30000);
  }, 15000); // Renew every 15s

  try {
    return await fn();
  } finally {
    clearInterval(renewInterval);
    await lease.release(token);
  }
}

Trade-offs:

  • ✅ Simple to implement
  • ✅ Protects against dead processes: the lease expires on its own
  • ❌ After a crash, the resource stays locked until the TTL runs out
  • ❌ Race condition if the TTL expires (e.g. during a GC pause) while the process is still running
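
To experiment with withLease locally, a minimal in-process stand-in for the Lease interface above can help. It is single-node and purely illustrative — a real lease would live in Redis or etcd — and LeaseToken here is assumed to be just the key:

```typescript
type LeaseToken = { key: string };

// Single-process stand-in matching the Lease interface: expiry times
// are tracked in a Map and checked lazily on acquire.
class InMemoryLease {
  private expiries = new Map<string, number>();

  async acquire(key: string, ttl: number): Promise<LeaseToken> {
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > Date.now()) {
      throw new Error(`lease on ${key} is already held`);
    }
    this.expiries.set(key, Date.now() + ttl);
    return { key };
  }

  async renew(token: LeaseToken, ttl: number): Promise<void> {
    this.expiries.set(token.key, Date.now() + ttl);
  }

  async release(token: LeaseToken): Promise<void> {
    this.expiries.delete(token.key);
  }
}
```

Passing an InMemoryLease to withLease exercises the whole acquire/renew/release lifecycle without any infrastructure.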

Pattern 2: Fencing Tokens (Martin Kleppmann)

Fencing tokens solve the expired-lock problem by attaching a monotonically increasing number to each acquisition:

interface DistributedLock {
  acquire(): Promise<{ token: number }>;
  release(token: number): Promise<void>;
}

async function withFencing<T>(
  lock: DistributedLock,
  fn: (fencingToken: number) => Promise<T>
): Promise<T> {
  const { token } = await lock.acquire();

  try {
    return await fn(token);
  } finally {
    await lock.release(token);
  }
}

// USAGE: Resource validates token is higher than last seen
class StorageSystem {
  private lastFencingToken = 0;

  async write(data: string, fencingToken: number): Promise<void> {
    if (fencingToken < this.lastFencingToken) {
      throw new Error('Request from expired lock - rejected');
    }

    this.lastFencingToken = fencingToken;
    // Execute write
  }
}

Why it works:

  • If lock expires and new lock is acquired with token 101, any request with token 100 is rejected
  • The resource (database, filesystem) validates token before executing
  • Guarantees processes with expired locks cannot cause damage
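
A single-node sketch makes the mechanism concrete. The monotonic counter would live in the lock service itself (e.g. etcd's key revision); all names here are illustrative:

```typescript
// Lock service side: hands out strictly increasing fencing tokens.
class FencingLock {
  private nextToken = 1;
  private held = false;

  acquire(): number {
    if (this.held) throw new Error('lock is already held');
    this.held = true;
    return this.nextToken++;
  }

  // Called on explicit release or when the TTL expires.
  release(): void {
    this.held = false;
  }
}

// Resource side: rejects any write carrying a stale token.
class GuardedStore {
  private lastToken = 0;
  public writes: string[] = [];

  write(data: string, token: number): void {
    if (token < this.lastToken) {
      throw new Error('stale fencing token - rejected');
    }
    this.lastToken = token;
    this.writes.push(data);
  }
}
```

If client A acquires token 1, its lease expires, and client B acquires token 2 and writes, then A's late write with token 1 is rejected by the store.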

Pattern 3: Redlock (Multi-instance Redis)

Redlock uses multiple independent Redis instances so that the loss of a single node doesn't lose the lock:

class Redlock {
  constructor(
    private clients: Redis[],
    private quorum: number
  ) {}

  async acquire(
    key: string,
    ttl: number
  ): Promise<Lock | null> {
    const value = crypto.randomUUID();
    const startTime = Date.now();

    // Try to acquire lock on all instances
    const successes = await Promise.allSettled(
      this.clients.map(client =>
        client.set(key, value, 'PX', ttl, 'NX')
      )
    );

    const acquiredCount = successes.filter(
      s => s.status === 'fulfilled' && s.value === 'OK'
    ).length;

    if (acquiredCount < this.quorum) {
      // Failed, release acquired locks
      await this.unlockAll(key, value);
      return null;
    }

    // Check if lock time exceeded TTL
    const elapsed = Date.now() - startTime;
    if (elapsed >= ttl) {
      await this.unlockAll(key, value);
      return null;
    }

    return { key, value, startTime };
  }

  private async unlockAll(key: string, value: string): Promise<void> {
    await Promise.all(
      this.clients.map(client =>
        client.eval(
          'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end',
          1,
          key,
          value
        )
      )
    );
  }
}

Trade-offs:

  • ✅ High performance (memory-based)
  • ✅ Stays available as long as a quorum of instances is up
  • ❌ Complex to implement correctly
  • ❌ Assumes bounded clock drift on every Redis node
  • ❌ Race conditions if the TTL expires and the lock is re-acquired while the old holder is still running
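
Acquisition failures under contention are normal, so callers should retry with jittered exponential backoff rather than spin. A sketch against the acquire shape above — tryAcquire stands in for Redlock.acquire:

```typescript
type TryAcquire = (key: string, ttl: number) => Promise<object | null>;

// Retries acquisition with full-jitter exponential backoff, giving up
// after maxAttempts so callers never wait indefinitely.
async function acquireWithRetry(
  tryAcquire: TryAcquire,
  key: string,
  ttl: number,
  maxAttempts = 5,
  baseDelayMs = 50
): Promise<object | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const lock = await tryAcquire(key, ttl);
    if (lock) return lock;
    const delay = Math.random() * baseDelayMs * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  return null;
}
```

For the Redlock class above, the quorum should be a strict majority: new Redlock(clients, Math.floor(clients.length / 2) + 1).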

Pattern 4: etcd Leases with Raft

etcd uses the Raft consensus algorithm to offer locks with strong guarantees:

import { Etcd3 } from 'etcd3';

const client = new Etcd3({ hosts: 'localhost:2379' });

async function withEtcdLock<T>(fn: () => Promise<T>): Promise<T> {
  // lock() attaches the key to a lease; the client sends keep-alives,
  // so the lock survives while the process is alive and drops on crash
  return client
    .lock('my-distributed-lock')
    .ttl(30) // seconds until the lease expires without keep-alives
    .do(fn); // acquires, runs fn, releases - even if fn throws
}

etcd advantages:

  • Strong consensus guarantees via Raft
  • Automatic lease keep-alives (narrows, but does not eliminate, the TTL race)
  • Doubles as an integrated service-discovery store
  • Watch API for notifications

Pattern 5: Consul Sessions

Consul uses sessions with TTL for distributed locking:

import Consul from 'consul';

const consul = new Consul();

async function withConsulLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const session = await consul.session.create({
    ttl: '30s',
    behavior: 'delete', // keys held by the session are deleted when it ends
  });

  try {
    // Acquire via Consul's check-and-set: the KV write only succeeds
    // if no other session currently holds the key
    const locked = await consul.kv.set({
      key: `service/lock/${key}`,
      value: 'locked',
      acquire: session.ID,
    });

    if (!locked) {
      throw new Error('Failed to acquire lock');
    }

    return await fn();
  } finally {
    // Destroying the session releases (and here deletes) the lock key
    await consul.session.destroy(session.ID);
  }
}

Consul advantages:

  • Integrated with service discovery
  • Native health checks
  • KV store with versioning
  • Multi-datacenter support

When to use distributed locking

Appropriate use cases

  1. Scheduled job coordination: Ensure only one instance executes a cron job
  2. Resource allocation: Limit access to scarce resources (API quotas, hardware)
  3. Leader election: Designate a leader for a processing cluster
  4. Shared state mutation: Modify data that cannot be atomically updated

Inappropriate use cases

  1. High frequency lock/unlock: Distributed locking is expensive (network round-trip)
  2. Long-held locks: Long locks increase chance of timeout and race conditions
  3. Performance-critical paths: If lock latency is critical, redesign architecture

Alternatives to locking

Idempotency

Instead of locking, make operations idempotent:

// WITHOUT LOCK: idempotent operation
async function createPayment(paymentId: string, amount: number) {
  const existing = await db.payments.find({ id: paymentId });

  if (existing) {
    return existing; // Already processed, return result
  }

  return await db.payments.insert({ id: paymentId, amount });
}

Optimistic Concurrency Control

Use versioning to detect conflicts:

async function updateDocument(
  docId: string,
  expectedVersion: number,
  updates: Partial<Document>
): Promise<void> {
  const result = await db.documents.updateOne(
    { id: docId, version: expectedVersion },
    { $set: { ...updates, version: expectedVersion + 1 } }
  );

  if (result.modifiedCount === 0) {
    throw new Error('Document was modified by another process');
  }
}
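
Callers usually wrap the conditional update in a read-retry loop. A generic sketch — read and tryUpdate are hypothetical adapters over the database calls above:

```typescript
// Optimistic retry loop: re-read, re-apply the mutation, and attempt the
// compare-and-set again until it lands or attempts run out.
async function updateWithRetry<T>(
  read: () => Promise<{ version: number; value: T }>,
  tryUpdate: (expectedVersion: number, value: T) => Promise<boolean>,
  mutate: (value: T) => T,
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { version, value } = await read();
    if (await tryUpdate(version, mutate(value))) {
      return; // compare-and-set succeeded
    }
    // The version moved under us; loop, re-read, and try again
  }
  throw new Error('Too many concurrent modifications');
}
```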

Message Queue with Exactly-Once

Combine at-least-once delivery with idempotent processing to get effectively exactly-once results:

// Use Kafka (kafkajs) with an idempotent consumer
await consumer.subscribe({ topic: 'orders' });

await consumer.run({
  eachMessage: async ({ message }) => {
    const orderId = message.key.toString();

    // Process idempotently
    await processOrder(orderId, message.value);
  },
});
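
The idempotent side of processOrder can be sketched as a dedupe check keyed by order id. In production the processed set would be a durable store (e.g. a database table), not process memory:

```typescript
const processed = new Set<string>();
let shipments = 0;

// Safe under at-least-once delivery: redelivered messages become no-ops.
async function processOrder(orderId: string): Promise<void> {
  if (processed.has(orderId)) {
    return; // duplicate delivery, already handled
  }
  processed.add(orderId);
  shipments++; // stands in for the real side effect
}
```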

Implementation checklist

  • [ ] Lock has TTL/lease to handle dead processes
  • [ ] Lock release is idempotent (can be called multiple times)
  • [ ] Uses fencing tokens if TTL races are unacceptable (downstream resources validate the token)
  • [ ] Lock acquisition has timeout (doesn't wait indefinitely)
  • [ ] Implements retry with exponential backoff
  • [ ] Monitors lock wait time and lock duration
  • [ ] Has fallback or alert if lock consistently fails
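
The acquisition-timeout item can be implemented by racing the lock call against a deadline — acquire here is any promise-returning acquisition, such as the helpers above:

```typescript
// Rejects if acquisition takes longer than timeoutMs; the timer is
// cleared either way so it does not keep the process alive.
async function acquireWithTimeout<T>(
  acquire: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`lock acquisition timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });
  try {
    return await Promise.race([acquire(), deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```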

Conclusion

Distributed locking in 2026 is a mature tool with multiple valid implementations: etcd (Raft consensus), Consul (sessions), Redis (Redlock), and custom solutions with fencing tokens.

Architecture decisions should be based on:

  • Consistency guarantees: Do you tolerate occasional race conditions?
  • Performance: What lock acquisition latency is acceptable?
  • Operational complexity: Can your team operate etcd/Consul?
  • Scalability: How many instances compete for the lock?

For most use cases, starting with etcd or Consul offers the best balance of strong guarantees and manageable complexity. Redlock is appropriate when performance is critical and you understand the consistency trade-offs.

Alternatives like idempotency and optimistic concurrency should be considered first — often the problem can be solved without distributed locking.


Need to implement distributed locking, or to redesign your architecture to eliminate the need for locks? Talk to Imperialis web specialists to design and implement a solution.
