
Rate Limiting in Production: Algorithms, Trade-offs, and Implementation

Protecting your APIs from abuse and overload while balancing user experience with system stability.

3/12/2026 · 8 min read


The necessity of rate limiting

Every public API will be abused. This is not speculation—it is the operational reality of exposing services on the internet. Scrapers, automated bots, malicious actors, and even well-intentioned users will send more requests than your infrastructure can handle.

Rate limiting serves three critical purposes:

  1. Protection against abuse: Prevent brute-force attacks, credential stuffing, and API scraping
  2. Infrastructure stability: Ensure services don't collapse under unexpected load spikes
  3. Fairness: Distribute capacity equitably across legitimate users

The challenge is implementing rate limiting that protects your systems without blocking legitimate users or creating poor user experience. The right algorithm, storage strategy, and configuration matter significantly.

Rate limiting algorithms

Different algorithms solve different problems. Understanding their trade-offs is essential for choosing the right approach.

Token Bucket Algorithm

The token bucket algorithm allows bursts of requests up to a maximum capacity while enforcing a long-term average rate.

How it works:

  • Tokens are added to a bucket at a fixed rate
  • Each request consumes one token
  • If the bucket is empty, requests are rejected
  • The bucket has a maximum capacity (burst allowance)
```typescript
class TokenBucketRateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,    // Maximum tokens (burst allowance)
    private refillRate: number,  // Tokens added per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  async tryRequest(): Promise<boolean> {
    this.refill();

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }

    return false;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    const tokensToAdd = elapsed * this.refillRate;

    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  getRemaining(): number {
    this.refill();
    return this.tokens;
  }
}

// Usage: 100 requests per second, with bursts of up to 200
const rateLimiter = new TokenBucketRateLimiter(200, 100);
```

When to use: APIs that need to allow burst traffic (e.g., file uploads, bulk operations) while maintaining overall rate limits.

Advantages:

  • Allows legitimate bursts
  • Smooth throttling behavior
  • Memory-efficient

Disadvantages:

  • Tokens can accumulate and enable large bursts if underutilized

Leaky Bucket Algorithm

The leaky bucket processes requests at a constant rate, queueing excess requests and rejecting new ones once the queue is full.

How it works:

  • Requests are added to a queue
  • Requests leave the queue at a fixed rate
  • If the queue is full, requests are rejected
```typescript
class LeakyBucketRateLimiter {
  private queue: number[];  // Timestamps of queued requests
  private lastLeak: number;

  constructor(
    private queueSize: number,  // Maximum queue size
    private leakRate: number,   // Requests processed per second
  ) {
    this.queue = [];
    this.lastLeak = Date.now();
  }

  // Note: this admits a request as soon as there is queue capacity;
  // actual processing is still paced by leak() at the fixed rate
  async tryRequest(): Promise<boolean> {
    this.leak();

    if (this.queue.length < this.queueSize) {
      this.queue.push(Date.now());
      return true;
    }

    return false;
  }

  private leak() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    const requestsToLeak = Math.floor(elapsed * this.leakRate);

    if (requestsToLeak > 0) {
      this.queue.splice(0, requestsToLeak);
      // Advance only by the whole requests leaked, so fractional progress
      // isn't lost when leak() is called more often than the leak interval
      this.lastLeak += (requestsToLeak / this.leakRate) * 1000;
    }
  }
}

// Usage: process 50 requests per second, queue up to 100
const rateLimiter = new LeakyBucketRateLimiter(100, 50);
```

When to use: APIs that need strict, consistent output rates (e.g., message queues, streaming data).

Advantages:

  • Consistent output rate
  • Handles traffic smoothing well

Disadvantages:

  • No burst allowance
  • Queue management complexity

Sliding Window Log Algorithm

The sliding window log tracks every request within a time window, providing accurate rate limiting.

How it works:

  • Store timestamp of each request
  • Count requests within the sliding time window
  • Reject if count exceeds threshold
```typescript
class SlidingWindowLogRateLimiter {
  private requests: number[] = [];  // Timestamps of recent requests

  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {}

  async tryRequest(): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Drop requests that have aged out of the window
    this.requests = this.requests.filter(t => t > windowStart);

    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }

    return false;
  }

  // Milliseconds until the oldest logged request exits the window
  getRetryAfter(): number {
    if (this.requests.length === 0) return 0;
    const oldestRequest = this.requests[0];
    const windowStart = Date.now() - this.windowMs;
    return Math.max(0, oldestRequest - windowStart);
  }
}

// Usage: max 100 requests per 60 seconds
const rateLimiter = new SlidingWindowLogRateLimiter(100, 60000);
```

When to use: APIs requiring accurate per-window rate limiting with user-friendly retry-after headers.

Advantages:

  • Accurate rate limiting
  • Easy to calculate retry-after

Disadvantages:

  • Memory-intensive for high traffic
  • Performance overhead from log management

Fixed Window Counter Algorithm

The fixed window counter divides time into fixed intervals and resets at interval boundaries.

How it works:

  • Divide time into fixed windows (e.g., per minute)
  • Increment counter for each request
  • Reset counter at window boundary
```typescript
class FixedWindowRateLimiter {
  private currentWindow: number;
  private requestCount: number;

  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {
    this.currentWindow = Math.floor(Date.now() / windowMs);
    this.requestCount = 0;
  }

  async tryRequest(): Promise<boolean> {
    const now = Date.now();
    const windowNumber = Math.floor(now / this.windowMs);

    // Reset the counter when a window boundary is crossed
    if (windowNumber !== this.currentWindow) {
      this.currentWindow = windowNumber;
      this.requestCount = 0;
    }

    if (this.requestCount < this.maxRequests) {
      this.requestCount++;
      return true;
    }

    return false;
  }

  // Milliseconds until the next window starts
  getRetryAfter(): number {
    const now = Date.now();
    const nextWindowStart = (Math.floor(now / this.windowMs) + 1) * this.windowMs;
    return Math.max(0, nextWindowStart - now);
  }
}

// Usage: max 1000 requests per minute
const rateLimiter = new FixedWindowRateLimiter(1000, 60000);
```

When to use: Simple rate limiting scenarios where minor inaccuracies at window boundaries are acceptable.

Advantages:

  • Simple implementation
  • Minimal memory overhead

Disadvantages:

  • Traffic can spike at window boundaries (up to double the configured rate: a client can send a full quota in the last second of one window and another full quota in the first second of the next)
  • Less accurate than sliding window

Algorithm comparison

| Algorithm | Accuracy | Burst Support | Memory | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Token Bucket | High | Yes | Low | Medium | APIs needing burst allowance |
| Leaky Bucket | Medium | No | Medium | Medium | Consistent output rates |
| Sliding Window | Very High | Limited | High | High | Accurate per-window limits |
| Fixed Window | Low | No | Low | Low | Simple rate limiting |

Distributed rate limiting

In microservices architectures, rate limiting must work across multiple instances. This introduces two challenges:

Storage backend

Rate limiting state must be stored in a shared, distributed backend:

```typescript
// Redis-based sliding window implementation (node-redis v4)
import { createClient } from 'redis';

const redis = createClient();
await redis.connect();

class DistributedSlidingWindowRateLimiter {
  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {}

  async tryRequest(key: string): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.windowMs;
    const pipeline = redis.multi();

    // Remove requests that have aged out of the window
    pipeline.zRemRangeByScore(key, 0, windowStart);

    // Count requests still in the window (before this one)
    pipeline.zCard(key);

    // Record this request; the random suffix keeps same-millisecond
    // requests from overwriting each other in the sorted set
    pipeline.zAdd(key, { score: now, value: `${now}:${Math.random()}` });

    // Expire the key once the window has fully passed
    pipeline.expire(key, Math.ceil(this.windowMs / 1000));

    const results = await pipeline.exec();
    const count = Number(results[1]);

    // Note: denied requests are still recorded and count against the
    // window; remove the added member on denial if that is undesirable
    return count < this.maxRequests;
  }
}
```

Synchronization overhead

Distributed rate limiting introduces network latency on every request. Mitigate this with:

  1. Redis cluster: Use Redis Cluster for horizontal scaling
  2. Local caching: Cache rate limit decisions locally for sub-second windows
  3. Async updates: Update rate limit state asynchronously
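
The local-caching idea can be sketched as a thin wrapper that remembers recent denials and skips the network round trip for keys that were just rejected. The `DistributedLimiter` interface and the 100 ms default TTL below are illustrative choices, not part of the implementations above:

```typescript
// Any distributed limiter with a per-key tryRequest, like the Redis one above
interface DistributedLimiter {
  tryRequest(key: string): Promise<boolean>;
}

class LocallyCachedRateLimiter {
  // Keys we recently saw denied, mapped to when the cache entry expires
  private deniedUntil: Map<string, number> = new Map();

  constructor(
    private backend: DistributedLimiter,
    private negativeTtlMs: number = 100,  // How long to trust a denial
  ) {}

  async tryRequest(key: string): Promise<boolean> {
    const now = Date.now();
    const cached = this.deniedUntil.get(key);

    // A fresh local denial means we can skip the network hop entirely
    if (cached !== undefined && cached > now) {
      return false;
    }

    const allowed = await this.backend.tryRequest(key);
    if (!allowed) {
      this.deniedUntil.set(key, now + this.negativeTtlMs);
    }
    return allowed;
  }
}
```

The trade-off is brief over-blocking: a key stays denied locally for up to the TTL even if the distributed window has already freed capacity, which is usually acceptable at sub-second TTLs.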

Rate limiting strategies by use case

API endpoint rate limiting

```typescript
// Express middleware for endpoint rate limiting
import { Router } from 'express';

const router = Router();

// Different limits for different endpoints
const publicApiLimiter = new TokenBucketRateLimiter(100, 10);
const authApiLimiter = new TokenBucketRateLimiter(20, 2);
const premiumApiLimiter = new TokenBucketRateLimiter(1000, 100);

router.use('/api/public', async (req, res, next) => {
  const allowed = await publicApiLimiter.tryRequest();
  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});

router.use('/api/auth', async (req, res, next) => {
  const allowed = await authApiLimiter.tryRequest();
  if (!allowed) {
    // Assumes a getRetryAfter() helper (ms until the next token)
    // has been added to TokenBucketRateLimiter
    const retryAfter = authApiLimiter.getRetryAfter();
    res.setHeader('Retry-After', Math.ceil(retryAfter / 1000));
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});
```

User-based rate limiting

```typescript
// Rate limit by user ID or API key
class UserRateLimiter {
  private limiters: Map<string, SlidingWindowLogRateLimiter> = new Map();

  async tryRequest(userId: string): Promise<boolean> {
    // Lazily create one limiter per user: 100 requests per minute
    if (!this.limiters.has(userId)) {
      this.limiters.set(userId, new SlidingWindowLogRateLimiter(100, 60000));
    }

    const limiter = this.limiters.get(userId)!;
    return limiter.tryRequest();
  }
}
```

IP-based rate limiting

```typescript
// Rate limit by IP address (use with caution)
class IpRateLimiter {
  private limiters: Map<string, SlidingWindowLogRateLimiter> = new Map();

  async tryRequest(reqIp: string): Promise<boolean> {
    // Normalize IPv4-mapped IPv6 addresses ("::ffff:1.2.3.4" -> "1.2.3.4")
    // so the same client isn't tracked under two keys
    const normalizedIp = reqIp.startsWith('::ffff:') ? reqIp.slice(7) : reqIp;

    if (!this.limiters.has(normalizedIp)) {
      this.limiters.set(normalizedIp, new SlidingWindowLogRateLimiter(50, 60000));
    }

    const limiter = this.limiters.get(normalizedIp)!;
    return limiter.tryRequest();
  }
}
```

Caution: IP-based rate limiting has issues with NAT, proxies, and shared networks. Use as a defense-in-depth measure, not as the primary strategy.
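
One partial mitigation for the proxy problem is to honor X-Forwarded-For only when the direct peer is a proxy you operate; otherwise any client can spoof the header to dodge the limit. A minimal sketch follows (the trusted-proxy addresses are placeholders; in Express you would usually configure `trust proxy` instead of rolling this yourself):

```typescript
// Addresses of reverse proxies under your control (illustrative values)
const TRUSTED_PROXIES = new Set(['10.0.0.1', '10.0.0.2']);

function clientIp(remoteAddr: string, forwardedFor?: string): string {
  // Only trust X-Forwarded-For when the TCP peer is a known proxy
  if (forwardedFor && TRUSTED_PROXIES.has(remoteAddr)) {
    // The left-most entry is the original client
    const first = forwardedFor.split(',')[0].trim();
    if (first.length > 0) return first;
  }
  // Otherwise rate-limit on the direct connection address
  return remoteAddr;
}
```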

Response headers and user experience

Rate limiting should be transparent to clients through proper HTTP headers:

```typescript
import type { Response } from 'express';

// Minimal interface the header helper relies on
interface RateLimiter {
  getLimit(): number;       // Maximum requests per window
  getRemaining(): number;   // Requests left in the current window
  getResetTime(): number;   // Unix timestamp when the window resets
  getRetryAfter(): number;  // Milliseconds until a request will succeed
}

function setRateLimitHeaders(res: Response, limiter: RateLimiter) {
  const remaining = limiter.getRemaining();
  const limit = limiter.getLimit();
  const reset = limiter.getResetTime();

  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', Math.max(0, Math.floor(remaining)));
  res.setHeader('X-RateLimit-Reset', reset);

  if (remaining < 1) {
    const retryAfter = limiter.getRetryAfter();
    res.setHeader('Retry-After', Math.ceil(retryAfter / 1000));
  }
}
```

Standard headers:

  • X-RateLimit-Limit: Maximum requests per window
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when window resets
  • Retry-After: Seconds until retry (when limit exceeded)

Rate limiting tiers

Implement tiered rate limiting based on user subscription or API key:

```typescript
interface RateLimitTier {
  requestsPerMinute: number;
  burstCapacity: number;
}

const rateLimitTiers: Record<string, RateLimitTier> = {
  free: { requestsPerMinute: 10, burstCapacity: 20 },
  basic: { requestsPerMinute: 100, burstCapacity: 200 },
  pro: { requestsPerMinute: 1000, burstCapacity: 2000 },
  enterprise: { requestsPerMinute: 10000, burstCapacity: 20000 },
};

class TieredRateLimiter {
  private limiters: Map<string, TokenBucketRateLimiter> = new Map();

  async tryRequest(userId: string, tier: string): Promise<boolean> {
    if (!this.limiters.has(userId)) {
      const config = rateLimitTiers[tier] ?? rateLimitTiers.free;
      // Convert the per-minute rate into the per-second refill the bucket expects
      this.limiters.set(userId, new TokenBucketRateLimiter(
        config.burstCapacity,
        config.requestsPerMinute / 60,
      ));
    }

    const limiter = this.limiters.get(userId)!;
    return limiter.tryRequest();
  }
}
```

Common anti-patterns

Anti-pattern 1: Rate limiting only at the edge

Rate limiting at the CDN or API Gateway level is insufficient for protection against internal abuse or compromised accounts. Implement application-level rate limiting as well.

Anti-pattern 2: Blocking legitimate burst traffic

Using fixed window algorithms for APIs that legitimately experience burst traffic (e.g., after a user completes a form submission) creates poor user experience. Use token bucket or sliding window algorithms instead.

Anti-pattern 3: Ignoring rate limit headers

Clients that ignore rate limit headers and retry immediately create thundering herd effects. Implement exponential backoff with jitter on the client side.
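
A minimal client-side sketch of exponential backoff with "full jitter" is below; the `doRequest` shape, base delay, and cap are illustrative choices, and a server-supplied Retry-After always takes precedence over the computed delay:

```typescript
// Full jitter: pick uniformly in [0, min(cap, base * 2^attempt))
// so retrying clients don't synchronize into a thundering herd
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// `doRequest` stands in for any HTTP call; its result shape (status plus
// an optional Retry-After value in seconds) is hypothetical for this sketch
async function requestWithBackoff(
  doRequest: () => Promise<{ status: number; retryAfterSec?: number }>,
  maxRetries = 5,
) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Honor the server's Retry-After if present, else back off with jitter
    const delay = res.retryAfterSec !== undefined
      ? res.retryAfterSec * 1000
      : backoffDelayMs(attempt);
    await sleep(delay);
  }
}
```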

Anti-pattern 4: Rate limiting without monitoring

Rate limiting without comprehensive monitoring means you won't know if legitimate users are being blocked or if attackers are finding ways around your limits.

Monitoring and observability

Track rate limiting metrics to ensure your strategy is working:

```typescript
// Prometheus metrics for rate limiting
import { Counter, Histogram, Gauge } from 'prom-client';

export const rateLimitMetrics = {
  requestsAllowed: new Counter({
    name: 'rate_limit_requests_allowed_total',
    help: 'Total requests allowed by rate limiter',
    // Caution: a user_id label creates one time series per user;
    // prefer coarser labels (tier, endpoint) at high cardinality
    labelNames: ['endpoint', 'user_id'],
  }),

  requestsDenied: new Counter({
    name: 'rate_limit_requests_denied_total',
    help: 'Total requests denied by rate limiter',
    labelNames: ['endpoint', 'user_id'],
  }),

  requestDuration: new Histogram({
    name: 'rate_limit_check_duration_seconds',
    help: 'Time spent checking rate limit',
    buckets: [0.001, 0.005, 0.01, 0.05, 0.1],
  }),

  remainingTokens: new Gauge({
    name: 'rate_limit_remaining_tokens',
    help: 'Remaining tokens for rate limiter',
    labelNames: ['user_id'],
  }),
};

// Usage in the request path
if (await limiter.tryRequest(userId)) {
  rateLimitMetrics.requestsAllowed.labels({ endpoint, user_id: userId }).inc();
} else {
  rateLimitMetrics.requestsDenied.labels({ endpoint, user_id: userId }).inc();
}
```

Conclusion

Rate limiting is not optional for public APIs—it's a fundamental security and stability mechanism. The right algorithm depends on your use case:

  • Token bucket: APIs needing burst allowance
  • Leaky bucket: APIs requiring consistent output rates
  • Sliding window: APIs needing accurate per-window limits
  • Fixed window: Simple scenarios with forgiving accuracy requirements

Implement rate limiting at multiple layers (edge, application, per-user), use distributed storage for consistency across instances, and return clear rate limit headers to clients. Monitor your rate limiting metrics continuously to confirm that legitimate users aren't being blocked while abusive traffic is.


Are your APIs experiencing abuse or stability issues from uncontrolled traffic? Talk to Imperialis engineering specialists about designing and implementing a comprehensive rate limiting strategy that protects your infrastructure while maintaining a great user experience.
