
Rate Limiting in Production: Algorithms, Trade-offs, and Implementation

Protecting your APIs from abuse and overload while balancing user experience with system stability.

3/12/2026 · 8 min read


The necessity of rate limiting

Every public API will be abused. This is not speculation—it is the operational reality of exposing services on the internet. Scrapers, automated bots, malicious actors, and even well-intentioned users will send more requests than your infrastructure can handle.

Rate limiting serves three critical purposes:

  1. Protection against abuse: Prevent brute-force attacks, credential stuffing, and API scraping
  2. Infrastructure stability: Ensure services don't collapse under unexpected load spikes
  3. Fairness: Distribute capacity equitably across legitimate users

The challenge is implementing rate limiting that protects your systems without blocking legitimate users or creating poor user experience. The right algorithm, storage strategy, and configuration matter significantly.

Rate limiting algorithms

Different algorithms solve different problems. Understanding their trade-offs is essential for choosing the right approach.

Token Bucket Algorithm

The token bucket algorithm allows bursts of requests up to a maximum capacity while enforcing a long-term average rate.

How it works:

  • Tokens are added to a bucket at a fixed rate
  • Each request consumes one token
  • If the bucket is empty, requests are rejected
  • The bucket has a maximum capacity (burst allowance)
```typescript
class TokenBucketRateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,    // Maximum tokens (burst allowance)
    private refillRate: number,  // Tokens added per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  async tryRequest(): Promise<boolean> {
    this.refill();

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }

    return false;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    const tokensToAdd = elapsed * this.refillRate;

    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  getRemaining(): number {
    this.refill();
    return this.tokens;
  }
}

// Usage: 100 requests per second, with bursts of up to 200
const rateLimiter = new TokenBucketRateLimiter(200, 100);
```

When to use: APIs that need to allow burst traffic (e.g., file uploads, bulk operations) while maintaining overall rate limits.

Advantages:

  • Allows legitimate bursts
  • Smooth throttling behavior
  • Memory-efficient

Disadvantages:

  • Tokens can accumulate and enable large bursts if underutilized

Leaky Bucket Algorithm

The leaky bucket processes requests at a constant rate, queueing excess requests and rejecting new ones once the queue is full.

How it works:

  • Requests are added to a queue
  • Requests leave the queue at a fixed rate
  • If the queue is full, requests are rejected
```typescript
class LeakyBucketRateLimiter {
  private queue: number[];  // Timestamps of queued requests
  private lastLeak: number;

  constructor(
    private queueSize: number,  // Maximum queue size
    private leakRate: number,   // Requests processed per second
  ) {
    this.queue = [];
    this.lastLeak = Date.now();
  }

  // Note: this admits a request as soon as there is queue capacity;
  // actual processing is still paced by leak() at the fixed rate
  async tryRequest(): Promise<boolean> {
    this.leak();

    if (this.queue.length < this.queueSize) {
      this.queue.push(Date.now());
      return true;
    }

    return false;
  }

  private leak() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    const requestsToLeak = Math.floor(elapsed * this.leakRate);

    if (requestsToLeak > 0) {
      this.queue.splice(0, requestsToLeak);
      // Advance only by the whole requests leaked, so fractional progress
      // isn't lost when leak() is called more often than the leak interval
      this.lastLeak += (requestsToLeak / this.leakRate) * 1000;
    }
  }
}

// Usage: process 50 requests per second, queue up to 100
const rateLimiter = new LeakyBucketRateLimiter(100, 50);
```

When to use: APIs that need strict, consistent output rates (e.g., message queues, streaming data).

Advantages:

  • Consistent output rate
  • Handles traffic smoothing well

Disadvantages:

  • No burst allowance
  • Queue management complexity

Sliding Window Log Algorithm

The sliding window log tracks every request within a time window, providing accurate rate limiting.

How it works:

  • Store timestamp of each request
  • Count requests within the sliding time window
  • Reject if count exceeds threshold
```typescript
class SlidingWindowLogRateLimiter {
  private requests: number[] = [];  // Timestamps of recent requests

  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {}

  async tryRequest(): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Drop requests that have aged out of the window
    this.requests = this.requests.filter(t => t > windowStart);

    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }

    return false;
  }

  // Milliseconds until the oldest logged request exits the window
  getRetryAfter(): number {
    if (this.requests.length === 0) return 0;
    const oldestRequest = this.requests[0];
    const windowStart = Date.now() - this.windowMs;
    return Math.max(0, oldestRequest - windowStart);
  }
}

// Usage: max 100 requests per 60 seconds
const rateLimiter = new SlidingWindowLogRateLimiter(100, 60000);
```

When to use: APIs requiring accurate per-window rate limiting with user-friendly retry-after headers.

Advantages:

  • Accurate rate limiting
  • Easy to calculate retry-after

Disadvantages:

  • Memory-intensive for high traffic
  • Performance overhead from log management

Fixed Window Counter Algorithm

The fixed window counter divides time into fixed intervals and resets at interval boundaries.

How it works:

  • Divide time into fixed windows (e.g., per minute)
  • Increment counter for each request
  • Reset counter at window boundary
```typescript
class FixedWindowRateLimiter {
  private currentWindow: number;
  private requestCount: number;

  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {
    this.currentWindow = Math.floor(Date.now() / windowMs);
    this.requestCount = 0;
  }

  async tryRequest(): Promise<boolean> {
    const now = Date.now();
    const windowNumber = Math.floor(now / this.windowMs);

    // Reset the counter when a window boundary is crossed
    if (windowNumber !== this.currentWindow) {
      this.currentWindow = windowNumber;
      this.requestCount = 0;
    }

    if (this.requestCount < this.maxRequests) {
      this.requestCount++;
      return true;
    }

    return false;
  }

  // Milliseconds until the next window starts
  getRetryAfter(): number {
    const now = Date.now();
    const nextWindowStart = (Math.floor(now / this.windowMs) + 1) * this.windowMs;
    return Math.max(0, nextWindowStart - now);
  }
}

// Usage: max 1000 requests per minute
const rateLimiter = new FixedWindowRateLimiter(1000, 60000);
```

When to use: Simple rate limiting scenarios where minor inaccuracies at window boundaries are acceptable.

Advantages:

  • Simple implementation
  • Minimal memory overhead

Disadvantages:

  • Traffic can spike at window boundaries (up to double the configured rate: a client can send a full quota in the last second of one window and another full quota in the first second of the next)
  • Less accurate than sliding window

Algorithm comparison

| Algorithm | Accuracy | Burst Support | Memory | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Token Bucket | High | Yes | Low | Medium | APIs needing burst allowance |
| Leaky Bucket | Medium | No | Medium | Medium | Consistent output rates |
| Sliding Window | Very High | Limited | High | High | Accurate per-window limits |
| Fixed Window | Low | No | Low | Low | Simple rate limiting |

Distributed rate limiting

In microservices architectures, rate limiting must work across multiple instances. This introduces two challenges:

Storage backend

Rate limiting state must be stored in a shared, distributed backend:

```typescript
// Redis-based sliding window implementation (node-redis v4)
import { createClient } from 'redis';

const redis = createClient();
await redis.connect();

class DistributedSlidingWindowRateLimiter {
  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {}

  async tryRequest(key: string): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.windowMs;
    const pipeline = redis.multi();

    // Remove requests that have aged out of the window
    pipeline.zRemRangeByScore(key, 0, windowStart);

    // Count requests still in the window (before this one)
    pipeline.zCard(key);

    // Record this request; the random suffix keeps same-millisecond
    // requests from overwriting each other in the sorted set
    pipeline.zAdd(key, { score: now, value: `${now}:${Math.random()}` });

    // Expire the key once the window has fully passed
    pipeline.expire(key, Math.ceil(this.windowMs / 1000));

    const results = await pipeline.exec();
    const count = Number(results[1]);

    // Note: denied requests are still recorded and count against the
    // window; remove the added member on denial if that is undesirable
    return count < this.maxRequests;
  }
}
```

Synchronization overhead

Distributed rate limiting introduces network latency on every request. Mitigate this with:

  1. Redis cluster: Use Redis Cluster for horizontal scaling
  2. Local caching: Cache rate limit decisions locally for sub-second windows
  3. Async updates: Update rate limit state asynchronously
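
The local-caching idea can be sketched as a thin wrapper that remembers recent denials and skips the network round trip for keys that were just rejected. The `DistributedLimiter` interface and the 100 ms default TTL below are illustrative choices, not part of the implementations above:

```typescript
// Any distributed limiter with a per-key tryRequest, like the Redis one above
interface DistributedLimiter {
  tryRequest(key: string): Promise<boolean>;
}

class LocallyCachedRateLimiter {
  // Keys we recently saw denied, mapped to when the cache entry expires
  private deniedUntil: Map<string, number> = new Map();

  constructor(
    private backend: DistributedLimiter,
    private negativeTtlMs: number = 100,  // How long to trust a denial
  ) {}

  async tryRequest(key: string): Promise<boolean> {
    const now = Date.now();
    const cached = this.deniedUntil.get(key);

    // A fresh local denial means we can skip the network hop entirely
    if (cached !== undefined && cached > now) {
      return false;
    }

    const allowed = await this.backend.tryRequest(key);
    if (!allowed) {
      this.deniedUntil.set(key, now + this.negativeTtlMs);
    }
    return allowed;
  }
}
```

The trade-off is brief over-blocking: a key stays denied locally for up to the TTL even if the distributed window has already freed capacity, which is usually acceptable at sub-second TTLs.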

Rate limiting strategies by use case

API endpoint rate limiting

```typescript
// Express middleware for endpoint rate limiting
import { Router } from 'express';

const router = Router();

// Different limits for different endpoints
const publicApiLimiter = new TokenBucketRateLimiter(100, 10);
const authApiLimiter = new TokenBucketRateLimiter(20, 2);
const premiumApiLimiter = new TokenBucketRateLimiter(1000, 100);

router.use('/api/public', async (req, res, next) => {
  const allowed = await publicApiLimiter.tryRequest();
  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});

router.use('/api/auth', async (req, res, next) => {
  const allowed = await authApiLimiter.tryRequest();
  if (!allowed) {
    // Assumes a getRetryAfter() helper (ms until the next token)
    // has been added to TokenBucketRateLimiter
    const retryAfter = authApiLimiter.getRetryAfter();
    res.setHeader('Retry-After', Math.ceil(retryAfter / 1000));
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
});
```

User-based rate limiting

```typescript
// Rate limit by user ID or API key
class UserRateLimiter {
  private limiters: Map<string, SlidingWindowLogRateLimiter> = new Map();

  async tryRequest(userId: string): Promise<boolean> {
    // Lazily create one limiter per user: 100 requests per minute
    if (!this.limiters.has(userId)) {
      this.limiters.set(userId, new SlidingWindowLogRateLimiter(100, 60000));
    }

    const limiter = this.limiters.get(userId)!;
    return limiter.tryRequest();
  }
}
```

IP-based rate limiting

```typescript
// Rate limit by IP address (use with caution)
class IpRateLimiter {
  private limiters: Map<string, SlidingWindowLogRateLimiter> = new Map();

  async tryRequest(reqIp: string): Promise<boolean> {
    // Normalize IPv4-mapped IPv6 addresses ("::ffff:1.2.3.4" -> "1.2.3.4")
    // so the same client isn't tracked under two keys
    const normalizedIp = reqIp.startsWith('::ffff:') ? reqIp.slice(7) : reqIp;

    if (!this.limiters.has(normalizedIp)) {
      this.limiters.set(normalizedIp, new SlidingWindowLogRateLimiter(50, 60000));
    }

    const limiter = this.limiters.get(normalizedIp)!;
    return limiter.tryRequest();
  }
}
```

Caution: IP-based rate limiting has issues with NAT, proxies, and shared networks. Use as a defense-in-depth measure, not as the primary strategy.
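
One partial mitigation for the proxy problem is to honor X-Forwarded-For only when the direct peer is a proxy you operate; otherwise any client can spoof the header to dodge the limit. A minimal sketch follows (the trusted-proxy addresses are placeholders; in Express you would usually configure `trust proxy` instead of rolling this yourself):

```typescript
// Addresses of reverse proxies under your control (illustrative values)
const TRUSTED_PROXIES = new Set(['10.0.0.1', '10.0.0.2']);

function clientIp(remoteAddr: string, forwardedFor?: string): string {
  // Only trust X-Forwarded-For when the TCP peer is a known proxy
  if (forwardedFor && TRUSTED_PROXIES.has(remoteAddr)) {
    // The left-most entry is the original client
    const first = forwardedFor.split(',')[0].trim();
    if (first.length > 0) return first;
  }
  // Otherwise rate-limit on the direct connection address
  return remoteAddr;
}
```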

Response headers and user experience

Rate limiting should be transparent to clients through proper HTTP headers:

```typescript
import type { Response } from 'express';

// Minimal interface the header helper relies on
interface RateLimiter {
  getLimit(): number;       // Maximum requests per window
  getRemaining(): number;   // Requests left in the current window
  getResetTime(): number;   // Unix timestamp when the window resets
  getRetryAfter(): number;  // Milliseconds until a request will succeed
}

function setRateLimitHeaders(res: Response, limiter: RateLimiter) {
  const remaining = limiter.getRemaining();
  const limit = limiter.getLimit();
  const reset = limiter.getResetTime();

  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', Math.max(0, Math.floor(remaining)));
  res.setHeader('X-RateLimit-Reset', reset);

  if (remaining < 1) {
    const retryAfter = limiter.getRetryAfter();
    res.setHeader('Retry-After', Math.ceil(retryAfter / 1000));
  }
}
```

Standard headers:

  • X-RateLimit-Limit: Maximum requests per window
  • X-RateLimit-Remaining: Requests remaining in current window
  • X-RateLimit-Reset: Unix timestamp when window resets
  • Retry-After: Seconds until retry (when limit exceeded)

Rate limiting tiers

Implement tiered rate limiting based on user subscription or API key:

```typescript
interface RateLimitTier {
  requestsPerMinute: number;
  burstCapacity: number;
}

const rateLimitTiers: Record<string, RateLimitTier> = {
  free: { requestsPerMinute: 10, burstCapacity: 20 },
  basic: { requestsPerMinute: 100, burstCapacity: 200 },
  pro: { requestsPerMinute: 1000, burstCapacity: 2000 },
  enterprise: { requestsPerMinute: 10000, burstCapacity: 20000 },
};

class TieredRateLimiter {
  private limiters: Map<string, TokenBucketRateLimiter> = new Map();

  async tryRequest(userId: string, tier: string): Promise<boolean> {
    if (!this.limiters.has(userId)) {
      const config = rateLimitTiers[tier] ?? rateLimitTiers.free;
      // Convert the per-minute rate into the per-second refill the bucket expects
      this.limiters.set(userId, new TokenBucketRateLimiter(
        config.burstCapacity,
        config.requestsPerMinute / 60,
      ));
    }

    const limiter = this.limiters.get(userId)!;
    return limiter.tryRequest();
  }
}
```

Common anti-patterns

Anti-pattern 1: Rate limiting only at the edge

Rate limiting at the CDN or API Gateway level is insufficient for protection against internal abuse or compromised accounts. Implement application-level rate limiting as well.

Anti-pattern 2: Blocking legitimate burst traffic

Using fixed window algorithms for APIs that legitimately experience burst traffic (e.g., after a user completes a form submission) creates poor user experience. Use token bucket or sliding window algorithms instead.

Anti-pattern 3: Ignoring rate limit headers

Clients that ignore rate limit headers and retry immediately create thundering herd effects. Implement exponential backoff with jitter on the client side.
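
A minimal client-side sketch of exponential backoff with "full jitter" is below; the `doRequest` shape, base delay, and cap are illustrative choices, and a server-supplied Retry-After always takes precedence over the computed delay:

```typescript
// Full jitter: pick uniformly in [0, min(cap, base * 2^attempt))
// so retrying clients don't synchronize into a thundering herd
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// `doRequest` stands in for any HTTP call; its result shape (status plus
// an optional Retry-After value in seconds) is hypothetical for this sketch
async function requestWithBackoff(
  doRequest: () => Promise<{ status: number; retryAfterSec?: number }>,
  maxRetries = 5,
) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Honor the server's Retry-After if present, else back off with jitter
    const delay = res.retryAfterSec !== undefined
      ? res.retryAfterSec * 1000
      : backoffDelayMs(attempt);
    await sleep(delay);
  }
}
```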

Anti-pattern 4: Rate limiting without monitoring

Rate limiting without comprehensive monitoring means you won't know if legitimate users are being blocked or if attackers are finding ways around your limits.

Monitoring and observability

Track rate limiting metrics to ensure your strategy is working:

```typescript
// Prometheus metrics for rate limiting
import { Counter, Histogram, Gauge } from 'prom-client';

export const rateLimitMetrics = {
  requestsAllowed: new Counter({
    name: 'rate_limit_requests_allowed_total',
    help: 'Total requests allowed by rate limiter',
    // Caution: a user_id label creates one time series per user;
    // prefer coarser labels (tier, endpoint) at high cardinality
    labelNames: ['endpoint', 'user_id'],
  }),

  requestsDenied: new Counter({
    name: 'rate_limit_requests_denied_total',
    help: 'Total requests denied by rate limiter',
    labelNames: ['endpoint', 'user_id'],
  }),

  requestDuration: new Histogram({
    name: 'rate_limit_check_duration_seconds',
    help: 'Time spent checking rate limit',
    buckets: [0.001, 0.005, 0.01, 0.05, 0.1],
  }),

  remainingTokens: new Gauge({
    name: 'rate_limit_remaining_tokens',
    help: 'Remaining tokens for rate limiter',
    labelNames: ['user_id'],
  }),
};

// Usage in the request path
if (await limiter.tryRequest(userId)) {
  rateLimitMetrics.requestsAllowed.labels({ endpoint, user_id: userId }).inc();
} else {
  rateLimitMetrics.requestsDenied.labels({ endpoint, user_id: userId }).inc();
}
```

Conclusion

Rate limiting is not optional for public APIs—it's a fundamental security and stability mechanism. The right algorithm depends on your use case:

  • Token bucket: APIs needing burst allowance
  • Leaky bucket: APIs requiring consistent output rates
  • Sliding window: APIs needing accurate per-window limits
  • Fixed window: Simple scenarios with forgiving accuracy requirements

Implement rate limiting at multiple layers (edge, application, per-user), use distributed storage for consistency across instances, and return clear rate limit headers to clients. Monitor your rate limiting metrics continuously to confirm that legitimate users aren't being blocked while abusive traffic is.


Are your APIs experiencing abuse or stability issues from uncontrolled traffic? Talk to Imperialis engineering specialists about designing and implementing a comprehensive rate limiting strategy that protects your infrastructure while maintaining a great user experience.
