
Rate Limiting Strategies for Production APIs: Token Bucket, Leaky Bucket, and Beyond

Rate limiting is essential for protecting APIs from abuse, ensuring fair resource allocation, and managing capacity. Understanding different algorithms and their trade-offs prevents service degradation and unexpected outages.

3/13/2026 · 8 min read · Dev tools

Last updated: 3/13/2026

Introduction: Why rate limiting matters

Every production API faces the fundamental tension between capacity and demand. Without rate limiting, abusive traffic can overwhelm systems, legitimate users experience degradation, and infrastructure costs balloon uncontrollably. Rate limiting provides the guardrails that keep systems stable and fair.

Rate limiting is more than preventing abuse—it's about capacity management. It ensures predictable system behavior under load, enables fair resource allocation, and provides the foundation for graceful degradation when limits are exceeded.

Choosing the right rate limiting strategy depends on your requirements: burst tolerance, strictness, precision, and operational complexity. Understanding trade-offs prevents implementing an algorithm that works in development but fails under production load.

Rate limiting algorithms

Token bucket algorithm

The token bucket algorithm allows bursts while enforcing long-term rate limits. Tokens accumulate in a bucket at a fixed rate. Each request consumes one or more tokens. If the bucket is empty, requests are rejected.

class TokenBucket {
  private tokens: number;
  private lastRefillTimestamp: number;
  private capacity: number;
  private refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefillTimestamp = Date.now();
  }

  consume(tokens: number = 1): boolean {
    this.refill();

    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }

    return false;
  }

  // Calculate wait time before next request
  getWaitTime(tokens: number = 1): number {
    this.refill();

    if (this.tokens >= tokens) {
      return 0;
    }

    const needed = tokens - this.tokens;
    return (needed / this.refillRate) * 1000; // milliseconds
  }

  private refill(): void {
    const now = Date.now();
    const timeSinceLastRefill = (now - this.lastRefillTimestamp) / 1000; // seconds

    const tokensToAdd = timeSinceLastRefill * this.refillRate;
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);

    this.lastRefillTimestamp = now;
  }
}

// Usage: Allow bursts of 10 requests, refill at 1 request/second
const rateLimiter = new TokenBucket(10, 1);

async function handleRequest(request: Request): Promise<Response> {
  if (!rateLimiter.consume()) {
    const waitTime = rateLimiter.getWaitTime();
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(waitTime / 1000).toString(),
        'X-RateLimit-Limit': '10',
        'X-RateLimit-Remaining': '0', // consume() failed, so fewer than one token remains
      }
    });
  }

  return await processRequest(request);
}

Characteristics:

  • Allows bursts up to bucket capacity
  • Long-term rate limit enforced by refill rate
  • Memory efficient (single counter)
  • Simple to implement and reason about

Best for:

  • APIs that tolerate burst traffic
  • User-facing applications where burstiness is expected
  • Scenarios requiring simple implementation

Disadvantages:

  • Bursts can be exhausted quickly by aggressive clients
  • No protection against distributed attacks across multiple IPs
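The usage example above shares one global bucket across all callers; in production the limiter is almost always keyed per client. A minimal self-contained sketch of per-client buckets (the trimmed TokenBucket and the 10-token, 1-per-second numbers are illustrative):

```typescript
// Per-client token buckets: each client key gets its own bucket,
// so one aggressive client cannot exhaust everyone's budget.
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(private capacity: number, private refillRate: number) {
    this.tokens = capacity;
  }

  consume(): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillRate
    );
    this.last = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const buckets = new Map<string, TokenBucket>();

function allow(clientKey: string): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket) {
    bucket = new TokenBucket(10, 1); // 10-request burst, 1 req/s per client
    buckets.set(clientKey, bucket);
  }
  return bucket.consume();
}
```

Note that the map itself needs eviction (idle clients should eventually be dropped), which is the same cleanup concern that distributed stores solve with TTLs.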

Leaky bucket algorithm

The leaky bucket smooths traffic by processing requests at a constant rate, regardless of input rate. Requests that exceed capacity are queued or rejected.

class LeakyBucket {
  private queue: Array<{request: Request, resolve: Function}> = [];
  private lastLeakTimestamp: number;
  private capacity: number;
  private leakRate: number; // requests per second

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity;
    this.leakRate = leakRate;
    this.lastLeakTimestamp = Date.now();
  }

  async process(request: Request): Promise<Response> {
    this.leak();

    if (this.queue.length >= this.capacity) {
      return new Response('Too Many Requests', {
        status: 429,
        headers: {
          'X-RateLimit-Limit': this.capacity.toString(),
          'X-RateLimit-QueueSize': this.queue.length.toString(),
        }
      });
    }

    return new Promise((resolve) => {
      this.queue.push({ request, resolve });
      this.processQueue();
    });
  }

  private leak(): void {
    const now = Date.now();
    const timeSinceLastLeak = (now - this.lastLeakTimestamp) / 1000; // seconds

    const requestsToProcess = Math.floor(timeSinceLastLeak * this.leakRate);

    for (let i = 0; i < requestsToProcess && this.queue.length > 0; i++) {
      const { request, resolve } = this.queue.shift()!;
      resolve(this.executeRequest(request));
    }

    this.lastLeakTimestamp = now;
  }

  private processQueue(): void {
    this.leak();

    // Without rescheduling, queued requests would only drain when a
    // new request arrives; schedule the next leak tick explicitly.
    if (this.queue.length > 0) {
      setTimeout(() => this.processQueue(), 1000 / this.leakRate);
    }
  }

  private async executeRequest(request: Request): Promise<Response> {
    // Execute the actual request
    return await processRequest(request);
  }
}

// Usage: Process up to 100 requests at 10 requests/second
const rateLimiter = new LeakyBucket(100, 10);

Characteristics:

  • Smooths traffic to constant rate
  • No burst tolerance
  • Memory requirements grow with queue size
  • More complex implementation than token bucket

Best for:

  • APIs requiring predictable throughput
  • Background job processing
  • Scenarios where burst traffic should be smoothed

Disadvantages:

  • Queueing adds latency
  • Larger memory footprint
  • More complex to implement correctly

Fixed window counter

The fixed window algorithm tracks request counts within fixed time windows. When the counter exceeds the limit, requests are rejected until the next window starts.

class FixedWindowCounter {
  private counters: Map<string, {count: number, windowStart: number}> = new Map();
  private windowSize: number; // milliseconds
  private limit: number;

  constructor(windowSize: number, limit: number) {
    this.windowSize = windowSize;
    this.limit = limit;
  }

  allow(key: string): boolean {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const counter = this.counters.get(key);

    // Reset counter if window has expired
    if (!counter || counter.windowStart !== windowStart) {
      this.counters.set(key, { count: 1, windowStart });
      return true;
    }

    // Check limit
    if (counter.count >= this.limit) {
      return false;
    }

    // Increment counter
    counter.count++;
    return true;
  }

  // Get remaining requests in current window
  getRemaining(key: string): number {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const counter = this.counters.get(key);

    if (!counter || counter.windowStart !== windowStart) {
      return this.limit;
    }

    return Math.max(0, this.limit - counter.count);
  }

  // Get time until next window
  getResetTime(key: string): number {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowEnd = windowStart + this.windowSize;

    return windowEnd - now;
  }
}

// Usage: Allow 100 requests per minute
const rateLimiter = new FixedWindowCounter(60000, 100);

async function handleRequest(request: Request): Promise<Response> {
  const clientKey = getClientKey(request);

  if (!rateLimiter.allow(clientKey)) {
    const resetTime = rateLimiter.getResetTime(clientKey);
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(resetTime / 1000).toString(),
        'X-RateLimit-Limit': '100',
        'X-RateLimit-Remaining': rateLimiter.getRemaining(clientKey).toString(),
        'X-RateLimit-Reset': new Date(Date.now() + resetTime).toUTCString(),
      }
    });
  }

  return await processRequest(request);
}

function getClientKey(request: Request): string {
  // Extract a client identifier (IP, user ID, API key, etc.).
  // X-Forwarded-For can carry a comma-separated proxy chain; use the
  // first entry, which is the original client.
  const forwarded = request.headers.get('X-Forwarded-For');
  if (forwarded) {
    return forwarded.split(',')[0].trim();
  }
  return request.headers.get('X-Real-IP') || 'unknown';
}

Characteristics:

  • Simple implementation
  • No burst tolerance within window
  • Boundary issues (spikes at window boundaries)
  • Easy to understand and configure

Best for:

  • Simple rate limiting requirements
  • Scenarios where boundary spikes are acceptable
  • Quick implementations with minimal complexity

Disadvantages:

  • Spike at window boundaries (double burst)
  • No smoothing of traffic
  • Less precise than sliding window
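The boundary spike is easy to demonstrate. A self-contained sketch with an injectable clock (the class and numbers here are illustrative, not the implementation above) shows a client sending 100 requests just before a window boundary and 100 just after, achieving double the nominal per-minute limit:

```typescript
// Minimal fixed-window counter with an explicit clock parameter,
// used only to make the boundary problem reproducible.
class Window {
  private count = 0;
  private windowStart = -1;

  constructor(private windowSize: number, private limit: number) {}

  allow(now: number): boolean {
    const start = Math.floor(now / this.windowSize) * this.windowSize;
    if (start !== this.windowStart) {
      // New window: reset the counter
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

const limiter = new Window(60_000, 100);
let admitted = 0;

// 100 requests at t = 59.9s (end of window 1)...
for (let i = 0; i < 100; i++) if (limiter.allow(59_900)) admitted++;
// ...and 100 more at t = 60.1s (start of window 2): all are admitted,
// so 200 requests pass within 200ms despite a 100-per-minute limit.
for (let i = 0; i < 100; i++) if (limiter.allow(60_100)) admitted++;

console.log(admitted); // 200
```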

Sliding window log

The sliding window algorithm tracks individual request timestamps within a sliding time window, providing more precise limiting without boundary spikes.

class SlidingWindowLog {
  private logs: Map<string, number[]> = new Map();
  private windowSize: number; // milliseconds
  private limit: number;

  constructor(windowSize: number, limit: number) {
    this.windowSize = windowSize;
    this.limit = limit;
  }

  allow(key: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSize;

    // Get existing log for this key (may be empty)
    const timestamps = this.logs.get(key) || [];

    // Remove timestamps outside the window
    const validTimestamps = timestamps.filter(t => t > windowStart);

    // Check limit
    if (validTimestamps.length >= this.limit) {
      this.logs.set(key, validTimestamps); // Update with filtered timestamps
      return false;
    }

    // Add current request timestamp
    validTimestamps.push(now);
    this.logs.set(key, validTimestamps);

    return true;
  }

  // Get remaining requests in window
  getRemaining(key: string): number {
    const now = Date.now();
    const windowStart = now - this.windowSize;
    const timestamps = this.logs.get(key) || [];

    const validTimestamps = timestamps.filter(t => t > windowStart);
    this.logs.set(key, validTimestamps);

    return Math.max(0, this.limit - validTimestamps.length);
  }

  // Get oldest timestamp in window (for retry-after calculation)
  getOldestTimestamp(key: string): number | null {
    const now = Date.now();
    const windowStart = now - this.windowSize;
    const timestamps = this.logs.get(key) || [];

    const validTimestamps = timestamps.filter(t => t > windowStart);
    if (validTimestamps.length === 0) {
      return null;
    }

    return validTimestamps[0];
  }
}

// Usage: Allow 100 requests per minute with precise window
const rateLimiter = new SlidingWindowLog(60000, 100);

async function handleRequest(request: Request): Promise<Response> {
  const clientKey = getClientKey(request);

  if (!rateLimiter.allow(clientKey)) {
    const oldestTimestamp = rateLimiter.getOldestTimestamp(clientKey);
    // Bracket access reads the private field; a production version
    // would expose windowSize through a public getter.
    const waitTime = oldestTimestamp
      ? oldestTimestamp + rateLimiter['windowSize'] - Date.now()
      : rateLimiter['windowSize'];

    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(waitTime / 1000).toString(),
        'X-RateLimit-Limit': '100',
        'X-RateLimit-Remaining': rateLimiter.getRemaining(clientKey).toString(),
      }
    });
  }

  return await processRequest(request);
}

Characteristics:

  • Precise limiting without boundary spikes
  • Memory grows with traffic (one timestamp stored per request)
  • Smooth traffic distribution
  • More complex implementation than fixed window

Best for:

  • APIs requiring precise rate limiting
  • Scenarios where boundary spikes are unacceptable
  • Production systems with strict capacity requirements

Disadvantages:

  • Higher memory usage than counter-based approaches
  • More complex to implement
  • Requires cleanup of old timestamps
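The cleanup concern can be addressed with a periodic sweep. A sketch of one approach, where `logs` and `windowSize` mirror the fields of the class above but are passed in explicitly so the helper stands alone (names are illustrative):

```typescript
// Bound memory by sweeping the log map: drop keys whose timestamps
// have all aged out of the window, and trim the rest.
function sweep(logs: Map<string, number[]>, windowSize: number): void {
  const cutoff = Date.now() - windowSize;

  for (const [key, timestamps] of logs) {
    const live = timestamps.filter(t => t > cutoff);
    if (live.length === 0) {
      logs.delete(key);    // no recent requests: free the entry entirely
    } else {
      logs.set(key, live); // keep only in-window timestamps
    }
  }
}
```

Run it on an interval (for example every window length) so idle clients do not accumulate stale entries forever.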

Distributed rate limiting

Redis-based rate limiting

For distributed systems, rate limiting state must be shared across instances. Redis provides a fast, shared data store for distributed rate limiting.

class RedisRateLimiter {
  private redis: RedisClient;
  private windowSize: number;
  private limit: number;

  constructor(redis: RedisClient, windowSize: number, limit: number) {
    this.redis = redis;
    this.windowSize = windowSize;
    this.limit = limit;
  }

  async allow(key: string): Promise<{allowed: boolean, remaining: number, resetTime: number}> {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowEnd = windowStart + this.windowSize;
    const redisKey = `ratelimit:${key}:${windowStart}`;

    // INCR is atomic, so concurrent instances cannot read a stale count.
    // A separate GET followed by INCR would let two instances both
    // admit the same "last" request under the limit.
    const count = await this.redis.incr(redisKey);

    // First request in this window: set the expiry.
    // EXPIREAT expects a Unix timestamp in seconds, not milliseconds.
    if (count === 1) {
      await this.redis.expireat(redisKey, Math.ceil(windowEnd / 1000));
    }

    if (count > this.limit) {
      return {
        allowed: false,
        remaining: 0,
        resetTime: windowEnd
      };
    }

    return {
      allowed: true,
      remaining: this.limit - count,
      resetTime: windowEnd
    };
  }
}
}

// Usage with Express.js
const express = require('express');
const Redis = require('ioredis');
const app = express();

const redis = new Redis();
const rateLimiter = new RedisRateLimiter(redis, 60000, 100);

app.use(async (req, res, next) => {
  const clientKey = getClientKey(req);
  const result = await rateLimiter.allow(clientKey);

  res.set('X-RateLimit-Limit', '100');
  res.set('X-RateLimit-Remaining', result.remaining.toString());
  res.set('X-RateLimit-Reset', new Date(result.resetTime).toUTCString());

  if (!result.allowed) {
    const retryAfter = Math.ceil((result.resetTime - Date.now()) / 1000);
    res.set('Retry-After', retryAfter.toString());
    return res.status(429).send('Too Many Requests');
  }

  next();
});

Redis Lua script for atomic operations

For more complex rate limiting logic, use Redis Lua scripts to ensure atomicity:

-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local tokens_to_consume = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

-- Get current state
local state = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(state[1]) or capacity
local last_refill = tonumber(state[2]) or now

-- Refill tokens
local time_passed = (now - last_refill) / 1000
local tokens_to_add = time_passed * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)

-- Check if enough tokens
if tokens < tokens_to_consume then
  -- Not enough tokens, return rejection
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 3600) -- 1 hour TTL
  return {0, tokens}
end

-- Consume tokens
tokens = tokens - tokens_to_consume
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600) -- 1 hour TTL

return {1, tokens}

// TypeScript usage
async function consumeTokens(
  redis: RedisClient,
  key: string,
  capacity: number,
  refillRate: number,
  tokensToConsume: number
): Promise<{allowed: boolean, remainingTokens: number}> {
  const script = `
    -- token_bucket.lua
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local tokens_to_consume = tonumber(ARGV[3])
    local now = tonumber(ARGV[4])

    -- Get current state
    local state = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(state[1]) or capacity
    local last_refill = tonumber(state[2]) or now

    -- Refill tokens
    local time_passed = (now - last_refill) / 1000
    local tokens_to_add = time_passed * refill_rate
    tokens = math.min(capacity, tokens + tokens_to_add)

    -- Check if enough tokens
    if tokens < tokens_to_consume then
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, 3600)
      return {0, tokens}
    end

    -- Consume tokens
    tokens = tokens - tokens_to_consume
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)

    return {1, tokens}
  `;

  const result = await redis.eval(
    script,
    1, // number of keys
    key, // key
    capacity.toString(), // ARGV[1]
    refillRate.toString(), // ARGV[2]
    tokensToConsume.toString(), // ARGV[3]
    Date.now().toString() // ARGV[4]
  );

  return {
    allowed: result[0] === 1,
    // Redis truncates Lua numbers to integers on return, so a
    // fractional token balance is rounded down here.
    remainingTokens: result[1]
  };
}

Rate limiting strategies

Multiple tiers

Implement different rate limits for different user tiers:

class TieredRateLimiter {
  private limiters: Map<string, RateLimiter> = new Map();
  private userTierCache: Map<string, string> = new Map();
  private tierLimits: Map<string, {windowSize: number, limit: number}> = new Map();

  constructor() {
    this.tierLimits.set('free', { windowSize: 60000, limit: 100 });
    this.tierLimits.set('pro', { windowSize: 60000, limit: 1000 });
    this.tierLimits.set('enterprise', { windowSize: 60000, limit: 10000 });

    // Initialize limiters for each tier
    this.tierLimits.forEach((config, tier) => {
      this.limiters.set(tier, new FixedWindowCounter(config.windowSize, config.limit));
    });
  }

  async allow(userId: string): Promise<{allowed: boolean, tier: string}> {
    const tier = await this.getUserTier(userId);
    const limiter = this.limiters.get(tier)!;

    return {
      allowed: limiter.allow(userId),
      tier
    };
  }

  private async getUserTier(userId: string): Promise<string> {
    // Check cache first
    if (this.userTierCache.has(userId)) {
      return this.userTierCache.get(userId)!;
    }

    // Fetch from database
    const user = await fetchUserFromDatabase(userId);
    const tier = user.subscriptionTier || 'free';

    // Cache for 5 minutes
    this.userTierCache.set(userId, tier);
    setTimeout(() => this.userTierCache.delete(userId), 300000);

    return tier;
  }
}

Per-endpoint rate limiting

Different endpoints may have different rate limits:

class EndpointRateLimiter {
  private limiters: Map<string, Map<string, RateLimiter>> = new Map();

  registerEndpoint(path: string, method: string, windowSize: number, limit: number): void {
    if (!this.limiters.has(method)) {
      this.limiters.set(method, new Map());
    }

    this.limiters.get(method)!.set(path, new FixedWindowCounter(windowSize, limit));
  }

  async allow(method: string, path: string, clientId: string): Promise<boolean> {
    const methodLimiters = this.limiters.get(method);
    if (!methodLimiters) {
      return true; // No rate limit configured
    }

    const limiter = methodLimiters.get(path);
    if (!limiter) {
      return true; // No rate limit configured
    }

    return limiter.allow(`${method}:${path}:${clientId}`);
  }
}

// Usage
const rateLimiter = new EndpointRateLimiter();

// Different limits for different endpoints
rateLimiter.registerEndpoint('/api/v1/users', 'GET', 60000, 100);
rateLimiter.registerEndpoint('/api/v1/users', 'POST', 60000, 10);
rateLimiter.registerEndpoint('/api/v1/search', 'GET', 60000, 1000);

// Middleware
app.use(async (req, res, next) => {
  const clientId = getClientKey(req);
  const allowed = await rateLimiter.allow(req.method, req.path, clientId);

  if (!allowed) {
    return res.status(429).send('Too Many Requests');
  }

  next();
});

Circuit breaker integration

Combine rate limiting with circuit breakers for resilience:

class CircuitBreakerRateLimiter {
  private rateLimiter: RateLimiter;
  private circuitBreaker: CircuitBreaker;

  async execute(request: Request): Promise<Response> {
    // Check rate limit first
    const clientId = getClientKey(request);
    if (!this.rateLimiter.allow(clientId)) {
      return new Response('Too Many Requests', { status: 429 });
    }

    // Check circuit breaker
    if (this.circuitBreaker.isOpen()) {
      return new Response('Service Unavailable', { status: 503 });
    }

    try {
      // Execute request
      const response = await this.executeRequest(request);
      this.circuitBreaker.recordSuccess();
      return response;
    } catch (error) {
      this.circuitBreaker.recordFailure();
      throw error;
    }
  }

  private async executeRequest(request: Request): Promise<Response> {
    // Delegate to the application's request handler
    return await processRequest(request);
  }
}

class CircuitBreaker {
  private failureCount = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private failureThreshold: number = 5,
    private timeout: number = 60000 // 1 minute
  ) {}

  isOpen(): boolean {
    if (this.state === 'open') {
      // Check if we should transition to half-open
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = 'half-open';
        return false;
      }
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failureCount = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.failureCount >= this.failureThreshold) {
      this.state = 'open';
    }
  }
}

Response headers and client experience

Rate limiting responses should provide clear guidance to clients:

async function handleRateLimitedRequest(
  request: Request,
  rateLimiter: RateLimiter,
  clientId: string
): Promise<Response> {
  if (!rateLimiter.allow(clientId)) {
    const remaining = rateLimiter.getRemaining(clientId);
    const resetTime = rateLimiter.getResetTime(clientId);

    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Content-Type': 'application/json',
        'Retry-After': Math.ceil(resetTime / 1000).toString(),
        'X-RateLimit-Limit': rateLimiter['limit'].toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': new Date(Date.now() + resetTime).toUTCString(),
      }
    });
  }

  return await processRequest(request);
}

// Include rate limit info in successful responses
function addRateLimitHeaders(
  response: Response,
  rateLimiter: RateLimiter,
  clientId: string
): Response {
  const headers = new Headers(response.headers);

  headers.set('X-RateLimit-Limit', rateLimiter['limit'].toString());
  headers.set('X-RateLimit-Remaining', rateLimiter.getRemaining(clientId).toString());

  return new Response(response.body, {
    status: response.status,
    headers
  });
}
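On the client side, these headers drive retry behavior. A sketch of a client that honors Retry-After (assuming the delta-seconds form rather than an HTTP-date) and falls back to exponential backoff; the function name, attempt limit, and endpoint are illustrative:

```typescript
// Retry a request when the server answers 429, waiting for the
// duration the server suggests in Retry-After when present.
async function fetchWithRetry(
  url: string,
  maxAttempts = 3
): Promise<Response> {
  let lastResponse: Response | undefined;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) {
      return response;
    }
    lastResponse = response;

    // Prefer the server's hint; otherwise back off exponentially
    // (1s, 2s, 4s, ...).
    const retryAfter = response.headers.get('Retry-After');
    const delayMs = retryAfter
      ? Number(retryAfter) * 1000
      : 2 ** attempt * 1000;
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }

  // All attempts were rate limited; surface the last 429 to the caller
  return lastResponse!;
}
```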

Monitoring and observability

Track rate limiting metrics to adjust limits and detect abuse:

class RateLimitingMetrics {
  private requestCounts: Map<string, number> = new Map();
  private rejectionCounts: Map<string, number> = new Map();
  private requestTimes: Array<{key: string, timestamp: number, duration: number}> = [];

  recordRequest(key: string, allowed: boolean, duration: number): void {
    // Count requests
    const requestCount = this.requestCounts.get(key) || 0;
    this.requestCounts.set(key, requestCount + 1);

    // Count rejections
    if (!allowed) {
      const rejectionCount = this.rejectionCounts.get(key) || 0;
      this.rejectionCounts.set(key, rejectionCount + 1);
    }

    // Record timing
    this.requestTimes.push({ key, timestamp: Date.now(), duration });

    // Cleanup old records (keep last 1000)
    if (this.requestTimes.length > 1000) {
      this.requestTimes.shift();
    }
  }

  getMetrics(key: string): RateLimitMetrics {
    const requestCount = this.requestCounts.get(key) || 0;
    const rejectionCount = this.rejectionCounts.get(key) || 0;

    const keyRequests = this.requestTimes.filter(r => r.key === key);
    const avgDuration = keyRequests.length > 0
      ? keyRequests.reduce((sum, r) => sum + r.duration, 0) / keyRequests.length
      : 0;

    return {
      key,
      requestCount,
      rejectionCount,
      rejectionRate: requestCount > 0 ? rejectionCount / requestCount : 0,
      averageRequestDuration: avgDuration,
    };
  }

  getTopOffenders(limit: number = 10): RateLimitMetrics[] {
    const allKeys = Array.from(new Set(this.requestTimes.map(r => r.key)));

    return allKeys
      .map(key => this.getMetrics(key))
      .sort((a, b) => b.rejectionCount - a.rejectionCount)
      .slice(0, limit);
  }
}

interface RateLimitMetrics {
  key: string;
  requestCount: number;
  rejectionCount: number;
  rejectionRate: number;
  averageRequestDuration: number;
}

Decision framework

Choose the right algorithm

| Algorithm | Burst Tolerance | Precision | Complexity | Memory | Best For |
| --- | --- | --- | --- | --- | --- |
| Token Bucket | High | Medium | Low | Low | APIs tolerating bursts |
| Leaky Bucket | None | High | Medium | High | Predictable throughput |
| Fixed Window | Low | Low | Very Low | Very Low | Simple implementations |
| Sliding Window | Low | High | High | Medium | Precise limiting |

Evaluate requirements

Questions to ask:

  1. Does your API need to handle burst traffic?
     • Yes → Token bucket or leaky bucket
     • No → Fixed or sliding window
  2. How precise does your rate limiting need to be?
     • Very precise → Sliding window
     • Moderate precision → Token bucket
     • Basic precision → Fixed window
  3. What is your operational complexity tolerance?
     • Low tolerance → Fixed window
     • Moderate tolerance → Token bucket
     • High tolerance → Sliding window or leaky bucket
  4. Do you need distributed rate limiting?
     • Yes → Redis or shared database
     • No → In-memory implementation
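The questions above condense into a small lookup. The helper below simply encodes that mapping; the function name and option flags are illustrative:

```typescript
type Algorithm =
  | 'token-bucket'
  | 'leaky-bucket'
  | 'fixed-window'
  | 'sliding-window';

// Encode the decision questions: burst handling first, then precision.
function chooseAlgorithm(opts: {
  needsBursts: boolean;
  needsPrecision: boolean;
}): Algorithm {
  if (opts.needsBursts) {
    // Bursty traffic: token bucket admits bursts as-is,
    // leaky bucket accepts them but smooths the output rate
    return opts.needsPrecision ? 'leaky-bucket' : 'token-bucket';
  }
  // No burst requirement: trade precision against simplicity
  return opts.needsPrecision ? 'sliding-window' : 'fixed-window';
}
```

A helper like this is mainly useful as living documentation of the team's policy, so the reasoning is reviewable in code rather than tribal knowledge.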

Conclusion

Rate limiting is essential for protecting production APIs and ensuring fair resource allocation. The right algorithm depends on your specific requirements: burst tolerance, precision needs, and operational complexity.

Start with a simple implementation (fixed window or token bucket) and evolve as your requirements become clearer. Monitor rate limiting metrics continuously to adjust limits and detect patterns of abuse. The goal isn't to block legitimate users—it's to create predictable system behavior that protects both the infrastructure and user experience.

Practical closing question: What is the most common abuse pattern in your current API, and would a different rate limiting algorithm better address it?


Building a production API and need expert guidance on rate limiting and capacity management? Talk to Imperialis API specialists about implementing rate limiting strategies that protect your infrastructure while providing excellent user experience.
