Rate Limiting Strategies for Production APIs: Token Bucket, Leaky Bucket, and Beyond
Executive summary
Rate limiting is essential for protecting APIs from abuse, ensuring fair resource allocation, and managing capacity. Understanding different algorithms and their trade-offs prevents service degradation and unexpected outages.
Last updated: March 13, 2026
Introduction: Why rate limiting matters
Every production API faces the fundamental tension between capacity and demand. Without rate limiting, abusive traffic can overwhelm systems, legitimate users experience degradation, and infrastructure costs balloon uncontrollably. Rate limiting provides the guardrails that keep systems stable and fair.
Rate limiting is more than preventing abuse—it's about capacity management. It ensures predictable system behavior under load, enables fair resource allocation, and provides the foundation for graceful degradation when limits are exceeded.
Choosing the right rate limiting strategy depends on your requirements: burst tolerance, strictness, precision, and operational complexity. Understanding trade-offs prevents implementing an algorithm that works in development but fails under production load.
Rate limiting algorithms
Token bucket algorithm
The token bucket algorithm allows bursts while enforcing long-term rate limits. Tokens accumulate in a bucket at a fixed rate. Each request consumes one or more tokens. If the bucket is empty, requests are rejected.
```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillTimestamp: number;
  private capacity: number;
  private refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefillTimestamp = Date.now();
  }

  consume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }

  // Calculate wait time before next request
  getWaitTime(tokens: number = 1): number {
    this.refill();
    if (this.tokens >= tokens) {
      return 0;
    }
    const needed = tokens - this.tokens;
    return (needed / this.refillRate) * 1000; // milliseconds
  }

  // Expose the current token count for rate limit headers
  get availableTokens(): number {
    this.refill();
    return this.tokens;
  }

  private refill(): void {
    const now = Date.now();
    const timeSinceLastRefill = (now - this.lastRefillTimestamp) / 1000; // seconds
    const tokensToAdd = timeSinceLastRefill * this.refillRate;
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefillTimestamp = now;
  }
}

// Usage: allow bursts of 10 requests, refill at 1 request/second
const rateLimiter = new TokenBucket(10, 1);

async function handleRequest(request: Request): Promise<Response> {
  if (!rateLimiter.consume()) {
    const waitTime = rateLimiter.getWaitTime();
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(waitTime / 1000).toString(),
        'X-RateLimit-Limit': '10',
        'X-RateLimit-Remaining': Math.floor(rateLimiter.availableTokens).toString(),
      }
    });
  }
  return await processRequest(request);
}
```

Characteristics:
- Allows bursts up to bucket capacity
- Long-term rate limit enforced by refill rate
- Memory efficient (single counter)
- Simple to implement and reason about
Best for:
- APIs that tolerate burst traffic
- User-facing applications where burstiness is expected
- Scenarios requiring simple implementation
Disadvantages:
- Bursts can be exhausted quickly by aggressive clients
- No protection against distributed attacks across multiple IPs
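The burst-then-throttle behaviour is easiest to see with simulated time. The sketch below is a stripped-down token bucket with an injectable clock (a test-only device, not part of the implementation above): a client drains the full burst instantly, is then rejected, and regains exactly `refillRate` tokens per second.

```typescript
// Minimal token bucket with an injectable clock, so the burst-then-throttle
// behaviour can be observed deterministically.
class SimTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
    private clock: () => number = Date.now
  ) {
    this.tokens = capacity;
    this.lastRefill = clock();
  }

  consume(n: number = 1): boolean {
    const now = this.clock();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= n) {
      this.tokens -= n;
      return true;
    }
    return false;
  }
}

// Capacity 10, 1 token/s: a burst of 10 passes immediately, the 11th is
// rejected, and after 5 simulated seconds exactly 5 more requests pass.
let fakeNow = 0;
const bucket = new SimTokenBucket(10, 1, () => fakeNow);

const burst = Array.from({ length: 11 }, () => bucket.consume());
// burst[0..9] are true, burst[10] is false

fakeNow += 5000; // advance the fake clock 5 seconds
const afterRefill = Array.from({ length: 6 }, () => bucket.consume());
// afterRefill[0..4] are true, afterRefill[5] is false
```

This also illustrates the disadvantage listed above: the entire burst budget can be spent in a single instant by an aggressive client.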
Leaky bucket algorithm
The leaky bucket smooths traffic by processing requests at a constant rate, regardless of input rate. Requests that exceed capacity are queued or rejected.
```typescript
class LeakyBucket {
  private queue: Array<{request: Request, resolve: (r: Response | PromiseLike<Response>) => void}> = [];
  private lastLeakTimestamp: number;
  private capacity: number;
  private leakRate: number; // requests per second
  private drainTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(capacity: number, leakRate: number) {
    this.capacity = capacity;
    this.leakRate = leakRate;
    this.lastLeakTimestamp = Date.now();
  }

  async process(request: Request): Promise<Response> {
    this.leak();
    if (this.queue.length >= this.capacity) {
      return new Response('Too Many Requests', {
        status: 429,
        headers: {
          'X-RateLimit-Limit': this.capacity.toString(),
          'X-RateLimit-QueueSize': this.queue.length.toString(),
        }
      });
    }
    return new Promise((resolve) => {
      this.queue.push({ request, resolve });
      this.scheduleDrain();
    });
  }

  private leak(): void {
    const now = Date.now();
    const timeSinceLastLeak = (now - this.lastLeakTimestamp) / 1000; // seconds
    const requestsToProcess = Math.floor(timeSinceLastLeak * this.leakRate);
    for (let i = 0; i < requestsToProcess && this.queue.length > 0; i++) {
      const { request, resolve } = this.queue.shift()!;
      resolve(this.executeRequest(request));
    }
    // Advance only by the whole requests consumed, so fractional progress
    // toward the next leak is not lost when leak() is called frequently
    this.lastLeakTimestamp += requestsToProcess * (1000 / this.leakRate);
  }

  // Keep draining on a timer so queued requests complete even when no
  // new traffic arrives to trigger leak()
  private scheduleDrain(): void {
    this.leak();
    if (this.queue.length > 0 && this.drainTimer === null) {
      this.drainTimer = setTimeout(() => {
        this.drainTimer = null;
        this.scheduleDrain();
      }, 1000 / this.leakRate);
    }
  }

  private async executeRequest(request: Request): Promise<Response> {
    // Execute the actual request
    return await processRequest(request);
  }
}

// Usage: queue up to 100 requests, process at 10 requests/second
const rateLimiter = new LeakyBucket(100, 10);
```

Characteristics:
- Smooths traffic to constant rate
- No burst tolerance
- Memory requirements grow with queue size
- More complex implementation than token bucket
Best for:
- APIs requiring predictable throughput
- Background job processing
- Scenarios where burst traffic should be smoothed
Disadvantages:
- Queueing adds latency
- Larger memory footprint
- More complex to implement correctly
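The queueing-latency disadvantage can be quantified: a request that enters the queue behind k others waits k / leakRate seconds before it is processed. A small helper (illustrative, not part of the class above) makes the worst case concrete:

```typescript
// Worst-case queueing delay for a leaky bucket, in milliseconds.
// A request that arrives with `queued` requests ahead of it waits
// queued / leakRatePerSec seconds before it is processed.
function leakyBucketDelayMs(queued: number, leakRatePerSec: number): number {
  return (queued / leakRatePerSec) * 1000;
}

// With the usage above (capacity 100, 10 req/s), a request that enters a
// full queue waits 10 seconds -- often longer than clients are willing
// to hold a connection open.
const worstCaseMs = leakyBucketDelayMs(100, 10); // 10000
```

Sizing the queue is therefore a latency decision, not just a memory one: capacity / leakRate is the maximum delay you are asking clients to tolerate.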
Fixed window counter
The fixed window algorithm tracks request counts within fixed time windows. When the counter exceeds the limit, requests are rejected until the next window starts.
```typescript
class FixedWindowCounter {
  // Note: entries for idle keys are never evicted here;
  // add periodic cleanup in production
  private counters: Map<string, {count: number, windowStart: number}> = new Map();
  private windowSize: number; // milliseconds
  private limit: number;

  constructor(windowSize: number, limit: number) {
    this.windowSize = windowSize;
    this.limit = limit;
  }

  allow(key: string): boolean {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const counter = this.counters.get(key);
    // Reset counter if window has expired
    if (!counter || counter.windowStart !== windowStart) {
      this.counters.set(key, { count: 1, windowStart });
      return true;
    }
    // Check limit
    if (counter.count >= this.limit) {
      return false;
    }
    // Increment counter
    counter.count++;
    return true;
  }

  // Get remaining requests in current window
  getRemaining(key: string): number {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const counter = this.counters.get(key);
    if (!counter || counter.windowStart !== windowStart) {
      return this.limit;
    }
    return Math.max(0, this.limit - counter.count);
  }

  // Get time until the next window (windows are globally aligned,
  // so the key is not actually consulted)
  getResetTime(key: string): number {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowEnd = windowStart + this.windowSize;
    return windowEnd - now;
  }
}

// Usage: allow 100 requests per minute
const rateLimiter = new FixedWindowCounter(60000, 100);

async function handleRequest(request: Request): Promise<Response> {
  const clientKey = getClientKey(request);
  if (!rateLimiter.allow(clientKey)) {
    const resetTime = rateLimiter.getResetTime(clientKey);
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(resetTime / 1000).toString(),
        'X-RateLimit-Limit': '100',
        'X-RateLimit-Remaining': rateLimiter.getRemaining(clientKey).toString(),
        'X-RateLimit-Reset': new Date(Date.now() + resetTime).toUTCString(),
      }
    });
  }
  return await processRequest(request);
}

function getClientKey(request: Request): string {
  // Extract client identifier (IP, user ID, API key, etc.)
  // Caution: X-Forwarded-For is client-supplied unless your reverse proxy
  // overwrites it, so only trust it behind a trusted proxy
  return request.headers.get('X-Forwarded-For') || request.headers.get('X-Real-IP') || 'unknown';
}
```

Characteristics:
- Simple implementation
- No burst tolerance within window
- Boundary issues (spikes at window boundaries)
- Easy to understand and configure
Best for:
- Simple rate limiting requirements
- Scenarios where boundary spikes are acceptable
- Quick implementations with minimal complexity
Disadvantages:
- Spike at window boundaries (double burst)
- No smoothing of traffic
- Less precise than sliding window
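The boundary spike is worth seeing concretely. The sketch below uses a simplified single-client counter with an injectable clock (test-only assumptions, not the class above): by sending the full limit just before a window boundary and again just after it, a client fits twice the limit into a span much shorter than the window.

```typescript
// Fixed-window counter with an injectable clock, to demonstrate the
// boundary-spike problem: a client can send up to 2x the limit inside a
// span that straddles a window boundary.
class SimFixedWindow {
  private count = 0;
  private windowStart = -1;

  constructor(
    private windowSize: number, // ms
    private limit: number,
    private clock: () => number
  ) {}

  allow(): boolean {
    const start = Math.floor(this.clock() / this.windowSize) * this.windowSize;
    if (start !== this.windowStart) {
      this.windowStart = start;
      this.count = 0; // new window, counter resets
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

let fakeNow = 59_000; // one second before the minute boundary
const limiter = new SimFixedWindow(60_000, 100, () => fakeNow);

let accepted = 0;
for (let i = 0; i < 100; i++) if (limiter.allow()) accepted++; // fills window 1
fakeNow = 60_500; // 1.5 seconds later, in window 2
for (let i = 0; i < 100; i++) if (limiter.allow()) accepted++; // fills window 2

// 200 requests accepted within 1.5 seconds, despite a "100 per minute" limit
```

If your backend capacity is sized to the nominal limit, this double burst is exactly the load spike the limiter was supposed to prevent.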
Sliding window log
The sliding window algorithm tracks individual request timestamps within a sliding time window, providing more precise limiting without boundary spikes.
```typescript
class SlidingWindowLog {
  private logs: Map<string, number[]> = new Map();
  private windowSize: number; // milliseconds
  private limit: number;

  constructor(windowSize: number, limit: number) {
    this.windowSize = windowSize;
    this.limit = limit;
  }

  allow(key: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSize;
    // Drop timestamps that have slid out of the window
    const timestamps = this.logs.get(key) ?? [];
    const validTimestamps = timestamps.filter(t => t > windowStart);
    // Check limit
    if (validTimestamps.length >= this.limit) {
      this.logs.set(key, validTimestamps); // persist the pruned log
      return false;
    }
    // Record the current request
    validTimestamps.push(now);
    this.logs.set(key, validTimestamps);
    return true;
  }

  // Get remaining requests in window
  getRemaining(key: string): number {
    const now = Date.now();
    const windowStart = now - this.windowSize;
    const timestamps = this.logs.get(key) ?? [];
    const validTimestamps = timestamps.filter(t => t > windowStart);
    this.logs.set(key, validTimestamps);
    return Math.max(0, this.limit - validTimestamps.length);
  }

  // Get oldest timestamp in window (for Retry-After calculation)
  getOldestTimestamp(key: string): number | null {
    const now = Date.now();
    const windowStart = now - this.windowSize;
    const timestamps = this.logs.get(key) ?? [];
    const validTimestamps = timestamps.filter(t => t > windowStart);
    if (validTimestamps.length === 0) {
      return null;
    }
    return validTimestamps[0];
  }

  // Expose the window size for Retry-After calculations
  getWindowSize(): number {
    return this.windowSize;
  }
}

// Usage: allow 100 requests per minute with a precise window
const rateLimiter = new SlidingWindowLog(60000, 100);

async function handleRequest(request: Request): Promise<Response> {
  const clientKey = getClientKey(request);
  if (!rateLimiter.allow(clientKey)) {
    // The limit frees up when the oldest logged request slides out of the window
    const oldestTimestamp = rateLimiter.getOldestTimestamp(clientKey);
    const waitTime = oldestTimestamp
      ? oldestTimestamp + rateLimiter.getWindowSize() - Date.now()
      : rateLimiter.getWindowSize();
    return new Response('Too Many Requests', {
      status: 429,
      headers: {
        'Retry-After': Math.ceil(waitTime / 1000).toString(),
        'X-RateLimit-Limit': '100',
        'X-RateLimit-Remaining': rateLimiter.getRemaining(clientKey).toString(),
      }
    });
  }
  return await processRequest(request);
}
```

Characteristics:
- Precise limiting without boundary spikes
- Stores one timestamp per request, so memory grows with request rate
- Smooth traffic distribution
- More complex implementation than fixed window
Best for:
- APIs requiring precise rate limiting
- Scenarios where boundary spikes are unacceptable
- Production systems with strict capacity requirements
Disadvantages:
- Higher memory usage than counter-based approaches
- More complex to implement
- Requires cleanup of old timestamps
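If the log's memory cost is a concern, a common middle ground (not implemented above) is the sliding window counter: keep only the current and previous fixed-window counts, and weight the previous count by how much of that window still overlaps the sliding window. It trades exactness for O(1) memory per key. A sketch, with an injectable clock for demonstration:

```typescript
// Sliding-window *counter*: approximates the log with two counters.
// The previous window's count is weighted by the fraction of it that
// still falls inside the sliding window.
class SlidingWindowCounter {
  private current = 0;
  private previous = 0;
  private windowStart: number;

  constructor(
    private windowSize: number, // ms
    private limit: number,
    private clock: () => number = Date.now
  ) {
    this.windowStart = Math.floor(this.clock() / this.windowSize) * this.windowSize;
  }

  allow(): boolean {
    const now = this.clock();
    const start = Math.floor(now / this.windowSize) * this.windowSize;
    if (start !== this.windowStart) {
      // Shift windows; if more than one full window elapsed, the old count is stale
      this.previous = start - this.windowStart === this.windowSize ? this.current : 0;
      this.current = 0;
      this.windowStart = start;
    }
    // Fraction of the previous window still inside the sliding window
    const overlap = 1 - (now - start) / this.windowSize;
    const estimate = this.current + this.previous * overlap;
    if (estimate >= this.limit) return false;
    this.current++;
    return true;
  }
}

// Fill the limit in window 1; halfway through window 2 the previous
// window is weighted 0.5, so only about half the limit is available.
let fakeNow = 0;
const swc = new SlidingWindowCounter(60_000, 100, () => fakeNow);
let inWindow1 = 0;
for (let i = 0; i < 101; i++) if (swc.allow()) inWindow1++; // 100 allowed

fakeNow = 90_000; // halfway through the next window
let inWindow2 = 0;
for (let i = 0; i < 101; i++) if (swc.allow()) inWindow2++; // 50 allowed
```

The estimate assumes requests in the previous window were evenly distributed, so it can over- or under-count slightly; for most APIs that error is an acceptable price for constant memory.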
Distributed rate limiting
Redis-based rate limiting
For distributed systems, rate limiting state must be shared across instances. Redis provides a fast, shared data store for distributed rate limiting.
```typescript
class RedisRateLimiter {
  private redis: RedisClient;
  private windowSize: number;
  private limit: number;

  constructor(redis: RedisClient, windowSize: number, limit: number) {
    this.redis = redis;
    this.windowSize = windowSize;
    this.limit = limit;
  }

  async allow(key: string): Promise<{allowed: boolean, remaining: number, resetTime: number}> {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowEnd = windowStart + this.windowSize;
    const redisKey = `ratelimit:${key}:${windowStart}`;

    // INCR is atomic, so concurrent instances never lose updates
    // (a separate GET followed by INCR would race)
    const count = await this.redis.incr(redisKey);

    // Set the expiration only when the key is first created;
    // EXPIREAT takes a Unix timestamp in *seconds*
    if (count === 1) {
      await this.redis.expireat(redisKey, Math.ceil(windowEnd / 1000));
    }

    if (count > this.limit) {
      return { allowed: false, remaining: 0, resetTime: windowEnd };
    }

    return {
      allowed: true,
      remaining: this.limit - count,
      resetTime: windowEnd
    };
  }
}

// Usage with Express.js
const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis();
const rateLimiter = new RedisRateLimiter(redis, 60000, 100);

app.use(async (req, res, next) => {
  const clientKey = getClientKey(req);
  const result = await rateLimiter.allow(clientKey);
  res.set('X-RateLimit-Limit', '100');
  res.set('X-RateLimit-Remaining', result.remaining.toString());
  res.set('X-RateLimit-Reset', new Date(result.resetTime).toUTCString());
  if (!result.allowed) {
    const retryAfter = Math.ceil((result.resetTime - Date.now()) / 1000);
    res.set('Retry-After', retryAfter.toString());
    return res.status(429).send('Too Many Requests');
  }
  next();
});
```

Redis Lua script for atomic operations
For more complex rate limiting logic, use Redis Lua scripts to ensure atomicity:
```lua
-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local tokens_to_consume = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

-- Get current state
local state = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(state[1]) or capacity
local last_refill = tonumber(state[2]) or now

-- Refill tokens
local time_passed = (now - last_refill) / 1000
local tokens_to_add = time_passed * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)

-- Check if enough tokens
if tokens < tokens_to_consume then
  -- Not enough tokens: persist the refreshed state and reject.
  -- Note: Lua numbers in the reply are truncated to integers.
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, 3600) -- 1 hour TTL
  return {0, tokens}
end

-- Consume tokens
tokens = tokens - tokens_to_consume
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600) -- 1 hour TTL
return {1, tokens}
```

```typescript
// TypeScript usage
async function consumeTokens(
  redis: RedisClient,
  key: string,
  capacity: number,
  refillRate: number,
  tokensToConsume: number
): Promise<{allowed: boolean, remainingTokens: number}> {
  const script = `
    -- token_bucket.lua (same script as above)
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local tokens_to_consume = tonumber(ARGV[3])
    local now = tonumber(ARGV[4])

    local state = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(state[1]) or capacity
    local last_refill = tonumber(state[2]) or now

    local time_passed = (now - last_refill) / 1000
    local tokens_to_add = time_passed * refill_rate
    tokens = math.min(capacity, tokens + tokens_to_add)

    if tokens < tokens_to_consume then
      redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, 3600)
      return {0, tokens}
    end

    tokens = tokens - tokens_to_consume
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)
    return {1, tokens}
  `;

  const result = await redis.eval(
    script,
    1,                          // number of keys
    key,                        // KEYS[1]
    capacity.toString(),        // ARGV[1]
    refillRate.toString(),      // ARGV[2]
    tokensToConsume.toString(), // ARGV[3]
    Date.now().toString()       // ARGV[4]
  );

  return {
    allowed: result[0] === 1,
    remainingTokens: result[1]  // truncated to an integer by the Lua reply conversion
  };
}
```

Rate limiting strategies
Multiple tiers
Implement different rate limits for different user tiers:
```typescript
class TieredRateLimiter {
  private limiters: Map<string, FixedWindowCounter> = new Map();
  private userTierCache: Map<string, string> = new Map();
  private tierLimits: Map<string, {windowSize: number, limit: number}> = new Map();

  constructor() {
    this.tierLimits.set('free', { windowSize: 60000, limit: 100 });
    this.tierLimits.set('pro', { windowSize: 60000, limit: 1000 });
    this.tierLimits.set('enterprise', { windowSize: 60000, limit: 10000 });
    // One limiter per tier; each limiter tracks per-user counters by key
    this.tierLimits.forEach((config, tier) => {
      this.limiters.set(tier, new FixedWindowCounter(config.windowSize, config.limit));
    });
  }

  async allow(userId: string): Promise<{allowed: boolean, tier: string}> {
    const tier = await this.getUserTier(userId);
    const limiter = this.limiters.get(tier)!;
    return {
      allowed: limiter.allow(userId),
      tier
    };
  }

  private async getUserTier(userId: string): Promise<string> {
    // Check cache first
    if (this.userTierCache.has(userId)) {
      return this.userTierCache.get(userId)!;
    }
    // Fetch from database
    const user = await fetchUserFromDatabase(userId);
    const tier = user.subscriptionTier || 'free';
    // Cache for 5 minutes
    this.userTierCache.set(userId, tier);
    setTimeout(() => this.userTierCache.delete(userId), 300000);
    return tier;
  }
}
```

Per-endpoint rate limiting
Different endpoints may have different rate limits:
```typescript
class EndpointRateLimiter {
  // method -> path -> limiter
  private limiters: Map<string, Map<string, FixedWindowCounter>> = new Map();

  registerEndpoint(path: string, method: string, windowSize: number, limit: number): void {
    if (!this.limiters.has(method)) {
      this.limiters.set(method, new Map());
    }
    this.limiters.get(method)!.set(path, new FixedWindowCounter(windowSize, limit));
  }

  allow(method: string, path: string, clientId: string): boolean {
    const methodLimiters = this.limiters.get(method);
    if (!methodLimiters) {
      return true; // No rate limit configured
    }
    const limiter = methodLimiters.get(path);
    if (!limiter) {
      return true; // No rate limit configured
    }
    return limiter.allow(`${method}:${path}:${clientId}`);
  }
}

// Usage
const rateLimiter = new EndpointRateLimiter();

// Different limits for different endpoints: writes are typically
// far more expensive than reads
rateLimiter.registerEndpoint('/api/v1/users', 'GET', 60000, 100);
rateLimiter.registerEndpoint('/api/v1/users', 'POST', 60000, 10);
rateLimiter.registerEndpoint('/api/v1/search', 'GET', 60000, 1000);

// Middleware
app.use((req, res, next) => {
  const clientId = getClientKey(req);
  if (!rateLimiter.allow(req.method, req.path, clientId)) {
    return res.status(429).send('Too Many Requests');
  }
  next();
});
```

Circuit breaker integration
Combine rate limiting with circuit breakers for resilience:
```typescript
class CircuitBreakerRateLimiter {
  constructor(
    private rateLimiter: FixedWindowCounter,
    private circuitBreaker: CircuitBreaker
  ) {}

  async execute(request: Request): Promise<Response> {
    // Check rate limit first: shedding excess load is cheaper than
    // letting it trip the breaker
    const clientId = getClientKey(request);
    if (!this.rateLimiter.allow(clientId)) {
      return new Response('Too Many Requests', { status: 429 });
    }
    // Check circuit breaker
    if (this.circuitBreaker.isOpen()) {
      return new Response('Service Unavailable', { status: 503 });
    }
    try {
      // Execute request
      const response = await this.executeRequest(request);
      this.circuitBreaker.recordSuccess();
      return response;
    } catch (error) {
      this.circuitBreaker.recordFailure();
      throw error;
    }
  }

  private async executeRequest(request: Request): Promise<Response> {
    return await processRequest(request);
  }
}

class CircuitBreaker {
  private failureCount = 0;
  private lastFailureTime = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private failureThreshold: number = 5,
    private timeout: number = 60000 // 1 minute
  ) {}

  isOpen(): boolean {
    if (this.state === 'open') {
      // Check if we should transition to half-open
      if (Date.now() - this.lastFailureTime > this.timeout) {
        this.state = 'half-open';
        return false;
      }
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failureCount = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'open';
    }
  }
}
```

Response headers and client experience
Rate limiting responses should provide clear guidance to clients:
```typescript
async function handleRateLimitedRequest(
  request: Request,
  rateLimiter: FixedWindowCounter,
  clientId: string
): Promise<Response> {
  if (!rateLimiter.allow(clientId)) {
    const remaining = rateLimiter.getRemaining(clientId);
    const resetTime = rateLimiter.getResetTime(clientId);
    return new Response(JSON.stringify({ error: 'Too Many Requests' }), {
      status: 429,
      headers: {
        'Content-Type': 'application/json',
        'Retry-After': Math.ceil(resetTime / 1000).toString(),
        // Bracket access reads the private field; expose a getter in production code
        'X-RateLimit-Limit': rateLimiter['limit'].toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': new Date(Date.now() + resetTime).toUTCString(),
      }
    });
  }
  return await processRequest(request);
}

// Include rate limit info in successful responses
function addRateLimitHeaders(
  response: Response,
  rateLimiter: FixedWindowCounter,
  clientId: string
): Response {
  const headers = new Headers(response.headers);
  headers.set('X-RateLimit-Limit', rateLimiter['limit'].toString());
  headers.set('X-RateLimit-Remaining', rateLimiter.getRemaining(clientId).toString());
  return new Response(response.body, {
    status: response.status,
    headers
  });
}
```

Monitoring and observability
Track rate limiting metrics to adjust limits and detect abuse:
```typescript
class RateLimitingMetrics {
  private requestCounts: Map<string, number> = new Map();
  private rejectionCounts: Map<string, number> = new Map();
  private requestTimes: Array<{key: string, timestamp: number, duration: number}> = [];

  recordRequest(key: string, allowed: boolean, duration: number): void {
    // Count requests
    const requestCount = this.requestCounts.get(key) || 0;
    this.requestCounts.set(key, requestCount + 1);
    // Count rejections
    if (!allowed) {
      const rejectionCount = this.rejectionCounts.get(key) || 0;
      this.rejectionCounts.set(key, rejectionCount + 1);
    }
    // Record timing
    this.requestTimes.push({ key, timestamp: Date.now(), duration });
    // Cleanup old records (keep last 1000)
    if (this.requestTimes.length > 1000) {
      this.requestTimes.shift();
    }
  }

  getMetrics(key: string): RateLimitMetrics {
    const requestCount = this.requestCounts.get(key) || 0;
    const rejectionCount = this.rejectionCounts.get(key) || 0;
    const keyRequests = this.requestTimes.filter(r => r.key === key);
    const avgDuration = keyRequests.length > 0
      ? keyRequests.reduce((sum, r) => sum + r.duration, 0) / keyRequests.length
      : 0;
    return {
      key,
      requestCount,
      rejectionCount,
      rejectionRate: requestCount > 0 ? rejectionCount / requestCount : 0,
      averageRequestDuration: avgDuration,
    };
  }

  getTopOffenders(limit: number = 10): RateLimitMetrics[] {
    const allKeys = Array.from(new Set(this.requestTimes.map(r => r.key)));
    return allKeys
      .map(key => this.getMetrics(key))
      .sort((a, b) => b.rejectionCount - a.rejectionCount)
      .slice(0, limit);
  }
}

interface RateLimitMetrics {
  key: string;
  requestCount: number;
  rejectionCount: number;
  rejectionRate: number;
  averageRequestDuration: number;
}
```

Decision framework
Choose the right algorithm
| Algorithm | Burst Tolerance | Precision | Complexity | Memory | Best For |
|---|---|---|---|---|---|
| Token Bucket | High | Medium | Low | Low | APIs tolerating bursts |
| Leaky Bucket | None | High | Medium | High | Predictable throughput |
| Fixed Window | Low | Low | Very Low | Very Low | Simple implementations |
| Sliding Window Log | Low | High | High | High | Precise limiting |
Evaluate requirements
Questions to ask:
- Does your API need to handle burst traffic?
- Yes → Token bucket or leaky bucket
- No → Fixed or sliding window
- How precise does your rate limiting need to be?
- Very precise → Sliding window
- Moderate precision → Token bucket
- Basic precision → Fixed window
- What is your operational complexity tolerance?
- Low tolerance → Fixed window
- Moderate tolerance → Token bucket
- High tolerance → Sliding window or leaky bucket
- Do you need distributed rate limiting?
- Yes → Redis or shared database
- No → In-memory implementation
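The checklist above can be condensed into a small helper. The type and field names here are illustrative, not a standard API:

```typescript
// Maps the decision questions above onto an algorithm recommendation.
type Requirements = {
  needsBursts: boolean;    // must clients be able to burst?
  smoothOutput: boolean;   // must downstream throughput be constant?
  precision: 'basic' | 'moderate' | 'high';
  distributed: boolean;    // multiple instances sharing one limit?
};

function chooseAlgorithm(req: Requirements): string {
  let algorithm: string;
  if (req.needsBursts) {
    algorithm = 'token bucket';
  } else if (req.smoothOutput) {
    algorithm = 'leaky bucket';
  } else {
    algorithm = req.precision === 'high' ? 'sliding window log' : 'fixed window counter';
  }
  const backend = req.distributed ? 'Redis (shared state)' : 'in-memory';
  return `${algorithm} over ${backend}`;
}

// e.g. a user-facing API with bursty clients, running on several instances:
chooseAlgorithm({ needsBursts: true, smoothOutput: false, precision: 'moderate', distributed: true });
// → "token bucket over Redis (shared state)"
```

Real systems often combine answers (a token bucket per user plus a global leaky bucket in front of a fragile dependency), so treat this as a starting point rather than a rule.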
Conclusion
Rate limiting is essential for protecting production APIs and ensuring fair resource allocation. The right algorithm depends on your specific requirements: burst tolerance, precision needs, and operational complexity.
Start with a simple implementation (fixed window or token bucket) and evolve as your requirements become clearer. Monitor rate limiting metrics continuously to adjust limits and detect patterns of abuse. The goal isn't to block legitimate users—it's to create predictable system behavior that protects both the infrastructure and user experience.
Practical closing question: What is the most common abuse pattern in your current API, and would a different rate limiting algorithm better address it?
Building a production API and need expert guidance on rate limiting and capacity management? Talk to Imperialis API specialists about implementing rate limiting strategies that protect your infrastructure while providing excellent user experience.