Rate Limiting That Actually Scales: Redis Algorithms

In-memory rate limiting breaks the moment you deploy two instances.

Each server maintains its own counter. Say the limit is 10 requests per client. Client A sends ten requests to server 1, then ten more to server 2. Each server has counted only ten and sees a client within its limit, but your API has served twenty. The limit is effectively doubled.

This is why production APIs running more than one instance typically keep rate-limit state in a shared store such as Redis. But which algorithm do you pick, and how do you implement it without tanking performance?

Why In-Memory Doesn’t Scale

// ❌ This breaks with multiple instances
const counters = new Map<string, number>();
app.use((req, res, next) => {
  const ip = req.ip;
  counters.set(ip, (counters.get(ip) ?? 0) + 1);
  if (counters.get(ip)! > 100) {
    return res.status(429).send('Too many requests');
  }
  next();
});

Deploy to 4 instances? Each instance allows 100 requests per key, so the effective limit is 400. An attacker doesn't even have to work for it: the load balancer spreads their requests across instances.
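A quick way to see the multiplication is to simulate it. The sketch below is illustrative only: `makeInstanceLimiter` and `countAllowed` are made-up names, the counters never reset, and round-robin stands in for whatever your load balancer actually does.

```typescript
// Each "instance" keeps its own private in-memory counter.
function makeInstanceLimiter(limit: number): (clientKey: string) => boolean {
  const counters = new Map<string, number>();
  return (clientKey: string): boolean => {
    const count = (counters.get(clientKey) ?? 0) + 1;
    counters.set(clientKey, count);
    return count <= limit;
  };
}

// Send `totalRequests` from one client through `instanceCount` instances
// and count how many get through.
function countAllowed(instanceCount: number, limit: number, totalRequests: number): number {
  const instances = Array.from({ length: instanceCount }, () => makeInstanceLimiter(limit));
  let allowed = 0;
  for (let i = 0; i < totalRequests; i++) {
    // Round-robin: request i goes to instance i mod N
    if (instances[i % instanceCount]!('203.0.113.7')) allowed++;
  }
  return allowed;
}

console.log(countAllowed(1, 100, 1000)); // 100
console.log(countAllowed(4, 100, 1000)); // 400
```

Same client, same nominal limit, four times the traffic.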

Three Algorithms Compared

| Algorithm | Accuracy | Memory | Burst Handling | Implementation |
| --- | --- | --- | --- | --- |
| Fixed Window | Poor (burst at boundary) | Low | Allows spike | Simple |
| Sliding Window Log | Excellent | High (stores all events) | Accurate | Complex |
| Token Bucket | Good | Low | Allows burst (configurable) | Moderate |

Algorithm 1: Fixed Window (Simple, Flawed)

Divide time into fixed buckets (60s, 1h). Count requests per bucket. When a bucket expires, the count starts again from zero.

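// Redis: keys are "rate:<ip>:<window index>"
const limit = 100;
const windowSec = 60; // window length in seconds
app.use(async (req, res, next) => {
  const ip = req.ip;
  // Every timestamp inside the same 60s window maps to the same index
  const windowIndex = Math.floor(Date.now() / 1000 / windowSec);
  const key = `rate:${ip}:${windowIndex}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: let Redis drop the key once the window is over.
    // INCR and EXPIRE are separate commands; a crash between them leaves a key with no TTL.
    await redis.expire(key, windowSec);
  }
  if (count > limit) {
    return res.status(429).send('Rate limited');
  }
  next();
});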

Problem: bursts at the window boundary. If the limit is 100 per 60s, a client can send 100 requests at t=59.9 and another 100 at t=60.1. That is 200 requests in 0.2 seconds, and neither window is over its limit.
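To see the boundary burst without touching Redis, here is a minimal in-memory model of the same counter. `makeFixedWindow` is a hypothetical helper and the timestamps are hand-picked; it sketches the arithmetic, not a production limiter.

```typescript
// In-memory stand-in for the Redis counter, keyed by window index.
function makeFixedWindow(limit: number, windowSec: number): (nowMs: number) => boolean {
  const counts = new Map<number, number>();
  return (nowMs: number): boolean => {
    const windowIndex = Math.floor(nowMs / 1000 / windowSec);
    const count = (counts.get(windowIndex) ?? 0) + 1;
    counts.set(windowIndex, count);
    return count <= limit;
  };
}

const allowFixed = makeFixedWindow(100, 60);
let allowedInBurst = 0;
// 100 requests just before the boundary (t = 59.9s, window 0)
for (let i = 0; i < 100; i++) if (allowFixed(59_900)) allowedInBurst++;
// 100 requests just after it (t = 60.1s, window 1)
for (let i = 0; i < 100; i++) if (allowFixed(60_100)) allowedInBurst++;

console.log(allowedInBurst); // 200
```

Both batches pass because each one lands in a fresh window, even though they are 200 milliseconds apart.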

Algorithm 2: Sliding Window Log (Accurate)

Store all request timestamps in a sorted set. On each request, count how many are within the window.

const limit = 100;
const windowSec = 60; // window length in seconds
app.use(async (req, res, next) => {
  const ip = req.ip;
  const key = `rate:${ip}`;
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  // Remove old requests outside the window
  await redis.zremrangebyscore(key, 0, windowStart);
  // Count requests in the window
  const count = await redis.zcount(key, windowStart, now);
  if (count >= limit) {
    return res.status(429).send('Rate limited');
  }
  // Add this request (the random suffix keeps members unique within one millisecond)
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, windowSec + 1);
  // Caveat: check and add are separate commands, so concurrent requests can both pass.
  // Run the sequence in a Lua script if the limit has to be exact.
  next();
});

Accurate: No boundary bursts. Memory hungry: it stores one timestamp per allowed request, so each key holds up to `limit` entries. With a limit of 10K requests/min, that is up to 10,000 sorted-set members per client.
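The same kind of in-memory model shows why the log is accurate. This is a sketch under assumptions: `makeSlidingLog` is a hypothetical name, a plain array stands in for the sorted set, and the comments map each step to the Redis command it imitates.

```typescript
function makeSlidingLog(limit: number, windowSec: number): (nowMs: number) => boolean {
  let timestamps: number[] = []; // stand-in for the sorted set
  return (nowMs: number): boolean => {
    const windowStart = nowMs - windowSec * 1000;
    // ZREMRANGEBYSCORE: drop entries at or before the window start
    timestamps = timestamps.filter((t) => t > windowStart);
    // ZCOUNT: everything left is inside the window
    if (timestamps.length >= limit) return false;
    // ZADD: record this request
    timestamps.push(nowMs);
    return true;
  };
}

const allowSliding = makeSlidingLog(100, 60);
let allowedByLog = 0;
for (let i = 0; i < 100; i++) if (allowSliding(59_900)) allowedByLog++; // t = 59.9s
for (let i = 0; i < 100; i++) if (allowSliding(60_100)) allowedByLog++; // t = 60.1s

console.log(allowedByLog); // 100
```

The second batch is rejected in full because the first 100 requests are still inside the trailing 60 seconds. The price is the array itself: one entry per allowed request.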

Algorithm 3: Token Bucket (Smooth, Scalable)

Imagine a bucket that holds tokens. Each request consumes one token. Tokens refill at a constant rate.

const capacity = 100; // Max tokens in bucket
const refillRate = 10; // Tokens per second
app.use(async (req, res, next) => {
  const ip = req.ip;
  const key = `bucket:${ip}`;
  const now = Date.now();
  // Get current bucket state (a new client starts with a full bucket)
  let tokens = parseFloat((await redis.get(`${key}:tokens`)) ?? String(capacity));
  const lastRefill = parseInt((await redis.get(`${key}:refill`)) ?? String(now), 10);
  const secondsElapsed = (now - lastRefill) / 1000;
  // Refill tokens based on elapsed time
  tokens = Math.min(capacity, tokens + secondsElapsed * refillRate);
  if (tokens < 1) {
    const retryAfter = Math.ceil((1 - tokens) / refillRate);
    res.set('Retry-After', String(retryAfter));
    return res.status(429).send('Rate limited');
  }
  // Consume one token
  tokens -= 1;
  // Save state. Keep the fraction: flooring here would throw away partial refills.
  // This read-modify-write is not atomic, so concurrent requests can overspend the bucket.
  await redis.set(`${key}:tokens`, String(tokens), 'EX', 3600);
  await redis.set(`${key}:refill`, String(now), 'EX', 3600);
  res.set('X-RateLimit-Remaining', String(Math.floor(tokens)));
  next();
});

Token bucket allows controlled bursts (up to capacity), then smooths traffic down to the refill rate. It is a common choice for public API limits.
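It can help to pull the refill arithmetic out into a pure function, which is easy to test without Redis. The version below is a sketch: `BucketState`, `TakeResult`, and `takeToken` are names invented for this example, and the caller is assumed to load and store the state somewhere (likely inside a Lua script, so the read and write happen atomically).

```typescript
interface BucketState {
  tokens: number; // may be fractional
  lastRefillMs: number; // timestamp of the last update
}

interface TakeResult {
  allowed: boolean;
  state: BucketState; // state to persist after this call
  retryAfterSec: number; // 0 when allowed
}

function takeToken(
  state: BucketState,
  nowMs: number,
  capacity: number,
  refillPerSec: number,
): TakeResult {
  // Refill for the time that has passed, never above capacity
  const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
  const tokens = Math.min(capacity, state.tokens + elapsedSec * refillPerSec);

  if (tokens < 1) {
    return {
      allowed: false,
      state: { tokens, lastRefillMs: nowMs },
      retryAfterSec: Math.ceil((1 - tokens) / refillPerSec),
    };
  }
  // Spend one token
  return { allowed: true, state: { tokens: tokens - 1, lastRefillMs: nowMs }, retryAfterSec: 0 };
}

// A full bucket of 100 absorbs a burst of 100, then rejects.
let demoBucket: BucketState = { tokens: 100, lastRefillMs: 0 };
let burstAllowed = 0;
for (let i = 0; i < 101; i++) {
  const result = takeToken(demoBucket, 0, 100, 10);
  demoBucket = result.state;
  if (result.allowed) burstAllowed++;
}
console.log(burstAllowed); // 100

// Half a second later, 5 tokens have refilled (10 per second).
const afterHalfSecond = takeToken(demoBucket, 500, 100, 10);
console.log(afterHalfSecond.allowed, afterHalfSecond.state.tokens); // true 4
```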

Using express-rate-limit with Redis

Don’t reinvent this. Use rate-limit-redis:

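import RateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';
// node-redis v4+: connection details go in a URL, and the client must connect explicitly
const client = createClient({ url: 'redis://localhost:6379' });
await client.connect(); // top-level await needs an ES module; otherwise call this inside an async function
const limiter = RateLimit({
  store: new RedisStore({
    // rate-limit-redis v3+ takes a sendCommand function rather than a client object
    sendCommand: (...args: string[]) => client.sendCommand(args),
    prefix: 'rl:' // Redis key prefix
  }),
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per window
  message: 'Too many requests',
  standardHeaders: true, // Send the standardized RateLimit-* headers
  legacyHeaders: false // Don't send the older X-RateLimit-* headers
});
app.use('/api/', limiter);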

Under the hood this is a fixed-window counter kept in Redis, not a sliding window, so the boundary-burst caveat from Algorithm 1 still applies.

Rate Limit Response Headers

Include these in every response (the values below are placeholders; compute them from the limiter's state):

res.set('X-RateLimit-Limit', '100');
res.set('X-RateLimit-Remaining', '42');
res.set('X-RateLimit-Reset', String(Math.floor(Date.now() / 1000) + 60));

Clients can check remaining quota before hitting the limit.
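In a hand-rolled limiter, the header values would come from whatever state the limiter just computed. A small sketch, assuming a hypothetical `QuotaSnapshot` shape and `rateLimitHeaders` helper (neither comes from a library):

```typescript
interface QuotaSnapshot {
  limit: number; // configured maximum for the window
  remaining: number; // requests left after this one
  resetEpochSec: number; // Unix time (seconds) when the quota resets
}

function rateLimitHeaders(quota: QuotaSnapshot): Record<string, string> {
  return {
    'X-RateLimit-Limit': String(quota.limit),
    // Never report a negative remainder to clients
    'X-RateLimit-Remaining': String(Math.max(0, quota.remaining)),
    'X-RateLimit-Reset': String(quota.resetEpochSec),
  };
}

const exampleHeaders = rateLimitHeaders({ limit: 100, remaining: 42, resetEpochSec: 1_700_000_060 });
console.log(exampleHeaders['X-RateLimit-Remaining']); // "42"
```

In Express, `res.set()` also accepts an object of header names and values, so the result can be passed to it directly.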

Multi-tier Rate Limiting

Different limits for different users:

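const authLimiter = RateLimit({
  // rate-limit-redis v3+ takes a sendCommand function; `client` is a connected node-redis client
  store: new RedisStore({
    sendCommand: (...args: string[]) => client.sendCommand(args)
  }),
  windowMs: 60 * 1000, // 1 minute, matching the per-minute limits below
  max: (req, res) => {
    if (req.user?.premium) return 1000; // Premium: 1000/min
    if (req.user?.id) return 100; // Auth: 100/min
    return 10; // Anonymous: 10/min
  }
});
app.use('/api/', authLimiter);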
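One thing the snippet above leaves open is the key. By default express-rate-limit counts per IP, so two premium users behind the same NAT would share one quota. A possible approach is to choose the key and the limit together. The sketch below assumes a hypothetical `RequestUser` shape and `pickTier` helper, and it treats a user as premium only when an `id` is present:

```typescript
interface RequestUser {
  id?: string;
  premium?: boolean;
}

interface Tier {
  key: string; // what the counter is keyed on
  limitPerMin: number;
}

function pickTier(user: RequestUser | undefined, ip: string): Tier {
  if (user?.id && user.premium) return { key: `user:${user.id}`, limitPerMin: 1000 };
  if (user?.id) return { key: `user:${user.id}`, limitPerMin: 100 };
  // Anonymous traffic can only be identified by address
  return { key: `ip:${ip}`, limitPerMin: 10 };
}

console.log(pickTier({ id: '42', premium: true }, '203.0.113.7')); // { key: 'user:42', limitPerMin: 1000 }
console.log(pickTier(undefined, '203.0.113.7')); // { key: 'ip:203.0.113.7', limitPerMin: 10 }
```

With express-rate-limit, the `key` would go through the `keyGenerator` option and `limitPerMin` through `max`.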

Redis Cluster Considerations

Single Redis instance: Works, but is a single point of failure. Use for development/staging.

Redis Sentinel: Primary-replica setup with automatic failover. Used in production.

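Redis Cluster: Sharded across multiple nodes, each holding a partition of the key space. A given key always hashes to the same slot, so one IP's counter always lands on the same shard. Watch out for limiters that use several keys per client: scripts and transactions need those keys to live in the same slot.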

// Cluster client automatically handles key routing (node-redis v4+)
import { createCluster } from 'redis';
const cluster = createCluster({
  rootNodes: [
    { url: 'redis://node1:6379' },
    { url: 'redis://node2:6379' },
    { url: 'redis://node3:6379' }
  ]
});
await cluster.connect();
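One cluster detail matters for limiters that keep two keys per client, such as the token bucket earlier. When a key contains a `{...}` hash tag, Redis Cluster hashes only the text inside the braces, so keys that share a tag share a slot. The helpers below are a sketch: `bucketKeys` and `hashTagOf` are hypothetical names, and `hashTagOf` only mirrors the tag-extraction rule to show that both keys hash on the same text.

```typescript
// Wrap the client identifier in braces so both keys hash on it alone.
function bucketKeys(clientId: string): { tokensKey: string; refillKey: string } {
  const tag = `{${clientId}}`;
  return { tokensKey: `bucket:${tag}:tokens`, refillKey: `bucket:${tag}:refill` };
}

// The part of a key that Redis Cluster hashes: the text between the first "{"
// and the next "}" if that text is non-empty, otherwise the whole key.
function hashTagOf(key: string): string {
  const openIdx = key.indexOf('{');
  if (openIdx === -1) return key;
  const closeIdx = key.indexOf('}', openIdx + 1);
  if (closeIdx === -1 || closeIdx === openIdx + 1) return key;
  return key.slice(openIdx + 1, closeIdx);
}

const clusterKeys = bucketKeys('203.0.113.7');
console.log(clusterKeys.tokensKey); // bucket:{203.0.113.7}:tokens
console.log(hashTagOf(clusterKeys.tokensKey) === hashTagOf(clusterKeys.refillKey)); // true
```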

Graceful Degradation: What If Redis Is Down?

Two strategies:

Fail Open (allow all traffic):

try {
  const count = await redis.incr(key);
  if (count > limit) return res.status(429).send('Rate limited');
} catch (e) {
  console.error('Redis down, allowing request');
  // Continue anyway
}
next();

Fail Closed (reject all traffic):

try {
  const count = await redis.incr(key);
  if (count > limit) return res.status(429).send('Rate limited');
} catch (e) {
  return res.status(503).send('Service unavailable');
}
next();

Pick fail open for user-facing APIs (avoid outages). Pick fail closed for sensitive endpoints (auth, payments).
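Since the two strategies differ only in the `catch` branch, the choice can be a per-route setting instead of duplicated middleware. A minimal sketch, assuming a hypothetical `decide` helper whose `countRequest` argument stands in for the Redis call:

```typescript
type FailurePolicy = 'open' | 'closed';
type Decision = 'allow' | 'limit' | 'unavailable';

// `countRequest` resolves to the current count, or rejects when Redis is unreachable.
async function decide(
  countRequest: () => Promise<number>,
  limit: number,
  policy: FailurePolicy,
): Promise<Decision> {
  try {
    const count = await countRequest();
    return count > limit ? 'limit' : 'allow';
  } catch {
    // Redis failed: the policy decides, not the limiter
    return policy === 'open' ? 'allow' : 'unavailable';
  }
}
```

A middleware would then map `'limit'` to a 429, `'unavailable'` to a 503, and call `next()` on `'allow'`, passing `'closed'` for auth and payment routes and `'open'` elsewhere.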

Summary

Token bucket is the gold standard: smooth, scalable, allows bursts.

A shared store like Redis is non-negotiable once you run more than one instance.

express-rate-limit + rate-limit-redis handles most cases.

Response headers tell clients how much quota remains.

Multi-tier limits for different user classes.

Graceful degradation when Redis fails.

At scale, rate limiting is cheap insurance against DDoS, scraping, and brute-force attacks. Redis cost is $0.05-$0.30/GB/month on managed services. Your API’s availability is worth more.
