NODE.JS 5 MIN READ 11 NOV 2025

Rate Limiting That Actually Scales: Redis Algorithms

by Theodor

In-memory rate limiting breaks the moment you deploy two instances.

Each server maintains its own counter. Say the limit is 10 requests per client. Client A hits server 1 ten times (0 left), then hits server 2 ten times (0 left). In reality, they’ve hit your API twenty times, but each server has seen only ten, so nothing is blocked. The limit is effectively doubled.

This is why production APIs that run more than one instance typically keep rate-limit state in a shared store, most often Redis. But which algorithm do you pick, and how do you implement it without tanking performance?

Why In-Memory Doesn’t Scale

// ❌ This breaks with multiple instances
const counters = new Map<string, number>();

app.use((req, res, next) => {
  const ip = req.ip;
  counters.set(ip, (counters.get(ip) ?? 0) + 1);

  if (counters.get(ip)! > 100) {
    return res.status(429).send('Too many requests');
  }
  next();
});

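Deploy to 4 instances? Each instance allows 100 requests per key, so total capacity is 400 requests. An attacker doesn’t even have to try: the load balancer spreads their requests across instances for them. Note that this counter also never resets, so it has no time window at all.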
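To make the arithmetic concrete, here is a minimal simulation, assuming a round-robin load balancer and one client. The function name and the `client-a` key are made up for illustration; no Redis or Express is involved.

```typescript
// Hypothetical simulation: every "instance" keeps its own in-memory counter,
// and a round-robin load balancer spreads one client's requests across them.
function simulateAccepted(instances: number, limit: number, requests: number): number {
  const counters: Map<string, number>[] = Array.from(
    { length: instances },
    () => new Map<string, number>()
  );
  let accepted = 0;
  for (let i = 0; i < requests; i++) {
    const counter = counters[i % instances]!; // round-robin pick
    const count = (counter.get('client-a') ?? 0) + 1;
    counter.set('client-a', count);
    if (count <= limit) accepted++;
  }
  return accepted;
}

// 1000 requests against a limit of 100
const acceptedByOne = simulateAccepted(1, 100, 1000);  // one instance: 100 get through
const acceptedByFour = simulateAccepted(4, 100, 1000); // four instances: 400 get through
```

The effective limit scales with the number of instances, which is exactly what a shared counter avoids.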

Three Algorithms Compared

Algorithm	Accuracy	Memory	Burst Handling	Implementation
Fixed Window	Poor (burst at boundary)	Low	Allows spike	Simple
Sliding Window Log	Excellent	High (stores all events)	Accurate	Complex
Token Bucket	Good	Low	Allows burst (configurable)	Moderate

Algorithm 1: Fixed Window (Simple, Flawed)

Divide time into buckets (60s, 1h). Count requests per bucket. When bucket expires, reset.

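// Redis: keys are "rate:<ip>:<window index>"
const limit = 100;
const window = 60; // seconds

app.use(async (req, res, next) => {
  const ip = req.ip;
  const now = Math.floor(Date.now() / 1000);
  // Bucket by window index, not by raw second, so every request in the
  // same 60s window increments the same counter
  const windowId = Math.floor(now / window);
  const key = `rate:${ip}:${windowId}`;

  const count = await redis.incr(key);
  if (count === 1) {
    // First hit in this window: let the key expire on its own
    await redis.expire(key, window);
  }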

  if (count > limit) {
    return res.status(429).send('Rate limited');
  }
  next();
});

Problem: Burst at window boundary. If the limit is 100 per 60s, a client can send 100 requests at t=59.9 and another 100 at t=60.0. That is 200 requests in a tenth of a second, all allowed.
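The boundary burst is easy to reproduce without Redis. Below is a rough in-memory sketch of the same fixed-window logic with the clock passed in as an argument; `createFixedWindow` is a made-up helper for this demonstration, not part of any library.

```typescript
// In-memory stand-in for the fixed-window counter, with an injectable clock.
function createFixedWindow(limit: number, windowSec: number) {
  const counts = new Map<number, number>();
  return (nowSec: number): boolean => {
    const windowId = Math.floor(nowSec / windowSec);
    const count = (counts.get(windowId) ?? 0) + 1;
    counts.set(windowId, count);
    return count <= limit;
  };
}

const allowFixed = createFixedWindow(100, 60);
let acceptedAtBoundary = 0;

// 100 requests just before the boundary, 100 just after it
for (let i = 0; i < 100; i++) if (allowFixed(59.5)) acceptedAtBoundary++;
for (let i = 0; i < 100; i++) if (allowFixed(60.0)) acceptedAtBoundary++;

// acceptedAtBoundary is 200: both batches land in different windows,
// even though they are only half a second apart
```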

Algorithm 2: Sliding Window Log (Accurate)

Store all request timestamps in a sorted set. On each request, count how many are within the window.

const limit = 100;
const window = 60; // seconds

app.use(async (req, res, next) => {
  const ip = req.ip;
  const key = `rate:${ip}`;
  const now = Date.now();
  const windowStart = now - window * 1000;

  // Remove old requests outside the window
  await redis.zremrangebyscore(key, 0, windowStart);

  // Count requests in the window
  const count = await redis.zcount(key, windowStart, now);

  if (count >= limit) {
    return res.status(429).send('Rate limited');
  }

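  // Add this request. Trim, count, and add are separate round trips, so
  // concurrent requests can slip past the limit; wrap them in MULTI or a
  // Lua script if you need a strict cap
  await redis.zadd(key, now, `${now}-${Math.random()}`);
  await redis.expire(key, window + 1);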

  next();
});

Accurate: No boundary bursts. Memory hungry: Stores one sorted-set entry per accepted request, so each key holds up to limit entries. With a limit of 10K requests/min, that’s up to 10,000 entries per client.
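For comparison, here is the same burst run against an in-memory sketch of the sliding window log. It mirrors the trim, count, add steps above using a plain array instead of a sorted set; `createSlidingLog` is a hypothetical name used only here.

```typescript
// In-memory stand-in for the sliding window log, with an injectable clock.
function createSlidingLog(limit: number, windowMs: number) {
  let log: number[] = [];
  return (nowMs: number): boolean => {
    const windowStart = nowMs - windowMs;
    // Trim: drop timestamps at or before the window start
    log = log.filter((t) => t > windowStart);
    // Count: reject once the window already holds `limit` entries
    if (log.length >= limit) return false;
    // Add: record this request
    log.push(nowMs);
    return true;
  };
}

const allowSliding = createSlidingLog(100, 60_000);
let acceptedSliding = 0;

// Same burst as before: 100 requests at 59.5s, 100 more at 60.0s
for (let i = 0; i < 100; i++) if (allowSliding(59_500)) acceptedSliding++;
for (let i = 0; i < 100; i++) if (allowSliding(60_000)) acceptedSliding++;

// Only the first batch is accepted. Capacity frees up once the
// old entries slide out of the window.
const allowedAfterWindow = allowSliding(119_501);
```

The log never grows past `limit` entries, which is the memory cost described above.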

Algorithm 3: Token Bucket (Smooth, Scalable)

Imagine a bucket that holds tokens. Each request consumes one token. Tokens refill at a constant rate.

const capacity = 100; // Max tokens in bucket
const refillRate = 10; // Tokens per second

app.use(async (req, res, next) => {
  const ip = req.ip;
  const key = `bucket:${ip}`;

  // Get current bucket state; a missing key means a full, untouched bucket
  const storedTokens = await redis.get(`${key}:tokens`);
  const storedRefill = await redis.get(`${key}:refill`);

  const now = Date.now();
  let tokens = storedTokens === null ? capacity : parseFloat(storedTokens);
  const lastRefill = storedRefill === null ? now : parseInt(storedRefill, 10);
  const secondsElapsed = (now - lastRefill) / 1000;

  // Refill tokens based on elapsed time
  tokens = Math.min(capacity, tokens + secondsElapsed * refillRate);

  if (tokens < 1) {
    const retryAfter = Math.ceil((1 - tokens) / refillRate);
    res.set('Retry-After', String(retryAfter));
    return res.status(429).send('Rate limited');
  }

  // Consume one token
  tokens -= 1;

  // Save state. Keep the fractional part: flooring here would discard
  // partial refills on every request. This read-modify-write is not atomic;
  // move it into a Lua script if concurrent requests must not race
  await redis.set(`${key}:tokens`, String(tokens), 'EX', 3600);
  await redis.set(`${key}:refill`, String(now), 'EX', 3600);

  res.set('X-RateLimit-Remaining', String(Math.floor(tokens)));
  next();
});

Token bucket allows controlled bursts (up to capacity), then smooths traffic to the refill rate. It’s a common choice for public APIs.
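The burst-then-smooth behaviour is easier to see with the refill math pulled out into a pure function. This is a sketch under the same parameters as above (capacity 100, 10 tokens per second); `BucketState` and `takeToken` are illustrative names, and the state would live in Redis in a real deployment.

```typescript
interface BucketState {
  tokens: number;       // may be fractional
  lastRefillMs: number; // timestamp of the last update
}

interface TakeResult {
  allowed: boolean;
  state: BucketState;
  retryAfterSec: number; // 0 when allowed
}

function takeToken(
  state: BucketState,
  nowMs: number,
  capacity: number,
  refillPerSec: number
): TakeResult {
  // Refill for the time elapsed since the last update, capped at capacity
  const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
  const tokens = Math.min(capacity, state.tokens + elapsedSec * refillPerSec);

  if (tokens < 1) {
    const retryAfterSec = Math.ceil((1 - tokens) / refillPerSec);
    return { allowed: false, state: { tokens, lastRefillMs: nowMs }, retryAfterSec };
  }
  return { allowed: true, state: { tokens: tokens - 1, lastRefillMs: nowMs }, retryAfterSec: 0 };
}

// Start with a full bucket, then fire 150 requests at t=0
let bucket: BucketState = { tokens: 100, lastRefillMs: 0 };
let burstAllowed = 0;
for (let i = 0; i < 150; i++) {
  const result = takeToken(bucket, 0, 100, 10);
  bucket = result.state;
  if (result.allowed) burstAllowed++;
}

// One second later, 20 more requests arrive
let laterAllowed = 0;
for (let i = 0; i < 20; i++) {
  const result = takeToken(bucket, 1000, 100, 10);
  bucket = result.state;
  if (result.allowed) laterAllowed++;
}
```

The initial burst drains the full capacity, after which the client is held to whatever the refill rate has added back.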

Using express-rate-limit with Redis

Don’t reinvent this. Use rate-limit-redis:

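// Assumes rate-limit-redis v3+ and node-redis v4+
import RateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';

const client = createClient({ url: 'redis://localhost:6379' });
await client.connect(); // node-redis v4 clients must connect explicitly

const limiter = RateLimit({
  store: new RedisStore({
    // The store talks to Redis through a raw command function
    sendCommand: (...args: string[]) => client.sendCommand(args),
    prefix: 'rl:' // Redis key prefix
  }),
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per window
  message: 'Too many requests',
  standardHeaders: true, // Return rate limit info in RateLimit-* headers
  legacyHeaders: false // Turn off the older X-RateLimit-* headers
});

app.use('/api/', limiter);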

Under the hood this is a fixed-window counter kept in Redis, not a sliding window, so the boundary-burst caveat from Algorithm 1 applies.

Rate Limit Response Headers

Include these in every response:

res.set('X-RateLimit-Limit', '100');
res.set('X-RateLimit-Remaining', '42');
res.set('X-RateLimit-Reset', String(Math.floor(Date.now() / 1000) + 60));

Clients can check remaining quota before hitting the limit.
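The hard-coded values above are placeholders. A small helper can derive the headers from the limiter’s actual state; the sketch below assumes you already know the limit, the remaining quota, and the reset time as a Unix timestamp in seconds. `RateLimitInfo` and `rateLimitHeaders` are illustrative names.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetEpochSec: number; // when the quota resets, as Unix seconds
}

function rateLimitHeaders(info: RateLimitInfo, nowEpochSec: number): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(info.limit),
    'X-RateLimit-Remaining': String(Math.max(0, info.remaining)),
    'X-RateLimit-Reset': String(info.resetEpochSec),
  };
  // Only tell the client to back off once the quota is exhausted
  if (info.remaining <= 0) {
    headers['Retry-After'] = String(Math.max(0, info.resetEpochSec - nowEpochSec));
  }
  return headers;
}

const exampleHeaders = rateLimitHeaders(
  { limit: 100, remaining: 42, resetEpochSec: 1_700_000_060 },
  1_700_000_000
);
```

In Express, `res.set()` accepts an object, so the result can be passed straight through with `res.set(rateLimitHeaders(info, now))`.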

Multi-tier Rate Limiting

Different limits for different users:

const authLimiter = RateLimit({
  store: new RedisStore({
    sendCommand: (...args: string[]) => client.sendCommand(args)
  }),
  windowMs: 60 * 1000, // 1 minute, matching the per-minute limits below
  max: (req, res) => {
    if (req.user?.premium) return 1000; // Premium: 1000/min
    if (req.user?.id) return 100;        // Auth: 100/min
    return 10;                            // Anonymous: 10/min
  }
});

app.use('/api/', authLimiter);
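One thing to watch with tiers: a per-user limit only means something if the counter is also keyed per user, otherwise a premium user shares a counter with everyone behind the same IP. Here is a minimal sketch of both pieces as plain functions. The `RequestUser` shape is an assumption about what `req.user` holds; the key function could be wired in through express-rate-limit’s `keyGenerator` option.

```typescript
// Assumed shape of req.user; adjust to whatever your auth layer attaches
interface RequestUser {
  id?: string;
  premium?: boolean;
}

// Requests per window for each tier, same numbers as above
function limitFor(user?: RequestUser): number {
  if (user?.premium) return 1000;
  if (user?.id) return 100;
  return 10;
}

// Authenticated traffic is counted per user, anonymous traffic per IP
function keyFor(user: RequestUser | undefined, ip: string): string {
  return user?.id ? `user:${user.id}` : `ip:${ip}`;
}

const premiumLimit = limitFor({ id: '42', premium: true }); // 1000
const anonymousKey = keyFor(undefined, '203.0.113.7');      // "ip:203.0.113.7"
```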

Redis Cluster Considerations

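Single Redis instance: Works, but is a single point of failure. Fine for development and staging.

Redis Sentinel: Primary-replica setup with automatic failover. A common production choice when one node can hold all of your rate-limit keys.

Redis Cluster: Sharded across multiple nodes, each owning a range of hash slots. The client hashes every key to a slot, so the same key always lands on the same shard with no extra work. The catch is multi-key operations: a limiter that touches two keys for one client (like the token bucket’s tokens and refill keys) needs both in the same slot, which you get by wrapping the shared part in a hash tag, e.g. bucket:{1.2.3.4}:tokens.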

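// Cluster client routes each command to the node that owns the key's slot
// (node-redis v4+ takes a url or socket option per root node)
import { createCluster } from 'redis';

const client = createCluster({
  rootNodes: [
    { url: 'redis://node1:6379' },
    { url: 'redis://node2:6379' },
    { url: 'redis://node3:6379' }
  ]
});
await client.connect();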
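To see how Redis Cluster decides where a key lives, here is a sketch of its slot calculation: CRC16 (the XMODEM variant) of the key modulo 16384, hashing only the part inside the first `{...}` when that part is non-empty. The implementation assumes ASCII keys and is meant for illustration; in practice the client library does this for you.

```typescript
// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0
function crc16(text: string): number {
  let crc = 0;
  for (let i = 0; i < text.length; i++) {
    crc ^= (text.charCodeAt(i) & 0xff) << 8; // assumes ASCII keys
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key: string): number {
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    // A non-empty {tag} means only the tag is hashed
    if (close > open + 1) {
      return crc16(key.slice(open + 1, close)) % 16384;
    }
  }
  return crc16(key) % 16384;
}

// With a hash tag, both token-bucket keys for one IP share a slot
const tokensSlot = hashSlot('bucket:{1.2.3.4}:tokens');
const refillSlot = hashSlot('bucket:{1.2.3.4}:refill');
const sameSlot = tokensSlot === refillSlot;
```

Without the braces, `bucket:1.2.3.4:tokens` and `bucket:1.2.3.4:refill` are hashed as whole strings and can end up on different nodes, which rules out reading or updating them in one script or transaction.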

Graceful Degradation: What If Redis Is Down?

Two strategies:

Fail Open (allow all traffic):

try {
  const count = await redis.incr(key);
  if (count > limit) return res.status(429).send('Rate limited');
} catch (e) {
  console.error('Redis down, allowing request');
  // Continue anyway
}
next();

Fail Closed (reject all traffic):

try {
  const count = await redis.incr(key);
  if (count > limit) return res.status(429).send('Rate limited');
} catch (e) {
  return res.status(503).send('Service unavailable');
}
next();

Pick fail open for user-facing APIs (avoid outages). Pick fail closed for sensitive endpoints (auth, payments).
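If one app serves both kinds of endpoint, the policy can be picked per route rather than globally. Below is a small sketch of that decision; the prefixes in `failClosedPrefixes` are hypothetical and should be replaced with your own sensitive routes.

```typescript
type FailurePolicy = 'open' | 'closed';

// Hypothetical route prefixes that should reject traffic when Redis is down
const failClosedPrefixes = ['/api/auth', '/api/payments'];

function policyFor(path: string): FailurePolicy {
  const sensitive = failClosedPrefixes.some(
    (prefix) => path === prefix || path.startsWith(prefix + '/')
  );
  return sensitive ? 'closed' : 'open';
}

// Status to send when the store throws; null means "let the request through"
function statusOnStoreError(path: string): number | null {
  return policyFor(path) === 'closed' ? 503 : null;
}

const loginStatus = statusOnStoreError('/api/auth/login'); // 503
const catalogStatus = statusOnStoreError('/api/products'); // null
```

Inside the `catch` block, a `null` result would fall through to `next()` and anything else would be sent as the response status.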

Summary

✔ Token bucket is the gold standard: smooth, scalable, allows bursts.

✔ A shared store such as Redis is what makes limits hold across instances.

✔ express-rate-limit + rate-limit-redis covers most cases.

✔ Response headers tell clients how much quota remains.

✔ Multi-tier limits for different user classes.

✔ Graceful degradation when Redis fails.

At scale, rate limiting is cheap insurance against DDoS, scraping, and brute-force attacks. Rate-limit state is tiny (a counter or a short list per client), so the Redis bill stays small even on managed services. Your API’s availability is worth more.
