
Streaming LLM Responses in React With Fetch Streams

WARNING · DRAGON AHEAD

Users hate waiting. Watching tokens appear in real time feels far faster than staring at a 5-second spinner. But streaming is hard: you must handle partial responses, clean up memory, and respect backpressure, or your app will leak memory and stall.

Why Streaming Matters

A non-streaming LLM request:

  • ⚠️ User sees a spinner for 5+ seconds
  • ⚠️ Nothing renders until the whole response has been generated (bad UX)
  • ⚠️ Can’t show progress or allow cancellation

A streaming response:

  • ✔ First token can appear within about 500 ms (the app feels faster even if total generation time is unchanged)
  • ✔ Users can read as text generates
  • ✔ AbortController lets them cancel mid-generation

Three Approaches: SSE vs. Fetch Streams vs. WebSocket

Approach                 | Latency  | Complexity | Best For
Server-Sent Events (SSE) | Low      | Simple     | Unidirectional (server → client)
Fetch ReadableStream     | Low      | Medium     | HTTP/2, custom chunking
WebSocket                | Very low | Complex    | Bidirectional, real-time

We’ll use SSE for simplicity and Fetch ReadableStream for fine-grained control.

Node.js: Streaming with Server-Sent Events

Express endpoint that streams Anthropic (Claude) responses:

import Anthropic from '@anthropic-ai/sdk';
import express from 'express';

const app = express();
app.use(express.json()); // Needed so req.body is parsed from the JSON request
const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from the environment

app.post('/api/chat-stream', async (req, res) => {
  const { message } = req.body;

  // SSE headers: keep the connection open and disable caching
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = client.messages.stream({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  });

  // Forward each text delta to the client as one SSE event
  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      res.write(`data: ${JSON.stringify({ token: chunk.delta.text })}\n\n`);
    }
  }

  // Tell the client the stream is complete
  res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
  res.end();
});

app.listen(3000);

Each token arrives as its own SSE event: a single data: line carrying a JSON payload, terminated by a blank line.
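To make the wire format concrete, here is a minimal sketch of the framing the endpoint writes. The helper name formatSseEvent is hypothetical (it is not part of Express or any SDK); it only wraps the same template string used in the handler.

```typescript
// Hypothetical helper (an assumption for illustration): serialize one payload as an SSE event.
// JSON.stringify escapes newlines inside strings, so the payload always fits on one data: line.
function formatSseEvent(payload: Record<string, unknown>): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Three events as they would appear on the wire, back to back.
const wire =
  formatSseEvent({ token: 'Hel' }) +
  formatSseEvent({ token: 'lo' }) +
  formatSseEvent({ done: true });

console.log(JSON.stringify(wire));
```

The blank line after each data: line is what tells the client an event is complete, which is why the handler ends every write with two newlines.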

React: Consuming SSE with EventSource

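Simple and straightforward. EventSource always issues a GET, so this assumes the server also exposes /api/chat-stream as a GET route that reads the message from the msg query parameter:

import { useEffect, useRef, useState } from 'react';

export function ChatSSE() {
  const [response, setResponse] = useState('');
  const sourceRef = useRef<EventSource | null>(null);

  const streamMessage = (message: string) => {
    sourceRef.current?.close(); // Drop any stream still in flight
    setResponse('');
    const eventSource = new EventSource(
      `/api/chat-stream?msg=${encodeURIComponent(message)}`
    );
    sourceRef.current = eventSource;

    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.done) {
        eventSource.close();
      } else {
        setResponse(prev => prev + data.token);
      }
    };
    // Close on error; otherwise EventSource reconnects automatically
    eventSource.onerror = () => {
      eventSource.close();
    };
  };

  // Close the connection if the component unmounts mid-stream
  useEffect(() => () => sourceRef.current?.close(), []);

  return (
    <>
      <button onClick={() => streamMessage('Hello!')}>Ask</button>
      <p>{response}</p>
    </>
  );
}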

EventSource limitation: it only issues GET requests, cannot send a request body or custom headers, and only carries text. For more control, use fetch with a ReadableStream.

React: Consuming Fetch ReadableStream

More flexible: it supports POST bodies and cancellation, and because the reader pulls chunks on demand, it respects backpressure:

import { useEffect, useRef, useState } from 'react';

export function ChatStream() {
  const [response, setResponse] = useState('');
  const abortRef = useRef<AbortController | null>(null);

  const streamMessage = async (message: string) => {
    abortRef.current?.abort(); // Cancel any previous request
    const controller = new AbortController();
    abortRef.current = controller;
    setResponse('');

    try {
      // Named res so it does not shadow the response state above
      const res = await fetch('/api/chat-stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
        signal: controller.signal,
      });
      if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split('\n');
          buffer = lines.pop() || ''; // Keep incomplete line in buffer
          for (const line of lines) {
            if (line.startsWith('data: ')) {
              const data = JSON.parse(line.slice(6));
              if (data.token) {
                setResponse(prev => prev + data.token);
              }
            }
          }
        }
      } finally {
        reader.releaseLock();
      }
    } catch (err) {
      // abort() rejects the pending fetch/read with an AbortError; that is expected
      if ((err as Error).name !== 'AbortError') console.error(err);
    }
  };

  const handleCancel = () => {
    abortRef.current?.abort();
  };

  useEffect(() => {
    return () => {
      // Cleanup on unmount
      abortRef.current?.abort();
    };
  }, []);

  return (
    <>
      <button onClick={() => streamMessage('Hello!')}>Ask</button>
      <button onClick={handleCancel}>Stop</button>
      <p>{response}</p>
    </>
  );
}

Key details:

  • AbortController lets users cancel mid-stream
  • TextDecoder with { stream: true } handles UTF-8 chunking correctly (a multi-byte character can be split across reads)
  • Buffer keeps incomplete lines until the next read
  • useEffect cleanup aborts on unmount (prevents dangling requests)
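The TextDecoder point is easy to verify in isolation. The sketch below splits the bytes of 'héllo' in the middle of the two-byte 'é' and decodes the pieces two ways; the function names decodeNaively and decodeStreaming are made up for this illustration.

```typescript
// Decode each chunk with a fresh decoder: a split multi-byte character becomes U+FFFD.
function decodeNaively(chunks: Uint8Array[]): string {
  return chunks.map(chunk => new TextDecoder().decode(chunk)).join('');
}

// Reuse one decoder with { stream: true }: trailing partial bytes are held until the next chunk.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // Final call flushes anything still buffered
}

const bytes = new TextEncoder().encode('héllo'); // 'é' encodes as 0xC3 0xA9
const chunks = [bytes.slice(0, 2), bytes.slice(2)]; // Split between the two bytes of 'é'

console.log(decodeNaively(chunks));   // Contains replacement characters
console.log(decodeStreaming(chunks)); // héllo
```

The same applies to emoji and most non-Latin scripts, which is why the read loop keeps a single decoder for the lifetime of the stream.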

Handling Partial JSON

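Chunks might arrive mid-JSON. Only parse lines that are complete, and carry the unfinished tail over to the next read:

function parseChunks(buffer: string) {
  const events: any[] = [];
  const lines = buffer.split('\n');
  // The last element is whatever followed the final newline: an incomplete line (or '')
  const rest = lines.pop() ?? '';
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      try {
        events.push(JSON.parse(line.slice(6)));
      } catch (e) {
        // A complete line that fails to parse is malformed, not partial; skip it
        console.warn('Skipping malformed SSE line:', line);
      }
    }
  }
  // The caller prepends rest to the next chunk before calling again
  return { events, rest };
}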
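As a variation, buffering can be done per event rather than per line, splitting on the blank line that terminates each SSE event. This is only a sketch under the assumption that every event carries JSON in its data: field, as the endpoint in this post does; createSseParser is a hypothetical name, not a library API.

```typescript
// Hypothetical helper: returns a feed() function that accumulates text chunks
// and yields the parsed JSON payload of every event completed so far.
function createSseParser(): (chunk: string) => unknown[] {
  let buffer = '';
  return function feed(chunk: string): unknown[] {
    buffer += chunk;
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop() ?? ''; // The unfinished event stays buffered
    const payloads: unknown[] = [];
    for (const block of blocks) {
      // Collect the data: lines of this event and strip the field name
      const data = block
        .split('\n')
        .filter(line => line.startsWith('data:'))
        .map(line => line.slice(5).replace(/^ /, ''))
        .join('\n');
      if (data) payloads.push(JSON.parse(data));
    }
    return payloads;
  };
}

// Chunks deliberately cut in the middle of the JSON payloads
const feed = createSseParser();
const firstBatch = feed('data: {"tok');
const secondBatch = feed('en":"Hi"}\n\ndata: {"do');
const thirdBatch = feed('ne":true}\n\n');
console.log(firstBatch, secondBatch, thirdBatch);
```

Because nothing is parsed until its terminating blank line has arrived, a payload cut in half by the network never reaches JSON.parse.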

Memory Leaks: Cleanup on Unmount

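Common mistake: forgetting to cancel the stream. The request keeps running after the component unmounts, holding the connection and its buffered data.

// BAD: Memory leak, the request outlives the component
useEffect(() => {
  fetch('/api/stream').then(r => r.body?.getReader()); // No cleanup
}, []);

// GOOD: Cancel on unmount
useEffect(() => {
  const abort = new AbortController();
  fetch('/api/stream', { signal: abort.signal }).catch(err => {
    if (err.name !== 'AbortError') console.error(err); // Aborts are expected
  });
  return () => abort.abort(); // Always clean up
}, []);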
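Aborting makes the pending fetch or read reject, so cleanup code usually needs to tell an expected cancellation apart from a real failure. A small sketch of that check follows; isAbortError is a hypothetical helper name, and the check relies only on the error's name being 'AbortError'.

```typescript
// Hypothetical helper: true when the value looks like the error produced by AbortController.abort().
function isAbortError(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { name?: unknown }).name === 'AbortError'
  );
}

// Calling abort() with no argument sets signal.reason to an AbortError,
// which is what a fetch using that signal rejects with.
const controller = new AbortController();
controller.abort();
const abortedReason: unknown = controller.signal.reason;

console.log(isAbortError(abortedReason));            // Cancellation: safe to ignore
console.log(isAbortError(new Error('network down'))); // Real failure: report it
```

In a catch block this becomes: ignore the error when isAbortError(err) is true, and log or surface it otherwise.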

Backpressure and Slow Clients

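If your React component can’t render as fast as chunks arrive, updates and unread data pile up in memory. Because reader.read() is pull-based, the stream only hands over more data when you ask for it. One crude solution: slow down the read loop:

async function readWithBackpressure(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void, // e.g. decode and append to state
) {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Process chunk
    onChunk(value);
    // Give React time to render
    await new Promise(resolve => setTimeout(resolve, 10));
  }
}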
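A fixed delay is a blunt tool. A different technique, swapped in here as an alternative rather than something the post above prescribes, is to keep reading at full speed but coalesce tokens so React sees at most one state update per frame. The sketch below assumes a hypothetical createTokenBatcher helper with an injectable scheduler, so it can run outside the browser.

```typescript
type Schedule = (flush: () => void) => void;

// Hypothetical helper: collects tokens and hands them to onFlush once per scheduled tick.
function createTokenBatcher(
  onFlush: (text: string) => void,
  schedule: Schedule,
): (token: string) => void {
  let pending = '';
  let scheduled = false;
  return function push(token: string): void {
    pending += token;
    if (scheduled) return; // A flush is already queued for this tick
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const text = pending;
      pending = '';
      onFlush(text);
    });
  };
}

// In a component (assumed usage, not executed here):
//   const push = createTokenBatcher(
//     text => setResponse(prev => prev + text),
//     flush => requestAnimationFrame(() => flush()),
//   );
//   ...then call push(data.token) inside the read loop.
```

Three tokens that arrive within the same frame then trigger one render instead of three, which is often enough to keep the UI responsive without throttling the network.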

Summary

Streaming makes your app feel faster. SSE is simple for one-way streams. Fetch ReadableStream gives you control. Buffer incomplete JSON. Always clean up with AbortController. Handle backpressure gracefully. Your users will notice the responsiveness.

Stream first. Spinners are dead.
