Streaming LLM Responses in React With Fetch Streams
Users hate waiting. Showing tokens as they arrive feels far faster than staring at a 5-second spinner. But streaming takes care: you must handle partial responses, clean up readers and connections, and respect backpressure, or your app will leak memory and stall.
Why Streaming Matters
A non-streaming LLM request:
- ⚠️ User sees a spinner for 5+ seconds
- ⚠️ No text appears until the whole response has been generated (bad UX)
- ⚠️ Can’t show progress or allow cancellation
A streaming response:
- ✔ First token appears in roughly 500 ms, so the app feels faster even though total generation time is unchanged
- ✔ Users can read as text generates
- ✔ AbortController lets them cancel mid-generation
Three Approaches: SSE vs. Fetch Streams vs. WebSocket
| Approach | Latency | Complexity | Best For |
|---|---|---|---|
| Server-Sent Events (SSE) | Low | Simple | Unidirectional (server → client) |
| Fetch ReadableStream | Low | Medium | HTTP/2, custom chunking |
| WebSocket | Very low | Complex | Bidirectional, real-time |
We’ll use SSE for simplicity and Fetch ReadableStream for fine-grained control.
Node.js: Streaming with Server-Sent Events
An Express endpoint that streams model responses using the Anthropic SDK:

```ts
import Anthropic from '@anthropic-ai/sdk';
import express from 'express';

const app = express();
app.use(express.json()); // Parse JSON bodies so req.body.message is defined
const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from the environment

app.post('/api/chat-stream', async (req, res) => {
  const { message } = req.body;

  // Standard SSE response headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.messages.stream({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  });

  // Forward each text delta to the client as one SSE event
  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      res.write(`data: ${JSON.stringify({ token: chunk.delta.text })}\n\n`);
    }
  }

  res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
  res.end();
});

app.listen(3000);
```

Each token arrives as its own SSE event: a data: line carrying a JSON payload, terminated by a blank line.
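To make the wire format concrete, here is a minimal sketch of the framing the endpoint writes by hand. The helper name formatSSE and its optional event-name parameter are illustrative assumptions, not part of Express or any SDK:

```typescript
// Hypothetical helper: serialize a JSON payload as a single SSE message.
function formatSSE(payload: Record<string, unknown>, eventName?: string): string {
  const lines: string[] = [];
  // An optional "event:" field lets the client distinguish message types.
  if (eventName) lines.push(`event: ${eventName}`);
  // JSON.stringify escapes newlines inside strings, so the payload
  // always fits on one "data:" line.
  lines.push(`data: ${JSON.stringify(payload)}`);
  // A blank line terminates the message.
  return lines.join('\n') + '\n\n';
}

console.log(JSON.stringify(formatSSE({ token: 'Hello' })));
// prints "data: {\"token\":\"Hello\"}\n\n"
```

Serializing through JSON matters here: a raw newline inside a token would otherwise end the data: line early and corrupt the event.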
React: Consuming SSE with EventSource
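The simplest client. EventSource can only issue GET requests, so this assumes a GET variant of the endpoint that reads the prompt from the msg query parameter instead of a POST body:

```tsx
import { useEffect, useRef, useState } from 'react';

export function ChatSSE() {
  const [response, setResponse] = useState('');
  const sourceRef = useRef<EventSource | null>(null);

  const streamMessage = (message: string) => {
    sourceRef.current?.close(); // Drop any stream still in flight
    setResponse('');

    const eventSource = new EventSource(
      `/api/chat-stream?msg=${encodeURIComponent(message)}`
    );
    sourceRef.current = eventSource;

    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.done) {
        eventSource.close();
      } else {
        setResponse(prev => prev + data.token);
      }
    };

    // Close on error; otherwise EventSource reconnects automatically
    eventSource.onerror = () => {
      eventSource.close();
    };
  };

  // Close the connection if the component unmounts mid-stream
  useEffect(() => () => sourceRef.current?.close(), []);

  return (
    <>
      <button onClick={() => streamMessage('Hello!')}>Ask</button>
      <p>{response}</p>
    </>
  );
}
```

EventSource limitation: it supports only GET requests and cannot send a request body or custom headers. For more control, use fetch with a ReadableStream.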
React: Consuming Fetch ReadableStream
More flexible: it works with POST, and because you pull each chunk with reader.read(), data is delivered only as fast as you consume it:

```tsx
import { useEffect, useRef, useState } from 'react';

export function ChatStream() {
  const [response, setResponse] = useState('');
  const abortRef = useRef<AbortController | null>(null);

  const streamMessage = async (message: string) => {
    abortRef.current?.abort(); // Cancel any previous request
    const controller = new AbortController();
    abortRef.current = controller;
    setResponse('');

    try {
      // Named res so it does not shadow the response state above
      const res = await fetch('/api/chat-stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
        signal: controller.signal,
      });

      const reader = res.body?.getReader();
      if (!reader) return;

      const decoder = new TextDecoder();
      let buffer = '';

      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          buffer += decoder.decode(value, { stream: true });
          const lines = buffer.split('\n');
          buffer = lines.pop() || ''; // Keep incomplete line in buffer

          for (const line of lines) {
            if (line.startsWith('data: ')) {
              const data = JSON.parse(line.slice(6));
              if (data.token) {
                setResponse(prev => prev + data.token);
              }
            }
          }
        }
      } finally {
        reader.releaseLock();
      }
    } catch (err) {
      // abort() rejects the pending fetch or read with an AbortError; ignore it
      if ((err as Error).name !== 'AbortError') throw err;
    }
  };

  const handleCancel = () => {
    abortRef.current?.abort();
  };

  useEffect(() => {
    return () => {
      // Cleanup on unmount
      abortRef.current?.abort();
    };
  }, []);

  return (
    <>
      <button onClick={() => streamMessage('Hello!')}>Ask</button>
      <button onClick={handleCancel}>Stop</button>
      <p>{response}</p>
    </>
  );
}
```

Key details:
- ✔ AbortController lets users cancel mid-stream
- ✔ TextDecoder with stream: true handles UTF-8 correctly when a multi-byte character is split across reads
- ✔ Buffer keeps incomplete lines until the next read
- ✔ useEffect cleanup aborts on unmount (prevents dangling requests)
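The decoding and buffering points above can be checked outside React. The following is a minimal sketch, assuming the data: line format used by the server; createSSEParser and StreamEvent are hypothetical names introduced for illustration. It feeds the parser one event that has been cut in the middle of a two-byte character:

```typescript
type StreamEvent = { token?: string; done?: boolean };

// Hypothetical helper: returns a function that accepts raw byte chunks
// and returns only the events completed by that chunk.
function createSSEParser(): (chunk: Uint8Array) => StreamEvent[] {
  const decoder = new TextDecoder();
  let buffer = '';

  return (chunk) => {
    // stream: true holds back an incomplete multi-byte sequence
    // until the rest of it arrives in a later chunk.
    buffer += decoder.decode(chunk, { stream: true });
    const lines = buffer.split('\n');
    // The last element is '' or an unterminated line; keep it for later.
    buffer = lines.pop() ?? '';

    const events: StreamEvent[] = [];
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        events.push(JSON.parse(line.slice(6)) as StreamEvent);
      }
    }
    return events;
  };
}

// Simulate a network read boundary inside "é" (UTF-8 bytes 0xC3 0xA9).
const bytes = new TextEncoder().encode('data: {"token":"caf\u00e9"}\n\n');
const splitAt = bytes.indexOf(0xc3) + 1;

const parse = createSSEParser();
const first = parse(bytes.slice(0, splitAt)); // [] (line and character both incomplete)
const second = parse(bytes.slice(splitAt));   // one event whose token is "café"
console.log(first.length, second[0].token);
```

Without stream: true, the first call would decode the dangling 0xC3 byte as a replacement character and the token would be corrupted.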
Handling Partial JSON
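Chunks can end mid-line, so a data: line may hold only half of a JSON payload. Never parse the trailing fragment; hand it back to the caller to prepend to the next chunk:

```ts
function parseChunks(buffer: string) {
  const events: any[] = [];
  const lines = buffer.split('\n');
  // The last element is '' or an unterminated line: return it instead of parsing it
  const rest = lines.pop() ?? '';

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      try {
        events.push(JSON.parse(line.slice(6)));
      } catch (e) {
        // A complete line that still fails to parse is malformed, not partial
        console.warn('Skipping malformed event:', line);
      }
    }
  }

  return { events, rest };
}
```

Memory Leaks: Cleanup on Unmount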
Common mistake: forgetting to cancel the request and close the stream when the component unmounts.
```tsx
// BAD: Memory leak
useEffect(() => {
  fetch('/api/stream').then(r => r.body?.getReader());
  // No cleanup
}, []);

// GOOD: Cancel on unmount
useEffect(() => {
  const abort = new AbortController();
  fetch('/api/stream', { signal: abort.signal }).catch(err => {
    // Aborting rejects the fetch; only report real failures
    if (err.name !== 'AbortError') console.error(err);
  });

  return () => abort.abort(); // Always clean up
}, []);
```

Backpressure and Slow Clients
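If your React component can't keep up with incoming chunks, unread data queues up in memory. A fetch stream is pull-based, so the browser can apply backpressure to the connection once its internal queue fills, but only if you stop calling read(). One solution is to pause between reads:

```ts
async function readWithBackpressure(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  process: (chunk: Uint8Array) => void
) {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Process chunk
    process(value);

    // Give React time to render
    await new Promise(resolve => setTimeout(resolve, 10));
  }
}
```

Summary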
Streaming makes your app feel faster. SSE is simple for one-way streams. Fetch ReadableStream gives you control over the request and the read loop. Buffer incomplete lines before parsing JSON. Always clean up with AbortController. Handle backpressure gracefully. Your users will notice the responsiveness.
Stream first. Spinners are dead.