Race Conditions: When 5 API Requests All Refresh the Token
The Heisenbug
Some bugs are easy. They happen every time, you debug them, you fix them. Done.
Then there are Heisenbugs - bugs that only appear when you’re not looking. Bugs that work fine when you test manually but fail 5% of the time in production. Bugs that make you question your sanity.
This is a story about one of those bugs.
“Works on my machine” - the battle cry of developers everywhere
The Symptoms
Users reported “random 401 errors” - intermittent authentication failures despite being logged in. The support tickets all had the same pattern:
- “I was browsing normally, then suddenly got logged out”
- “Some requests work, some fail, it’s random”
- “The dashboard shows my data but the sidebar shows ‘unauthorized’”
Impact Report
- Occurrence rate: ~5% of sessions
- Pattern: Only happened when loading data-heavy pages
- Time to resolve: 1 week (tricky to reproduce)
The Investigation
Let me show you the timeline of what was happening:
T+0ms: Request A starts, token expired
T+2ms: Request A calls /auth/refresh
T+5ms: Request B starts, token still shows expired
T+7ms: Request B calls /auth/refresh (DUPLICATE!)
T+10ms: Request C starts, calls /auth/refresh (TRIPLICATE!)
T+15ms: Request A gets new token, saves it
T+18ms: Request B gets ERROR (refresh token already used!)
T+20ms: Request C gets ERROR (refresh token already used!)
T+22ms: Requests B and C fail with 401
T+25ms: User sees partial data, some components show errors
Result: 2 of 3 requests failed randomly
The problem? My refresh token was one-time use (as it should be for security). But multiple requests were trying to use it simultaneously.
Request A, B, and C all trying to be the one that refreshes the token
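The timeline above can be reproduced in a few lines. This is only a sketch: `createFakeAuthServer` and `simulateRace` are hypothetical names, and the in-memory "server" models nothing but the one-time-use rule and a few milliseconds of latency.

```typescript
// Hypothetical in-memory stand-in for a server that issues one-time-use refresh tokens.
function createFakeAuthServer(initialRefreshToken: string) {
  let validRefreshToken = initialRefreshToken;
  let counter = 0;

  return {
    // Simulates POST /auth/refresh with a little network latency.
    async refresh(refreshToken: string): Promise<{ accessToken: string; refreshToken: string }> {
      await new Promise((resolve) => setTimeout(resolve, 5));
      if (refreshToken !== validRefreshToken) {
        throw new Error("401: refresh token already used");
      }
      counter += 1;
      validRefreshToken = `refresh-${counter}`; // rotate: the old token is now dead
      return { accessToken: `access-${counter}`, refreshToken: validRefreshToken };
    },
  };
}

// Three "requests" all notice the expired access token and refresh on their own,
// each holding the same refresh token, like the independent refreshes in the timeline.
async function simulateRace(): Promise<{ succeeded: number; failed: number }> {
  const server = createFakeAuthServer("refresh-0");
  const results = await Promise.allSettled([
    server.refresh("refresh-0"), // Request A
    server.refresh("refresh-0"), // Request B
    server.refresh("refresh-0"), // Request C
  ]);
  const succeeded = results.filter((r) => r.status === "fulfilled").length;
  return { succeeded, failed: results.length - succeeded };
}
```

Whichever call reaches the server first rotates the token, and the other two present a token that no longer exists. Nothing about the individual requests is wrong; the failure comes purely from them sharing one single-use credential.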
The Buggy Code
Here’s what my original implementation looked like:
// BROKEN: Every request refreshes independently
api.interceptors.request.use(async (config) => {
  const token = getAccessToken();

  if (isTokenExpired(token)) {
    // Problem: Multiple requests hit this simultaneously!
    const newToken = await refreshToken();
    setAccessToken(newToken);
  }

  config.headers.Authorization = `Bearer ${getAccessToken()}`;
  return config;
});
The issue is subtle. When the dashboard loads, it might fire 5-10 API requests at once. If the token is expired (or about to expire), EACH request independently decides to refresh it. They all race to call the refresh endpoint, and only the first one wins.
The Fix: Token Refresh Mutex
The solution is a classic concurrency pattern - a mutex-like approach where only one request does the refresh, and others wait for it:
// lib/api.ts

// Mutex-like state
let isRefreshing = false;
let refreshSubscribers: Array<(token: string) => void> = [];

function subscribeToRefresh(callback: (token: string) => void) {
  refreshSubscribers.push(callback);
}

function notifySubscribers(token: string) {
  refreshSubscribers.forEach(callback => callback(token));
  refreshSubscribers = [];
}

api.interceptors.request.use(async (config) => {
  // Skip auth for auth endpoints themselves
  if (config.url?.includes('/auth/')) {
    return config;
  }

  const token = getAccessToken();
  const REFRESH_BUFFER_SECONDS = 120; // Refresh 2 min before expiry

  // Check if token needs refresh
  if (isTokenExpired(token) || willExpireSoon(token, REFRESH_BUFFER_SECONDS)) {
    if (!isRefreshing) {
      // This request wins the race - it does the refresh
      isRefreshing = true;

      try {
        const response = await axios.post('/auth/refresh', {
          refresh_token: getRefreshToken()
        });

        const newToken = response.data.access_token;
        setAccessToken(newToken);

        // Notify all waiting requests
        notifySubscribers(newToken);

        config.headers.Authorization = `Bearer ${newToken}`;
      } catch (error) {
        // Refresh failed - clear everything and redirect
        clearTokens();
        window.location.href = '/login?expired=true';
        throw error;
      } finally {
        isRefreshing = false;
      }
    } else {
      // Another request is already refreshing - wait for it
      return new Promise((resolve) => {
        subscribeToRefresh((newToken) => {
          config.headers.Authorization = `Bearer ${newToken}`;
          resolve(config);
        });
      });
    }
  } else {
    // Token is still valid
    config.headers.Authorization = `Bearer ${token}`;
  }

  return config;
});
Visualizing the Fix
BEFORE (Race Condition):
─────────────────────────────────────────────────
Request A ──▶ refresh() ──▶ ✓
Request B ──▶ refresh() ──▶ ✗ (token invalidated)
Request C ──▶ refresh() ──▶ ✗ (token invalidated)
AFTER (Queue Pattern):
─────────────────────────────────────────────────
Request A ──▶ refresh() ──▶ ✓ ──▶ notify all
Request B ──▶ [waiting...] ────▶ ✓ (uses A's token)
Request C ──▶ [waiting...] ────▶ ✓ (uses A's token)
The Subscriber Pattern Explained
The key insight is the subscriber pattern:
- First request sets isRefreshing = true and makes the actual refresh call
- Subsequent requests see isRefreshing is already true, so they add a callback to refreshSubscribers and return a Promise that won’t resolve until notified
- When refresh completes, notifySubscribers() calls all waiting callbacks with the new token
- All Promises resolve with their configs now containing the fresh token
It’s like a velvet rope at a club. The first person talks to the bouncer (makes the API call), everyone else waits in line (subscribes), and when the door opens (token received), everyone goes in together.
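The same "one caller refreshes, everyone else waits" idea can also be sketched by sharing a single in-flight Promise instead of keeping a callback list. This is an alternative formulation, not the code from this post, and `createSingleFlight`, `fakeRefresh`, and `demo` are hypothetical names. One practical difference is that waiters also see a rejection if the refresh fails, rather than holding a callback that never gets called.

```typescript
// Wraps an async function so that concurrent calls share one in-flight Promise.
function createSingleFlight<T>(fn: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;

  return () => {
    if (inFlight === null) {
      // First caller starts the work. The slot is cleared once it settles,
      // so a later expiry triggers a fresh call.
      inFlight = fn().finally(() => {
        inFlight = null;
      });
    }
    // Every caller, first or not, gets the same Promise.
    return inFlight;
  };
}

// Usage sketch: a stand-in for the real refresh call that counts invocations.
let refreshCalls = 0;
async function fakeRefresh(): Promise<string> {
  refreshCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 5));
  return `access-token-${refreshCalls}`;
}

const refreshOnce = createSingleFlight(fakeRefresh);

async function demo(): Promise<string[]> {
  // Five "requests" ask for a refresh at the same moment.
  return Promise.all([
    refreshOnce(),
    refreshOnce(),
    refreshOnce(),
    refreshOnce(),
    refreshOnce(),
  ]);
}
```

In an interceptor, each request would simply `await refreshOnce()` when it sees an expired token. The trade-off against the subscriber list is mostly taste: the Promise version has less state to keep in sync, while the callback version makes the "who is waiting" queue explicit.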
Proactive Refresh
Notice the REFRESH_BUFFER_SECONDS constant:
const REFRESH_BUFFER_SECONDS = 120; // Refresh 2 min before expiry
if (isTokenExpired(token) || willExpireSoon(token, REFRESH_BUFFER_SECONDS)) {
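This triggers a refresh on the first request made within 2 minutes of the token’s expiry. Why? Because a token that is still valid when the interceptor runs can expire while the request is in flight, and by then it’s too late. Refreshing ahead of time leaves a margin, so requests go out with a token that will still be valid when the server checks it.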
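The `isTokenExpired` and `willExpireSoon` helpers are not shown in this post. A minimal sketch of what they could look like, assuming the access token is a JWT with a standard `exp` claim (seconds since the Unix epoch) and that `atob` is available (browsers and recent Node versions). `getExpiry` is a hypothetical helper, and the `now` parameter is added here only to make the functions easy to test. The code reads the payload without verifying the signature, which is fine for scheduling a refresh but not for trusting the token.

```typescript
// Reads the `exp` claim (seconds since epoch) from a JWT payload without verifying it.
// Returns null if the token is missing or malformed.
function getExpiry(token: string | null): number | null {
  if (!token) return null;
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    // JWT segments are base64url; convert to plain base64 before decoding.
    const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const payload = JSON.parse(atob(padded));
    return typeof payload.exp === "number" ? payload.exp : null;
  } catch {
    return null;
  }
}

// True if the token expires within `bufferSeconds` of `now` (milliseconds since epoch).
// A missing or unreadable token is treated as needing a refresh.
function willExpireSoon(
  token: string | null,
  bufferSeconds: number,
  now: number = Date.now()
): boolean {
  const exp = getExpiry(token);
  if (exp === null) return true;
  return exp - now / 1000 <= bufferSeconds;
}

// Expired is just "expires within zero seconds".
function isTokenExpired(token: string | null, now: number = Date.now()): boolean {
  return willExpireSoon(token, 0, now);
}
```

With helpers shaped like this, `willExpireSoon(token, 120)` already covers the expired case, so the combined `isTokenExpired(token) || willExpireSoon(...)` check is redundant but harmless.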
The Results
| Metric | Before | After |
|---|---|---|
| Duplicate refresh calls | 3-10 per session | 0 |
| 401 errors (false positive) | ~5% of sessions | 0% |
| Refresh API calls/hour | ~150 | ~20 |
| User-reported login issues | 8/week | 0 |
Bonus: Server-Side Protection
Even with the client-side fix, I added server-side protection too. If somehow duplicate refresh requests come in, the server should handle it gracefully:
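# Imports assumed for this excerpt; router, get_db, RefreshRequest and
# RefreshToken are defined elsewhere in the app.
from datetime import datetime

from fastapi import Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


@router.post("/auth/refresh")
async def refresh_token(request: RefreshRequest, db: AsyncSession = Depends(get_db)):
    # Find the refresh token and lock its row (SELECT ... FOR UPDATE), so a
    # concurrent request with the same token waits here instead of also
    # passing the is_used check
    token = await db.scalar(
        select(RefreshToken)
        .where(
            RefreshToken.token == request.refresh_token,
            RefreshToken.is_used == False,  # Not already used
            RefreshToken.expires_at > datetime.utcnow(),
        )
        .with_for_update()
    )

    if not token:
        raise HTTPException(401, "Invalid or expired refresh token")

    # Mark as used IMMEDIATELY (before generating new tokens)
    token.is_used = True
    await db.commit()

    # Now generate new tokens...
The key is marking the token as used before doing anything else. Marking alone leaves a small window, though: two requests that arrive together could both pass the is_used check before either one commits. The with_for_update() row lock is there to close that window on databases that support SELECT ... FOR UPDATE (PostgreSQL and MySQL do; SQLite does not). The second request waits for the first commit and then no longer matches the is_used == False filter. This way, even if two requests somehow get through, only the first one succeeds.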
I am once again asking you to use a mutex for your shared resources
Key Takeaways
Concurrent requests share state - When your UI fires multiple API calls, they might all see the same “expired” state.
Use a mutex for shared resources - The subscriber pattern ensures only one refresh happens.
Proactive refresh > Reactive refresh - Refresh before expiry, not after.
Defense in depth - Client-side mutex + server-side protection = belt and suspenders.
Log everything during debugging - I only found this bug by adding detailed timing logs to every request.
Next time: The tale of 12 engineering lessons learned from building a full-stack platform, complete with a meme for each one.
This is part 4 of my “Building ShieldMod” series.