The Failure Logs
AI can generate perfect code, but it can't tell you why it broke. Here are some real problems I've faced and how I systematically dismantled them.
The Ghost of the Tab Filter
The Problem
Users' filter state reset every time they switched tabs in the history dashboard.
The Fix
The filter state lived in a component that unmounted on every tab switch, so it was thrown away and recreated with its defaults. I moved the state up into a parent provider and exposed it through memoized selectors.
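A framework-free sketch of that idea, not the dashboard's actual code: the state lives in a store outside the tab components, and a memoized selector recomputes only when the state object changes. The filter fields (`status`, `search`) and the names `createFilterStore` and `memoizeSelector` are made up for illustration.

```typescript
// Hypothetical filter shape; the real dashboard's fields are not given.
type FilterState = { status: string; search: string };
type StoreListener = () => void;

// The store is created once, above the tabs, so unmounting a tab
// component cannot destroy the filter state.
function createFilterStore(initial: FilterState) {
  let state = initial;
  const listeners = new Set<StoreListener>();
  return {
    getState: (): FilterState => state,
    setState(patch: Partial<FilterState>): void {
      state = { ...state, ...patch }; // new object identity on every change
      listeners.forEach((notify) => notify());
    },
    subscribe(listener: StoreListener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

// Memoized selector: reuses the last result while the state object is unchanged.
function memoizeSelector<S, R>(select: (state: S) => R): (state: S) => R {
  let cache: { state: S; result: R } | undefined;
  return (state: S): R => {
    if (cache === undefined || cache.state !== state) {
      cache = { state, result: select(state) };
    }
    return cache.result;
  };
}
```

In a React app the same shape would typically sit behind a context provider, with each tab subscribing to the store instead of owning the state.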
The Insight
Local state is great until it isn't. Architectural lift is better than a quick patch.
Map Performance Meltdown
The Problem
The interactive map became sluggish with more than 100 markers, dropping to 15 FPS.
The Fix
Debugged rendering cycles and found unnecessary re-renders in the marker component. Implemented custom clustering logic and React.memo.
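The clustering logic itself isn't described here, so the following is only one common approach: a grid-based clusterer that buckets markers into square cells and renders one cluster per cell. The `MapMarker` and `MarkerCluster` types, the screen-space coordinates, and the `cellSize` parameter are all assumptions.

```typescript
// Assumed shapes: positions are screen-space pixels at the current zoom.
type MapMarker = { id: string; x: number; y: number };
type MarkerCluster = { x: number; y: number; count: number; ids: string[] };

// Buckets markers into cellSize x cellSize squares and returns one cluster
// per non-empty cell, positioned at the average of its members.
function clusterMarkers(markers: MapMarker[], cellSize: number): MarkerCluster[] {
  if (cellSize <= 0) throw new RangeError("cellSize must be positive");
  const cells = new Map<string, MapMarker[]>();
  for (const marker of markers) {
    const key = `${Math.floor(marker.x / cellSize)}:${Math.floor(marker.y / cellSize)}`;
    const bucket = cells.get(key);
    if (bucket) bucket.push(marker);
    else cells.set(key, [marker]);
  }
  const clusters: MarkerCluster[] = [];
  cells.forEach((bucket) => {
    const sumX = bucket.reduce((sum, m) => sum + m.x, 0);
    const sumY = bucket.reduce((sum, m) => sum + m.y, 0);
    clusters.push({
      x: sumX / bucket.length,
      y: sumY / bucket.length,
      count: bucket.length,
      ids: bucket.map((m) => m.id),
    });
  });
  return clusters;
}
```

The payoff is that the number of rendered elements is bounded by the number of visible cells rather than the number of markers, which is what makes memoizing the individual marker component worthwhile.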
The Insight
Don't trust third-party components to be performant out of the box. Measure first.
The Phantom Scroll on First Load
The Problem
Goal: build a cinematic, animated portfolio with a hero, a particle background, and multiple glass sections. Complexity: heavy visuals, multiple sections, and scroll-triggered animations. Challenge: on first page load, the scrollbar thumb filled almost the whole track and the first scroll stalled, as if the page were only one screen tall. After that first scroll, everything behaved normally.
The Fix
Root cause: unstable scroll height calculation on initial load when the scroll container is the default html/body. Fix: move scrolling to a dedicated container by setting body overflow hidden and making main the scroll container (height 100vh/100svh with overflow-y: auto). Updated scroll listeners/observers to use main.
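The style changes described above can be sketched as a small function. This is an illustration, not the site's actual code: `makeMainTheScrollContainer` and the `ScrollStyleTarget` type are hypothetical, and the type is kept structural so the sketch also runs outside a browser.

```typescript
// Minimal structural type; a real HTMLElement also satisfies it.
type ScrollStyleTarget = {
  style: { overflow: string; overflowY: string; height: string };
};

// Stops the document itself from scrolling and hands scrolling to <main>.
function makeMainTheScrollContainer(body: ScrollStyleTarget, main: ScrollStyleTarget): void {
  body.style.overflow = "hidden";
  main.style.height = "100vh"; // fallback value
  // In a browser without svh support this assignment is ignored,
  // so the 100vh fallback above stays in place.
  main.style.height = "100svh";
  main.style.overflowY = "auto";
}
```

In the browser this would be called with `document.body` and the element returned by `document.querySelector<HTMLElement>("main")`. Scroll listeners then attach to that element instead of `window`, and any `IntersectionObserver` gets it as its `root` option.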
The Insight
When initial scroll feels 'stuck' and the scrollbar size is wrong, suspect the scroll container, not the sections. A dedicated scroll container can stabilize layout on first paint.
Jellyfish Particles That Felt Dead
The Problem
The goal was a living, jellyfish-like particle field for the hero. The first shader version used simple parametric waves, but the motion felt flat and the 'breathing' looked like static noise instead of a swimming organism.
The Fix
Rebuilt the motion model: separated bell pulse and tentacle waves, added layered flow fields and slow expansion cycles, then tuned amplitudes for water-like motion. The result is a more organic, fluid swim with visible waves.
The Insight
If motion feels dead, it's usually the model, not the color. Separate the anatomy (bell vs tentacles), then layer multiple time scales.
The Double-Deduct Disaster
The Problem
An e-commerce client reported customers seeing negative stock after flash sales. Products were being oversold: with 50 units in stock, 63 orders went through. It was a production incident during peak hours.
The Fix
Race condition: multiple concurrent requests reading the same stock count before any write completed. Implemented pessimistic locking with SELECT FOR UPDATE in a transaction, added Redis distributed lock as first defense, and database constraint as final safety net.
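The schema and queries aren't shown here, so this is an in-memory model of the invariant rather than the real transaction. Inside a transaction, `SELECT ... FOR UPDATE` makes the read, the check, and the write behave as one uninterrupted step per row; `reserveStock` (a hypothetical name) models that step, and the final guard stands in for a database constraint along the lines of `CHECK (stock >= 0)`, which is also an assumption.

```typescript
type Inventory = Map<string, number>;

type ReserveResult =
  | { ok: true; remaining: number }
  | { ok: false; reason: "unknown_sku" | "insufficient_stock" };

// Models the locked section: read, validate, then write, with nothing
// allowed to interleave between the read and the write.
function reserveStock(inventory: Inventory, sku: string, quantity: number): ReserveResult {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new RangeError("quantity must be a positive integer");
  }
  const current = inventory.get(sku);
  if (current === undefined) return { ok: false, reason: "unknown_sku" };
  if (current < quantity) return { ok: false, reason: "insufficient_stock" };

  const remaining = current - quantity;
  // Last line of defense: stands in for the database constraint and should
  // be unreachable while the check above is correct.
  if (remaining < 0) throw new Error("constraint violated: stock would go negative");

  inventory.set(sku, remaining);
  return { ok: true, remaining };
}
```

Replaying the incident against this model, 63 single-unit orders on 50 units of stock yield 50 accepted orders, 13 rejections, and a final stock of 0.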
The Insight
Optimistic concurrency is fine for low traffic. For inventory during flash sales, assume the worst: lock first, validate second, and always have a database constraint as your last line of defense.
CRM Sync That Ate the Server
The Problem
CRM dashboard took 45 seconds to load customer list. Backend CPU spiked to 100% whenever sales team opened the page. Database showed thousands of queries per single request.
The Fix
A classic N+1 problem, multiplied: the page fetched 100 customers, then ran separate queries for each customer's orders, contacts, last activities, and tags. Rewrote it with Prisma includes and proper relation loading, and switched from offset to cursor pagination. Query count dropped from 500+ to 3.
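To make the query pattern concrete without guessing at the real schema, here is a sketch against a fake in-memory database that only counts queries. It does not use Prisma; `createFakeDb` and both loaders are hypothetical, and only one relation (orders) is modeled.

```typescript
type Customer = { id: number; name: string };
type Order = { id: number; customerId: number; total: number };
type CustomerWithOrders = Customer & { orders: Order[] };

// Fake database: every method call counts as one query.
function createFakeDb(customers: Customer[], orders: Order[]) {
  let queryCount = 0;
  return {
    getQueryCount: (): number => queryCount,
    findCustomers(limit: number): Customer[] {
      queryCount++;
      return customers.slice(0, limit);
    },
    findOrdersByCustomer(customerId: number): Order[] {
      queryCount++;
      return orders.filter((o) => o.customerId === customerId);
    },
    // Equivalent of a single WHERE customer_id IN (...) query.
    findOrdersByCustomers(customerIds: number[]): Order[] {
      queryCount++;
      const wanted = new Set(customerIds);
      return orders.filter((o) => wanted.has(o.customerId));
    },
  };
}
type FakeDb = ReturnType<typeof createFakeDb>;

// N+1: one query for the list, then one more per customer.
function loadWithNPlusOne(db: FakeDb, limit: number): CustomerWithOrders[] {
  return db.findCustomers(limit).map((c) => ({ ...c, orders: db.findOrdersByCustomer(c.id) }));
}

// Batched: one query for the list, one for all related orders.
function loadBatched(db: FakeDb, limit: number): CustomerWithOrders[] {
  const customers = db.findCustomers(limit);
  const byCustomer = new Map<number, Order[]>();
  for (const order of db.findOrdersByCustomers(customers.map((c) => c.id))) {
    const bucket = byCustomer.get(order.customerId);
    if (bucket) bucket.push(order);
    else byCustomer.set(order.customerId, [order]);
  }
  return customers.map((c) => ({ ...c, orders: byCustomer.get(c.id) ?? [] }));
}
```

For 100 customers the first loader issues 101 queries and the second issues 2, with identical results. Each extra relation loaded per customer adds another 100 queries to the first version and only one to the second.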
The Insight
N+1 is sneaky in ORMs. Always check query logs in development. If you see the same query pattern repeating, you have N+1. Eager loading and pagination are not optional for lists.
The Midnight Token Massacre
The Problem
Production CRM went down at 2 AM. All API requests returning 401 Unauthorized. No code deployment, no infrastructure changes. Support tickets flooding in from APAC clients.
The Fix
JWT refresh token rotation was working, but the cron job that cleaned expired tokens had a bug: it was deleting ALL tokens older than 24 hours, including valid refresh tokens. Added token type check to the cleanup query. Implemented graceful token refresh with retry mechanism.
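The cleanup query isn't reproduced here, so the following is a sketch of the logic under assumed names and fields (`StoredToken`, `type`, `createdAt`, `expiresAt`). It contrasts the age-only rule that caused the outage with a type-aware rule, and adds a guard that aborts when a run would delete an implausibly large share of all tokens. The 50% default threshold is arbitrary.

```typescript
type StoredToken = {
  id: string;
  type: "access" | "refresh";
  createdAt: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
};

const DAY_MS = 24 * 60 * 60 * 1000;

// The bug: age alone decides, so long-lived refresh tokens are deleted too.
function buggySelect(tokens: StoredToken[], now: number): StoredToken[] {
  return tokens.filter((t) => t.createdAt < now - DAY_MS);
}

// The fix: access tokens go by age, refresh tokens only once actually expired.
function selectTokensToDelete(tokens: StoredToken[], now: number): StoredToken[] {
  return tokens.filter((t) =>
    t.type === "access" ? t.createdAt < now - DAY_MS : t.expiresAt <= now
  );
}

// Safeguard: refuse to proceed if the run would remove too large a share.
function guardedCleanup(tokens: StoredToken[], now: number, maxFraction = 0.5): StoredToken[] {
  const doomed = selectTokensToDelete(tokens, now);
  if (tokens.length > 0 && doomed.length / tokens.length > maxFraction) {
    throw new Error(`cleanup aborted: would delete ${doomed.length} of ${tokens.length} tokens`);
  }
  return doomed;
}
```

A guard like this turns a silent mass deletion into a loud, recoverable failure of the job itself.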
The Insight
Background jobs are invisible until they break everything. Always test cleanup/maintenance jobs with production-like data volumes. Add safeguards that prevent mass deletions.
Payment Webhook Hell
The Problem
E-commerce orders stuck in 'pending' status. Customers paid but order never confirmed. Payment gateway showing successful charges, but our system had no record of webhook receipt.
The Fix
Webhook endpoint was throwing 500 due to database timeout on order update. Gateway retried 3 times, then gave up. No logging of failed webhooks. Added: idempotency keys, webhook queue with BullMQ, dead letter queue for failed processing, and Slack alerts for stuck orders.
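A minimal in-memory sketch of that receiver design, with plain arrays standing in for the BullMQ queue and the dead letter queue (no BullMQ API is used here). The event shape, `createWebhookReceiver`, and the three-attempt default are assumptions for illustration.

```typescript
// Hypothetical event shape; real gateway payloads differ.
type WebhookEvent = { id: string; orderId: string; status: "paid" | "failed" };

function createWebhookReceiver(maxAttempts = 3) {
  const seenEventIds = new Set<string>(); // idempotency keys
  const queue: WebhookEvent[] = []; // stand-in for the job queue
  const deadLetters: { event: WebhookEvent; error: string }[] = [];

  return {
    // The HTTP handler: record, enqueue, acknowledge. No database work here.
    receive(event: WebhookEvent): { status: number; duplicate: boolean } {
      if (seenEventIds.has(event.id)) return { status: 200, duplicate: true };
      seenEventIds.add(event.id);
      queue.push(event);
      return { status: 200, duplicate: false };
    },
    // The worker: retries each job, then parks it in the dead letter list.
    drain(handle: (event: WebhookEvent) => void): void {
      for (const event of queue.splice(0)) {
        let lastError = "";
        let done = false;
        for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
          try {
            handle(event);
            done = true;
          } catch (err) {
            lastError = err instanceof Error ? err.message : String(err);
          }
        }
        if (!done) deadLetters.push({ event, error: lastError });
      }
    },
    pending: (): number => queue.length,
    deadLetterCount: (): number => deadLetters.length,
  };
}
```

Because `receive` never touches the order table, a slow database can no longer turn into a 500 for the gateway. The failure moves to the worker, where it is retried and, if it keeps failing, left somewhere visible for alerting.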
The Insight
Webhooks are fire-and-forget from the sender's perspective. Your receiver must be bulletproof: fast acknowledgment, async processing, idempotency, and alerting. Never do heavy work in the webhook handler itself.
The Cache Stampede
The Problem
Every day at midnight, the e-commerce product catalog API would time out for 2-3 minutes. Redis showed cache misses spiking. Database connections were exhausted.
The Fix
Cache TTL for all products was set to expire at the same time (midnight). When cache expired, hundreds of concurrent requests hit the database simultaneously. Implemented cache stampede protection: random TTL jitter, mutex lock for cache regeneration, and stale-while-revalidate pattern.
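The three protections can be sketched in a few lines. This is an in-memory illustration under assumed names (`jitteredTtl`, `createSwrCache`), not the production Redis code: the jitter spreads expirations out, the `refreshing` set plays the role of the mutex, and stale entries keep being served while one caller regenerates the value.

```typescript
// TTL jitter: base plus a random 0..spreadSeconds, so keys written together
// no longer expire together. `random` is injectable for testing.
function jitteredTtl(
  baseSeconds: number,
  spreadSeconds: number,
  random: () => number = Math.random
): number {
  return baseSeconds + Math.floor(random() * (spreadSeconds + 1));
}

type SwrEntry<T> = { value: T; freshUntil: number; staleUntil: number };

function createSwrCache<T>() {
  const entries = new Map<string, SwrEntry<T>>();
  const refreshing = new Set<string>(); // keys someone is already regenerating

  return {
    set(key: string, value: T, now: number, ttlMs: number, staleMs: number): void {
      entries.set(key, { value, freshUntil: now + ttlMs, staleUntil: now + ttlMs + staleMs });
      refreshing.delete(key); // release the "lock"
    },
    // Returns the value to serve (possibly stale, possibly undefined) and
    // whether THIS caller is the one that should regenerate it.
    get(key: string, now: number): { value: T | undefined; mustRefresh: boolean } {
      const entry = entries.get(key);
      if (entry !== undefined && now < entry.freshUntil) {
        return { value: entry.value, mustRefresh: false };
      }
      const value = entry !== undefined && now < entry.staleUntil ? entry.value : undefined;
      const mustRefresh = !refreshing.has(key);
      if (mustRefresh) refreshing.add(key);
      return { value, mustRefresh };
    },
  };
}
```

With many processes the lock has to live in shared storage and carry its own expiry, so that a refresher that crashes cannot block regeneration forever. In Redis that is commonly done with `SET` using the `NX` and `PX` options.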
The Insight
Uniform cache expiration is a time bomb. Add randomness to TTL. For hot data, use background refresh before expiration rather than on-demand regeneration.