Legacy lifeline
Protect a fixed-capacity legacy system from modern traffic volumes without replacing it.
Kata overview
You do not need to be an expert to start. This kata keeps the stakes low so you can explore trade-offs, adjust the diagram, and see how the system responds.
Context for this system design kata
Use this kata to rehearse the trade-offs of protecting a dependency you cannot change before taking similar ideas into production design reviews.
Scenario and practice focus
The mobile app needs account data: balances, recent transactions, and statements. The only source of truth is a legacy mainframe that handles 40 RPS with a base latency of 220 ms. It cannot be scaled, modified, or replaced. The app team must build a modern API layer that serves customers quickly while never exceeding the mainframe's capacity. When mobile traffic spikes at month-end (everyone checks their balance after payday), the system must buffer, cache, or shed load, but it must never let the mainframe fall over. If it does, branches stop working too.
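Caching is often the cheapest way to relieve a capacity-limited source of truth, because balance checks after payday repeat the same reads many times. The following is a minimal read-through cache sketch, assuming a short staleness window is acceptable for the data being cached. `ReadThroughCache`, `fetch`, and `ttl_seconds` are hypothetical names, and how stale account data may be is a product decision the kata leaves open.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory so only cache misses reach the mainframe.

    Hypothetical sketch: the fetch function, key format, and TTL are
    assumptions, not part of the kata.
    """

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # slow call to the legacy system (assumed)
        self._ttl = ttl_seconds        # how stale a cached answer may be (assumed)
        self._clock = clock            # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.misses = 0                # number of calls that reached the mainframe

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh hit: no mainframe call
        # Miss or expired entry: fetch from the mainframe and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Every hit is a request the mainframe never sees, so the cache hit rate directly sets how much of the month-end spike still has to fit under 40 RPS.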
Difficulty: Intermediate. Estimated time: 25–40 min. Domain: Financial Services.
Constraints to balance
Operational pressure
- Operators cannot throttle or shed load manually; protection must be automatic.
- Legacy mainframe capacity is fixed at 40 RPS; this cannot be changed.
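Because no operator can step in, the 40 RPS ceiling has to be enforced in code. One common option, assumed here rather than prescribed by the kata, is a token bucket in front of every mainframe call. A bucket with burst capacity B and refill rate r admits at most B + r calls in any one-second window, so this sketch uses r = 36 and B = 4 to stay at 40. `TokenBucket` and its parameters are hypothetical, and the class is single-threaded, so concurrent callers would need a lock.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit calls at an average of `rate` per second, allowing bursts up to `burst`.

    Hypothetical sketch: with rate=36 and burst=4, no one-second window can
    see more than 36 + 4 = 40 admissions. Not thread-safe.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = burst
        self._tokens = burst           # start full
        self._clock = clock            # injectable so tests can control time
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill tokens for the elapsed time, never above the burst capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True                # safe to call the mainframe
        return False                   # caller must queue, serve from cache, or shed
```

A `False` from `try_acquire()` is the automatic signal that replaces manual throttling: the request is queued, answered from cache, or shed.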
Customer and product constraints
- Mainframe latency is 220 ms or more and becomes unpredictable under load.
- The API layer's cost must be justified relative to branch-only access.
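The latency floor and the throughput limit are linked by Little's law (in-flight requests = throughput × latency). Capping concurrent mainframe calls, an assumption here rather than something the kata specifies, turns the 220 ms floor into a throughput bound: 8 in-flight calls at 220 ms each give at most about 36.4 RPS, and if the mainframe slows to 500 ms the same cap drops throughput to 16 RPS, easing pressure when the mainframe is struggling. The helper names are hypothetical, and the arithmetic is a steady-state approximation.

```python
import math


def max_in_flight(rps_limit: float, min_latency_s: float) -> int:
    """Largest concurrency cap that keeps throughput at or below rps_limit.

    Little's law: throughput = in_flight / latency. Since latency never drops
    below min_latency_s, a cap of floor(rps_limit * min_latency_s) bounds
    throughput at rps_limit even when latency rises.
    """
    return math.floor(rps_limit * min_latency_s)


def throughput_rps(in_flight: int, latency_s: float) -> float:
    """Requests per second completed when in_flight calls each take latency_s."""
    return in_flight / latency_s


cap = max_in_flight(40, 0.220)  # 8 concurrent mainframe calls
print(cap, round(throughput_rps(cap, 0.220), 1), round(throughput_rps(cap, 0.500), 1))
# prints: 8 36.4 16.0
```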
Scenarios to explore in the simulator
- Serve mobile customers with p95 latency under 500 ms.
- Never exceed the legacy system's 40 RPS hard limit.
- Keep the system stable during month-end spikes.
- Maintain a recovery buffer so work isn't lost if the legacy system slows.
- Keep costs controlled even at peak.
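The 500 ms target and the 40 RPS limit together bound how deep a queue in front of the mainframe can usefully get. In this hypothetical sketch, a new request is admitted only if the time to drain the requests ahead of it plus one 220 ms call fits within the 500 ms budget: depth / 40 + 0.22 ≤ 0.5 allows at most 11 requests already waiting. `DeadlineQueue` and its defaults are assumptions, and this in-memory deque does not provide the durable recovery buffer, which would need persistence.

```python
from collections import deque
from typing import Deque


class DeadlineQueue:
    """Bounded FIFO in front of the mainframe that sheds requests likely to miss the budget.

    Hypothetical sketch: defaults come from the kata's 40 RPS, 220 ms, and
    500 ms figures; real values would be tuned in the simulator.
    """

    def __init__(self, drain_rps: float = 40.0, service_s: float = 0.220,
                 budget_s: float = 0.500) -> None:
        self._drain_rps = drain_rps
        self._service_s = service_s
        self._budget_s = budget_s
        self._items: Deque[str] = deque()

    def expected_latency(self) -> float:
        # Time to drain everything ahead of a new arrival, plus its own service time.
        return len(self._items) / self._drain_rps + self._service_s

    def offer(self, request_id: str) -> bool:
        if self.expected_latency() > self._budget_s:
            return False  # shed: serve cached data or a fast "try again" instead
        self._items.append(request_id)
        return True

    def take(self) -> str:
        # Called by the worker that forwards requests to the mainframe at drain_rps.
        return self._items.popleft()
```

Shedding at admission time keeps the queue from growing into a backlog that would violate the latency target for everyone behind it.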
Learning outcomes
- Design protective layers around fixed-capacity dependencies.
- Use queue buffering to absorb traffic that exceeds a dependency's capacity.
- Understand that error rates emerge from utilization and compound across request paths.
- Place recovery queues adjacent to fragile components for resilience.
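Compounding is easy to underestimate. If a request must pass through several components and each fails independently, the path succeeds only when every hop does, so the path error rate is 1 - ∏(1 - eᵢ). This sketch assumes independent failures and takes per-hop error rates as inputs; it does not reproduce how the simulator derives those rates from utilization.

```python
import math
from typing import Sequence


def path_success(per_hop_error_rates: Sequence[float]) -> float:
    """Probability a request succeeds when it must pass every hop on a path.

    Assumes hops fail independently of one another.
    """
    return math.prod(1.0 - e for e in per_hop_error_rates)


def path_error(per_hop_error_rates: Sequence[float]) -> float:
    """Error rate of the whole path under the same independence assumption."""
    return 1.0 - path_success(per_hop_error_rates)
```

Three hops at 1% each already give about a 3% path error (1 - 0.99³ = 0.029701), which is why keeping each hop's utilization low matters more as paths get longer.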
Give it a try!