Ticket Drop
Keep checkout stable during an on-sale surge when a dependency partially fails.
Kata overview
You do not need to be an expert to start. This kata keeps the stakes low so you can explore trade-offs, adjust the diagram, and see how the system responds.
Context for this system design kata
Rehearse the trade-offs of keeping checkout stable under a traffic surge and a partial dependency failure here, before taking those ideas into production design reviews.
Scenario and practice focus
A high-demand event goes on sale and fans flood the site. The system must protect the checkout experience while preventing oversell and avoiding dependency collapse. During the surge, a critical dependency (e.g., payments or identity verification) enters a brownout: it slows down and intermittently fails. Fans refresh and retry aggressively. The platform must degrade gracefully and recover cleanly without manual intervention.
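A brownout like this is the classic case for a circuit breaker. When calls to the slow dependency keep failing, stop sending them for a while and fail fast. Then probe with a single trial call, so recovery happens without manual intervention. The kata does not prescribe this pattern. The Python sketch below is one hypothetical option. The class name, the consecutive-failure threshold, and the injectable clock are all assumptions. Calls that exceed their timeout are recorded as failures, so slowness trips the breaker as well as errors.

```python
import time


class CircuitBreaker:
    """Consecutive-failure circuit breaker (hypothetical names and thresholds).

    closed:    calls pass through and consecutive failures are counted
    open:      calls fail fast until cooldown_s has elapsed
    half_open: a single trial call is let through; success closes the
               breaker, failure reopens it for another cooldown
    """

    def __init__(self, failure_threshold=5, cooldown_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.trial_in_flight = False

    def allow(self):
        """Return True if the caller may contact the dependency right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "half_open"
            self.trial_in_flight = False
        if self.state == "closed":
            return True
        if self.state == "half_open" and not self.trial_in_flight:
            self.trial_in_flight = True
            return True
        return False  # open, or a half-open trial is already running

    def record_success(self):
        self.state = "closed"
        self.failures = 0
        self.trial_in_flight = False

    def record_failure(self):
        """Call this for errors and for calls that exceeded their timeout."""
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
        self.trial_in_flight = False


class FakeClock:
    """Manually advanced clock for demos and tests."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t
```

While the breaker is open, checkout can return a fast "try again shortly" response. This is cheaper and more predictable than queueing more work onto a dependency that is already struggling.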
Difficulty: Intermediate. Estimated time: 60–90 min. Domain: Consumer Internet.
Constraints to balance
Operational pressure
- No manual intervention: detection, degradation, and recovery must all happen automatically
- Must prevent oversell (inventory correctness is non-negotiable)
- A critical dependency can enter a brownout (slow + intermittent failures)
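Preventing oversell comes down to two rules. First, "check that a seat is left" and "take the seat" must be one atomic step. Second, retries must be idempotent, so a fan who resubmits does not consume a second seat. The following is a minimal Python sketch. It assumes a hypothetical per-request ID supplied by the client and uses an in-process lock. A production system would perform the same conditional decrement in its datastore instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Oversell guard: check-and-decrement happens atomically under one lock.

    Hypothetical in-process sketch; a real system would do the same
    conditional decrement in its datastore (for example, an UPDATE that only
    succeeds while the available count is above zero).
    """

    def __init__(self, seats):
        self._lock = threading.Lock()
        self.available = seats
        self.holders = set()  # request IDs that were granted a seat

    def try_reserve(self, request_id):
        with self._lock:
            if request_id in self.holders:
                return True  # retried request: report the existing grant, take nothing new
            if self.available == 0:
                return False  # sold out; the count never goes negative
            self.available -= 1
            self.holders.add(request_id)
            return True


def simulate_surge(seats, unique_requests, copies_per_request, workers=32):
    """Fire every request ID several times concurrently, as retrying fans would."""
    inventory = Inventory(seats)
    ids = [f"req-{i}" for i in range(unique_requests)] * copies_per_request
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(inventory.try_reserve, ids))
    return inventory, outcomes
```

With 100 seats and 500 fans each submitting three times, exactly 100 distinct request IDs end up holding seats, however the threads interleave.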
Customer and product constraints
- Retry storms are expected; the system must resist amplification
- Keep estimated monthly cost at peak demand within budget
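Bounded retries prevent a single failure from turning into several requests against a dependency that is already degraded. One common combination is capped exponential backoff with full jitter, plus a retry budget that refuses retries once they exceed a fixed share of first attempts. The Python sketch below shows this combination. Its defaults are assumptions: four attempts, a 0.2 s base delay, and a 10% retry budget. The names `TransientError` and `RetryBudget` are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def backoff_delays(max_attempts, base_s=0.2, cap_s=5.0, rng=random.random):
    """Full-jitter backoff: before retry n, wait rng() * min(cap, base * 2**n)."""
    return [rng() * min(cap_s, base_s * 2 ** n) for n in range(max_attempts - 1)]


class RetryBudget:
    """Caps retries at a fraction of first attempts so failures cannot multiply load.

    Counters never decay here; a real budget would use a sliding window.
    """

    def __init__(self, ratio=0.1, min_retries=10):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend_retry(self):
        allowed = max(self.min_retries, int(self.ratio * self.requests))
        if self.retries < allowed:
            self.retries += 1
            return True
        return False


def call_with_retries(op, budget, max_attempts=4, sleep=time.sleep):
    """Run op(), retrying transient failures with jitter while the budget allows."""
    budget.record_request()
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not budget.try_spend_retry():
                raise  # give up: surface the failure instead of piling on
            sleep(delays[attempt])
```

This budget governs retries the platform makes itself, such as checkout calling payments. Fan refreshes cannot be throttled this way. They are better absorbed by admission control and idempotent request IDs.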
Scenarios to explore in the simulator
- Keep the on-sale experience stable and fair under extreme traffic.
- Prevent oversell and ensure purchase outcomes are trustworthy.
- Avoid retry storms and dependency collapse during brownouts.
- Drain backlog safely after recovery without re-triggering failures.
- Maintain predictable cost characteristics during on-sale windows.
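Several of these scenarios can share one admission mechanism: staying stable and fair under extreme traffic, draining backlog safely, and keeping load predictable. That mechanism is a waiting room that queues fans in arrival order and releases them to checkout at a controlled rate. The following is a minimal Python sketch using a token bucket. The class name, parameters, and queue cap are hypothetical.

```python
from collections import deque


class WaitingRoom:
    """FIFO waiting room drained by a token bucket (hypothetical sketch).

    Fans are admitted in arrival order at no more than rate_per_s on average
    (with bursts up to `burst`), so checkout sees a bounded, predictable load.
    """

    def __init__(self, rate_per_s, burst, max_queue):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.max_queue = max_queue
        self.tokens = float(burst)
        self.last_refill = 0.0
        self.queue = deque()
        self.queued_ids = set()

    def arrive(self, fan_id):
        """Join the queue; refreshing keeps the original place instead of adding a new one."""
        if fan_id in self.queued_ids:
            return True
        if len(self.queue) >= self.max_queue:
            return False  # shed early with a cheap "try again later" page
        self.queue.append(fan_id)
        self.queued_ids.add(fan_id)
        return True

    def admit(self, now):
        """Refill tokens for elapsed time, then release fans to checkout in FIFO order."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.last_refill = now
        admitted = []
        while self.queue and self.tokens >= 1:
            fan_id = self.queue.popleft()
            self.queued_ids.discard(fan_id)
            admitted.append(fan_id)
            self.tokens -= 1
        return admitted
```

The same knob can control backlog drain after recovery. Start `rate_per_s` low and raise it in steps while the dependency stays healthy, rather than releasing the whole queue at once.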
Learning outcomes
- Design admission control that remains fast and fair under surge.
- Protect critical dependencies with intentional backpressure and bounded retries.
- Separate reservation from confirmation without creating “charged but no ticket” outcomes.
- Implement controlled recovery that drains backlog safely and predictably over time.
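One way to separate reservation from confirmation without "charged but no ticket" outcomes works in three steps. Give each hold a time-to-live. Freeze the hold before charging. Let only unfrozen holds expire. A payment result can then always be applied to a hold that still owns its seat. The Python sketch below is hypothetical. The state names and the TTL are assumptions, and it is single-threaded. A real store would make each transition a conditional update.

```python
import enum


class HoldState(enum.Enum):
    HELD = "held"            # seat reserved, not yet paid; expires after ttl_s
    PAYING = "paying"        # charge in flight; the expiry sweeper must not touch it
    CONFIRMED = "confirmed"  # charged and ticket issued
    RELEASED = "released"    # expired or payment failed; seat returned to inventory


class Reservations:
    """Reserve -> begin_payment -> finish_payment, with TTL expiry for unpaid holds."""

    def __init__(self, seats, ttl_s):
        self.available = seats
        self.ttl_s = ttl_s
        self.holds = {}  # hold_id -> {"state": HoldState, "expires_at": float}

    def reserve(self, hold_id, now):
        if hold_id in self.holds:  # a retried reserve never takes a second seat
            return self.holds[hold_id]["state"] is not HoldState.RELEASED
        if self.available == 0:
            return False
        self.available -= 1
        self.holds[hold_id] = {"state": HoldState.HELD, "expires_at": now + self.ttl_s}
        return True

    def begin_payment(self, hold_id, now):
        """Freeze an unexpired hold before charging; refuse to charge anything else."""
        hold = self.holds.get(hold_id)
        if hold is None or hold["state"] is not HoldState.HELD or now >= hold["expires_at"]:
            return False
        hold["state"] = HoldState.PAYING
        return True

    def finish_payment(self, hold_id, charged):
        """Apply the payment result; duplicate callbacks leave the outcome unchanged."""
        hold = self.holds[hold_id]
        if hold["state"] is HoldState.PAYING:
            if charged:
                hold["state"] = HoldState.CONFIRMED
            else:
                hold["state"] = HoldState.RELEASED
                self.available += 1
        return hold["state"]

    def expire(self, now):
        """Sweeper: release unpaid holds past their TTL; PAYING holds are never expired."""
        released = 0
        for hold in self.holds.values():
            if hold["state"] is HoldState.HELD and now >= hold["expires_at"]:
                hold["state"] = HoldState.RELEASED
                self.available += 1
                released += 1
        return released
```

Suppose the payment provider times out during a brownout. The hold stays PAYING until a reconciliation step (not shown) learns the outcome, so the seat is neither resold nor detached from a charge.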