Payment handshake
Process payments through an external gateway with tight error tolerance and async settlement.
Kata overview
You do not need to be an expert to start. This kata keeps the stakes low so you can explore trade-offs, adjust the diagram, and see how the system responds.
Context for this system design kata
This system design kata lets you rehearse trade-offs before taking ideas into production reviews.
Scenario and practice focus
ClearPay processes payments from merchant checkouts through an external payment gateway. The gateway is reliable most of the time, but it has 220ms base latency and limited capacity: it starts returning errors above 80% utilization. ClearPay must acknowledge payment intent quickly (before the queue), settle asynchronously (after the queue), and keep error rates extremely low. A separate status API lets merchants check payment state. The 0.5% error target is the tightest in the portfolio. Because error rates compound across the path, it forces you to keep every component on the payment path well within capacity.
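To make the compounding concrete, here is a minimal sketch of how per-component error rates combine along a path. It assumes component failures are independent, and the three 0.2% rates are hypothetical values chosen for illustration, not figures from the simulator.

```python
from math import prod


def path_error_rate(component_error_rates):
    """Probability that at least one component on the path fails.

    Assumes failures are independent, which is a simplification.
    """
    return 1 - prod(1 - e for e in component_error_rates)


# Hypothetical path: three components, each failing 0.2% of the time.
rates = [0.002, 0.002, 0.002]
combined = path_error_rate(rates)

TARGET = 0.005  # the 0.5% error target from the scenario
print(f"combined error rate: {combined:.4%}, within target: {combined <= TARGET}")
```

Under these assumptions, three components that each look comfortably healthy at 0.2% combine to roughly 0.6%, which already misses the 0.5% target.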
Difficulty: Advanced. Estimated time: 35–45 min. Domain: Fintech.
Constraints to balance
Operational pressure
- No manual retries or payment resubmission.
- Payment gateway is external with 220ms base latency and limited capacity.
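One rough way to reason about the gateway's headroom is Little's law: requests in flight equal arrival rate multiplied by latency. The sketch below uses the 220ms latency and 80% threshold from the scenario. The idea of a fixed number of concurrent slots and the figure of 100 slots are assumptions for illustration, since the simulator's actual capacity model is not described here.

```python
GATEWAY_LATENCY_S = 0.220  # base latency from the scenario
ERROR_THRESHOLD = 0.80     # utilization above which the gateway starts erroring


def gateway_utilization(requests_per_s, concurrent_slots, latency_s=GATEWAY_LATENCY_S):
    """Fraction of gateway capacity in use, via Little's law (L = lambda * W)."""
    in_flight = requests_per_s * latency_s
    return in_flight / concurrent_slots


def max_safe_rate(concurrent_slots, latency_s=GATEWAY_LATENCY_S, threshold=ERROR_THRESHOLD):
    """Highest arrival rate that keeps utilization at or below the threshold."""
    return threshold * concurrent_slots / latency_s


# Hypothetical gateway with 100 concurrent slots.
SLOTS = 100
print(f"utilization at 200 req/s: {gateway_utilization(200, SLOTS):.0%}")
print(f"max safe rate: {max_safe_rate(SLOTS):.1f} req/s")
```

Anything above the safe rate has to wait in a queue in front of the gateway rather than be sent straight through.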
Customer and product constraints
- Error rate tolerance is extremely low - 0.5%.
- Every component on the payment path adds cost and risk.
Scenarios to explore in the simulator
- Acknowledge payment intent within 300ms.
- Process settlement through the external payment gateway without dropping transactions.
- Keep checkout error rates below 0.5% - every error is a lost sale.
- Maintain a recovery buffer near the payment gateway.
- Keep costs controlled during payday spikes.
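The first two scenarios can be sketched as an accept-then-settle flow: the accept step records the payment and enqueues it without waiting on the gateway, and a worker settles it afterwards. This is a minimal in-memory illustration using `asyncio`. The names `accept_payment`, `settlement_worker`, and `call_gateway` are hypothetical, the gateway is a stand-in that only simulates the 220ms latency, and a real system would need a durable queue.

```python
import asyncio
import time

GATEWAY_LATENCY_S = 0.220  # base latency from the scenario


async def call_gateway(payment_id):
    """Stand-in for the external gateway; only simulates its latency."""
    await asyncio.sleep(GATEWAY_LATENCY_S)
    return "settled"


async def accept_payment(queue, status, payment_id):
    """Acknowledge intent: record state and enqueue, without calling the gateway."""
    status[payment_id] = "accepted"
    await queue.put(payment_id)
    return {"payment_id": payment_id, "state": "accepted"}


async def settlement_worker(queue, status):
    """Drain the queue one payment at a time and settle through the gateway."""
    while True:
        payment_id = await queue.get()
        status[payment_id] = await call_gateway(payment_id)
        queue.task_done()


async def demo(n=3):
    queue = asyncio.Queue()
    status = {}  # what a status API could read from
    worker = asyncio.create_task(settlement_worker(queue, status))

    start = time.perf_counter()
    acks = [await accept_payment(queue, status, f"pay-{i}") for i in range(n)]
    ack_elapsed = time.perf_counter() - start  # time spent on the accept path

    await queue.join()  # wait until every queued payment is settled
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return acks, ack_elapsed, status
```

Calling `asyncio.run(demo())` returns the acknowledgements, the time spent on the accept path, and the final status map. The accept path never waits on the gateway, so its latency is independent of the 220ms settlement call.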
Learning outcomes
- Design payment paths with compound error rates in mind.
- Separate accept latency from processing latency so you can acknowledge quickly and settle asynchronously.
- Place recovery queues adjacent to external payment dependencies.
- Keep all components on the payment path well within capacity to control error rates.
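As a rough aid for thinking about a recovery queue in front of the gateway, the sketch below estimates how much backlog builds during a spike and how long it takes to drain afterwards. It assumes constant arrival and drain rates, and every number in the example is hypothetical.

```python
def spike_backlog(arrival_rps, drain_rps, spike_seconds):
    """Payments queued by the end of a spike (0 if the gateway keeps up)."""
    return max(0.0, (arrival_rps - drain_rps) * spike_seconds)


def drain_time(backlog, drain_rps, baseline_rps):
    """Seconds to clear the backlog once traffic returns to baseline."""
    spare = drain_rps - baseline_rps
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog never drains")
    return backlog / spare


# Hypothetical spike: 500 req/s for 60 s against a gateway drained at 360 req/s,
# then a return to a 200 req/s baseline.
backlog = spike_backlog(arrival_rps=500, drain_rps=360, spike_seconds=60)
recovery = drain_time(backlog, drain_rps=360, baseline_rps=200)
print(f"backlog: {backlog:.0f} payments, recovery: {recovery:.1f} s")
```

The queue must hold the peak backlog, and the drain rate should stay below the point where the gateway itself starts erroring.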
Give it a try!