Nodivex from TraitSpan
Intermediate–Advanced

The status dashboard

Serve millions of status page reads during an incident using layered caching and read path isolation.

Kata overview

You do not need to be an expert to start. This kata keeps the stakes low so you can explore trade-offs, adjust the diagram, and see how the system responds.

SaaS / DevOps · 30–45 min

Context for this system design kata

This system design kata keeps the stakes low so you can rehearse trade-offs before taking ideas into production reviews.

Scenario and practice focus

When an incident is declared, users flood the status page. The content is identical for everyone viewing the same page and changes only when the operator posts an update (every 5–15 minutes). Between updates, every request returns the same HTML/JSON, which makes this an ideal case for aggressive caching: a CDN at the edge, a fast API behind it, and a read replica so that the status database's primary handles only writes. The 80 ms p95 latency target is achievable only if the CDN serves most requests; without it, the API + DB path alone exceeds the target under load. The cost target is tight because incidents are spiky but infrequent, so you can't justify always-on large instances.
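As a rough sketch of why the CDN is mandatory for the p95 target, the model below treats each layer as a fixed latency and compares mean and p95 latency as the CDN hit ratio changes. The 20 ms CDN-hit and 150 ms origin latencies are assumed values for illustration, not numbers from the kata, and the function names are hypothetical.

```python
# Assumed latencies for illustration only; the kata does not specify these numbers.
CDN_HIT_MS = 20.0   # response served from a CDN edge cache
ORIGIN_MS = 150.0   # CDN miss that falls through to the API and read replica under load


def _check_ratio(hit_ratio: float) -> None:
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")


def mean_latency_ms(hit_ratio: float, cdn_ms: float = CDN_HIT_MS, origin_ms: float = ORIGIN_MS) -> float:
    """Average latency when a fraction hit_ratio of requests is served by the CDN."""
    _check_ratio(hit_ratio)
    return hit_ratio * cdn_ms + (1.0 - hit_ratio) * origin_ms


def p95_latency_ms(hit_ratio: float, cdn_ms: float = CDN_HIT_MS, origin_ms: float = ORIGIN_MS) -> float:
    """95th percentile of the same two-point latency mix.

    The slowest 5% of requests are CDN misses unless at least 95% of all
    requests are CDN hits, so p95 only drops to the CDN latency at that point.
    """
    _check_ratio(hit_ratio)
    return cdn_ms if hit_ratio >= 0.95 else origin_ms


if __name__ == "__main__":
    for h in (0.80, 0.90, 0.95, 0.99):
        print(f"hit ratio {h:.2f}: mean {mean_latency_ms(h):.1f} ms, p95 {p95_latency_ms(h):.1f} ms")
```

In this model a 90% hit ratio already gives a comfortable mean of about 33 ms, yet p95 is still the full origin latency; p95 falls under 80 ms only once at least 95% of requests are CDN hits.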

Difficulty: Intermediate–Advanced. Estimated time: 30–45 min. Domain: SaaS / DevOps.

Constraints to balance

Operational pressure

  • No manual scaling during an incident.
  • Status page content is identical for all viewers, so cache aggressively.
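Because every viewer receives the same bytes, the origin can mark responses as cacheable by shared caches. The helper below is a hypothetical sketch that builds a Cache-Control value from standard HTTP directives: s-maxage for the CDN TTL, stale-while-revalidate so the edge keeps answering while it refreshes, and stale-if-error so a failing origin does not take the page down with it. All TTL values are assumptions, and CDN support for the stale-* directives varies by provider.

```python
def status_page_cache_control(
    edge_ttl_s: int,
    revalidate_window_s: int = 10,
    browser_max_age_s: int = 5,
    stale_if_error_s: int = 3600,
) -> str:
    """Build a Cache-Control value for a page that is identical for every viewer.

    Hypothetical helper; the default values are illustrative assumptions.
    """
    values = {
        "max-age": browser_max_age_s,                  # short browser cache so refreshes reach the CDN
        "s-maxage": edge_ttl_s,                        # how long shared caches (the CDN) may serve it
        "stale-while-revalidate": revalidate_window_s, # serve the old copy while refetching in the background
        "stale-if-error": stale_if_error_s,            # keep serving the last good copy if origin errors
    }
    for name, seconds in values.items():
        if seconds < 0:
            raise ValueError(f"{name} must be non-negative")
    directives = ["public"]  # same content for everyone, so shared caches may store it
    directives += [f"{name}={seconds}" for name, seconds in values.items()]
    return ", ".join(directives)


if __name__ == "__main__":
    print(status_page_cache_control(edge_ttl_s=30))
```

After an operator update, viewers may see the previous page for roughly edge_ttl_s plus revalidate_window_s unless the update also purges the CDN. With updates arriving every 5–15 minutes, an edge TTL of tens of seconds keeps that delay small relative to the update cadence.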

Customer and product constraints

  • Updates are infrequent (every 5–15 minutes during an incident).
  • Incidents are spiky but rare - you can't justify always-on large instances.
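To see why spiky traffic need not mean large always-on origin capacity, the sketch below estimates edge hit ratio and origin load. It assumes Poisson arrivals for one page at one edge location and that each miss starts a fresh TTL window with no background refresh, so each window holds one miss followed on average by rps × ttl hits. The number of edge locations, the page count, and per-instance capacity are hypothetical inputs, and traffic is assumed to be spread evenly across them.

```python
import math


def edge_hit_ratio(rps: float, ttl_s: float) -> float:
    """Expected hit ratio for one page at one edge location.

    Assumes Poisson arrivals at rate rps; each cycle is one miss followed on
    average by rps * ttl_s hits before the cached copy expires.
    """
    hits_per_window = rps * ttl_s
    return hits_per_window / (1.0 + hits_per_window)


def origin_rps(rps: float, ttl_s: float, pops: int = 1, pages: int = 1) -> float:
    """Requests per second that miss the edge and reach the API.

    rps is the per-page, per-location request rate (assumed equal everywhere).
    """
    # rps * (1 - hit ratio) simplifies to rps / (1 + rps * ttl_s), which stays below 1 / ttl_s.
    return pops * pages * rps / (1.0 + rps * ttl_s)


def instances_needed(load_rps: float, per_instance_rps: float) -> int:
    """Smallest instance count that covers load_rps, keeping at least one instance."""
    return max(1, math.ceil(load_rps / per_instance_rps))


if __name__ == "__main__":
    load = origin_rps(100.0, 30.0, pops=50)
    print(f"edge hit ratio {edge_hit_ratio(100.0, 30.0):.5f}, origin load {load:.2f} rps")
    print(f"API instances at an assumed 500 rps each: {instances_needed(load, 500.0)}")
```

For example, at 100 requests per second per page at each of 50 edge locations with a 30 s TTL, the origin sees about 50 × 100 / 3001 ≈ 1.67 requests per second for that page. In this model origin load per page and location stays below one request per TTL however large the surge gets, which is why a small, autoscaled API tier can be sufficient.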

Scenarios to explore in the simulator

Trade-off prompts
  • Keep status pages fast during incident traffic surges.
  • Internal updates must not compete with external reads.
  • Serve incident traffic at minimal cost; incidents are spiky but brief.
  • Never let the status page itself become an incident.

Learning outcomes

What you will learn
  • Build a multi-layer read path (CDN → API → replica) where each layer reduces origin load.
  • Configure CDN cache hit ratio based on content update frequency.
  • Understand that an 80 ms p95 is achievable only with a CDN; the API + DB path alone takes longer.
  • Isolate the write path (operator updates) from the read path (user views) using read replicas.
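The multi-layer read path can be sketched as a chain of TTL caches, each answering from its own copy until it expires and only then asking the next layer toward the replica. This is a simplified simulation under stated assumptions: a simulated clock, a single status page, evenly spread traffic, and hypothetical TTLs. TTLLayer and simulate are illustrative names, not simulator APIs.

```python
from typing import Callable, Dict, Tuple


class TTLLayer:
    """One caching layer that serves its own copy until it expires, then asks upstream."""

    def __init__(self, ttl_s: float, upstream: Callable[[str], str], clock: Callable[[], float]):
        self.ttl_s = ttl_s
        self.upstream = upstream  # next layer toward the replica
        self.clock = clock        # injected so the simulation controls time
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.upstream(key)
        self.entries[key] = (value, now + self.ttl_s)
        return value


def simulate(seconds: int, rps_per_pop: int, pops: int, edge_ttl_s: float, api_ttl_s: float) -> dict:
    """Replay evenly spread traffic through edges -> API cache -> replica and count work per layer."""
    now = 0.0
    replica_reads = 0

    def clock() -> float:
        return now

    def replica_read(key: str) -> str:
        nonlocal replica_reads
        replica_reads += 1
        return f"current status for {key}"

    api = TTLLayer(api_ttl_s, replica_read, clock)
    edges = [TTLLayer(edge_ttl_s, api.get, clock) for _ in range(pops)]
    for second in range(seconds):
        now = float(second)
        for _ in range(rps_per_pop):
            for edge in edges:
                edge.get("status")
    return {
        "requests": seconds * rps_per_pop * pops,
        "edge_misses": sum(edge.misses for edge in edges),
        "api_misses": api.misses,
        "replica_reads": replica_reads,
    }


if __name__ == "__main__":
    print(simulate(seconds=120, rps_per_pop=10, pops=3, edge_ttl_s=30.0, api_ttl_s=10.0))
```

With three edge locations, 10 requests per second each for two minutes, a 30 s edge TTL, and a 10 s API TTL, 3,600 viewer requests become 12 edge misses and only 4 replica reads: the edges absorb the surge, and the API cache collapses simultaneous edge misses into a single replica query.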