⚙ Engineering Deep-Dive

Scalable Seat-Booking System

A production-grade distributed system solving the double-booking problem under high concurrency, with load tests, race-condition proofs, and architecture diagrams.

- 16.6× cache speedup (8.9 ms vs 148 ms)
- 62.5% cache hit rate (Redis layer, TTL 300s)
- 37 RPS peak throughput (80 concurrent users)
- 0 / 50 double-bookings (50-user race-condition test)

🎯 The Problem

Seat booking is a classic distributed systems problem: when two users click "Book" on the same seat at the same millisecond, a naive implementation will issue two INSERT statements that both succeed, resulting in a double-booking. This is not a hypothetical edge case; it is the default behaviour of any system that reads availability before writing a reservation.

The challenge is to guarantee exactly-once booking under arbitrary concurrency without serialising every request through a single lock, which would destroy throughput. This system solves it at the database layer using a composite UNIQUE(event_id, seat_id) constraint on the tickets table, so the database itself becomes the arbiter.

⚡ Key insight
No application-level mutex needed. The DB constraint turns a race into first-write-wins with automatic rollback and an HTTP 409 for the losers: atomic and correct under any load.
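
The mechanism can be demonstrated in isolation. A minimal sketch using SQLite's in-memory database (the production system uses PostgreSQL, but the UNIQUE semantics are the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tickets (
        id       INTEGER PRIMARY KEY,
        event_id INTEGER NOT NULL,
        seat_id  INTEGER NOT NULL,
        UNIQUE (event_id, seat_id)  -- the arbiter: first write wins
    )
""")

# First booking for (event 1, seat 42) succeeds.
conn.execute("INSERT INTO tickets (event_id, seat_id) VALUES (1, 42)")

# A concurrent second booking for the same seat violates the constraint.
try:
    conn.execute("INSERT INTO tickets (event_id, seat_id) VALUES (1, 42)")
except sqlite3.IntegrityError:
    print("409 Conflict: seat already booked")
```

Exactly one row for the seat survives, no matter how many writers race.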
๐Ÿ—

System Architecture

Seven containerised services communicate over a shared Docker network. The API tier is stateless (all shared state lives in Postgres and Redis), so horizontal scaling is as simple as adding more API replicas behind a load balancer.

System Architecture Diagram
| Service  | Image             | Port         | Role                              |
|----------|-------------------|--------------|-----------------------------------|
| api      | FastAPI + Uvicorn | 8000         | HTTP API, JWT auth, booking logic |
| db       | PostgreSQL 15     | 5434 → 5432  | Primary data store                |
| redis    | Redis 7           | 6380 → 6379  | Cache + metrics counters          |
| rabbitmq | RabbitMQ 3.13     | 5672 / 15672 | Message broker                    |
| worker   | Celery 5.4        | n/a          | Async email notifications         |
| mailpit  | Mailpit           | 8025 / 1025  | Dev email capture                 |
| test-db  | PostgreSQL 15     | 5433 → 5432  | Isolated test database            |

🔄 Request Flow

Every POST /api/v1/bookings call passes through six distinct layers before a 201 is returned, and three additional async steps fire after the response is sent to the client.

Request Flow Swimlane Diagram
🔒 Where the lock happens
db.flush() materialises the INSERT inside the open transaction. PostgreSQL enforces UNIQUE(event_id, seat_id) at flush time (not commit time), so conflicts surface immediately and the transaction is rolled back before any commit overhead.

🧪 Concurrency Proof

The test fires 50 simultaneous booking requests for a single seat using asyncio.gather, the closest approximation to a real thundering-herd scenario in a test suite.
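
The real test drives the live API over HTTP; here is a self-contained sketch of the same shape, with a dict standing in for the UNIQUE constraint (dict.setdefault is atomic within a single event loop, so exactly one coroutine wins):

```python
import asyncio

# 50 coroutines race to claim one seat. In the real suite each book() is an
# httpx request against the API; here a dict plays the database's role.
claimed: dict[tuple[int, int], int] = {}

async def book(user_id: int, event_id: int = 1, seat_id: int = 7) -> int:
    await asyncio.sleep(0)  # yield control so tasks genuinely interleave
    winner = claimed.setdefault((event_id, seat_id), user_id)
    return 201 if winner == user_id else 409

async def race() -> list[int]:
    return await asyncio.gather(*(book(u) for u in range(50)))

statuses = asyncio.run(race())
print(statuses.count(201), statuses.count(409))  # expect: 1 49
```

The shape of the result matches the proof below: one 201, forty-nine 409s.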

Concurrency Proof: 50-user race result
- 1 × HTTP 201: successful booking
- 49 × HTTP 409: conflict, seat taken

The constraint that makes this work:

# backend/app/models/booking.py
from sqlalchemy import Column, ForeignKey, Integer, UniqueConstraint

# Base is the project's SQLAlchemy declarative base

class Ticket(Base):
    __tablename__ = "tickets"

    id         = Column(Integer, primary_key=True)
    booking_id = Column(Integer, ForeignKey("bookings.id"))
    event_id   = Column(Integer, ForeignKey("events.id"))
    seat_id    = Column(Integer, ForeignKey("seats.id"))

    __table_args__ = (
        # one row per (event, seat): a second INSERT for the same seat fails
        UniqueConstraint("event_id", "seat_id", name="_event_seat_uc"),
    )

The service layer catches the database integrity error and converts it to a 409:

# backend/app/services/booking_service.py
from fastapi import HTTPException
from sqlalchemy.exc import IntegrityError

try:
    db.flush()          # triggers UNIQUE constraint check
    db.commit()
    return booking
except IntegrityError:
    db.rollback()       # discard the losing transaction
    raise HTTPException(status_code=409, detail="Seat already booked")
Concurrency Mechanism Timeline

⚡ Performance & Caching

A Redis read-through cache sits in front of every GET /events/:id and event-listing query. On a cache miss the result is stored with a 300-second TTL, and atomic counters (Redis INCR) track hits, misses, and latency for the live /api/v1/metrics endpoint.

Performance Dashboard (4-panel chart)
- 8.9 ms cached response (Redis hit)
- 148 ms DB response (cache miss)
- 16.6× speedup (cache vs. DB)

Cache instrumentation (simplified):

# backend/app/services/cache_service.py
import json
import time

# redis: an async client (redis.asyncio.Redis) configured at startup

async def get_event(event_id: int) -> dict | None:
    t0 = time.monotonic()
    data = await redis.get(f"event:{event_id}")
    latency_ms = (time.monotonic() - t0) * 1000

    if data:
        await redis.incr("metrics:cache_hits")
        await redis.incrbyfloat("metrics:cache_ms_total", latency_ms)
        return json.loads(data)

    await redis.incr("metrics:cache_misses")
    return None  # caller fetches from DB and populates cache
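
The caller side of the read-through (populate on miss) is not shown above. Here is a self-contained sketch of that pattern, with a plain dict standing in for Redis and a hard-coded fake_db standing in for Postgres; the real code calls redis.set(key, payload, ex=300) to get the TTL:

```python
import asyncio
import json

# In-memory stand-ins for Redis and Postgres, for illustration only.
store: dict[str, str] = {}
fake_db = {1: {"id": 1, "name": "Concert"}}
hits = misses = 0  # the real service uses Redis INCR counters

async def get_event_cached(event_id: int) -> dict:
    global hits, misses
    key = f"event:{event_id}"
    if key in store:                   # cache hit
        hits += 1
        return json.loads(store[key])
    misses += 1                        # cache miss: go to the DB
    event = fake_db[event_id]
    store[key] = json.dumps(event)     # populate; real code adds ex=300 (TTL)
    return event

async def demo():
    await get_event_cached(1)  # miss, populates the cache
    await get_event_cached(1)  # hit
    return hits, misses

print(asyncio.run(demo()))  # (1, 1)
```

Two calls for the same event touch the DB once; every subsequent read within the TTL is served from the cache.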

📐 LLD / Data Model

Six core entities. The Ticket table is the junction between a Booking and an Event + Seat pair, and it carries the uniqueness constraint that enforces single-occupancy.

LLD Class Diagram (UML)
Entity Relationship Diagram

🧠 Design Decisions

Every non-trivial architectural choice involved a trade-off. The diagram below documents five key decisions with the alternatives considered and the reasoning for the final choice.

Design Decisions: 5 trade-off cards
| Decision            | Chosen                    | Why not the alternative                                                     |
|---------------------|---------------------------|-----------------------------------------------------------------------------|
| Concurrency lock    | DB UniqueConstraint       | Redis SETNX: extra round-trip, TTL risk; pessimistic lock: serialises all writes |
| Cache invalidation  | TTL 300s (time-based)     | Event-driven invalidation: overkill for read-heavy event data               |
| Async notifications | Celery + RabbitMQ         | Inline SMTP: blocks response; FastAPI BackgroundTask: no retry on crash     |
| Auth                | JWT (stateless)           | Session cookies: require a session store, harder to scale horizontally      |
| DB pool             | pool_size=20, overflow=10 | Single connection: serialises all queries; unlimited pool: OOM under burst  |
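
The pool row translates directly into engine configuration. A sketch of how those numbers would appear in the SQLAlchemy setup (the DSN is a placeholder, not the project's real connection string):

```python
from sqlalchemy import create_engine

# 20 steady-state connections, plus up to 10 overflow connections that are
# opened under burst load and closed again when the burst subsides.
engine = create_engine(
    "postgresql+psycopg2://user:pass@db:5432/booking",  # placeholder DSN
    pool_size=20,
    max_overflow=10,
)
```

An unlimited pool would let a traffic burst exhaust Postgres connections; a hard cap of 30 bounds memory and lets excess requests queue instead.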

🛠 Tech Stack

Tech Stack and Service Map
| Technology               | Why it's here                                                              |
|--------------------------|----------------------------------------------------------------------------|
| FastAPI 0.116            | Async Python API framework, OpenAPI docs out of the box                    |
| PostgreSQL 15            | Primary store: ACID transactions, UniqueConstraint as the concurrency lock |
| Redis 7                  | Read-through cache, atomic INCR counters for metrics, TTL 300s             |
| Celery 5.4 + RabbitMQ    | Async notification pipeline, acks_late=True for guaranteed delivery        |
| SQLAlchemy 2             | ORM with pool_size=20, max_overflow=10 to handle burst traffic             |
| Next.js 15 (App Router)  | Frontend: SSR + client components, deployed at booking.404by.me            |
| Docker Compose           | 7-service local stack: api, db, redis, rabbitmq, worker, mailpit, test-db  |
| Alembic                  | Schema migrations: version-controlled DB changes                           |

📈 Scaling Roadmap

The current architecture handles ~37 RPS comfortably on a single API replica. Here is the path to 10×, 100×, and beyond without changing the core booking logic.

Scaling Roadmap
- Now: 1 API replica, Postgres primary, Redis single node; ~37 RPS proven
- Production: 3–5 API replicas, Postgres read replicas, Redis Cluster, CDN for static assets
- Hyperscale: Kubernetes HPA, sharded Postgres, event sourcing, global edge cache