In this blog, we will study the factors that are affected by any system design decision, and we'll understand them while designing a blogging platform. We will deep dive into caching issues at scale and how to solve them, async processing, delegation, Kafka essentials, and different communication paradigms.
Foundational topics in System Design
Database
Caching
Scaling
Delegation
Concurrency
Communication
Any and every design decision will affect one or more of the above factors.
We'll design a multi-user blogging system (eg: medium.com)
multiple users
each user can have multiple blogs
Let’s take a look at the key design decisions.
Database
We'll have two tables: users and blogs. Here's the schema for the tables:
users table: id | name | bio
blogs table: id | title | body | author_id | published_at | is_deleted
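To make this concrete, here's a minimal sketch of the schema in SQLite (types and sizes are illustrative assumptions; a production MySQL/Postgres setup would tune them):

```python
import sqlite3

conn = sqlite3.connect("blog.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100),
    bio  VARCHAR(512)
);
CREATE TABLE IF NOT EXISTS blogs (
    id           INTEGER PRIMARY KEY,
    title        VARCHAR(200),
    body         TEXT,     -- long text, see the column-type discussion below
    author_id    INTEGER REFERENCES users(id),
    published_at INTEGER,  -- epoch seconds, see the datetime discussion below
    is_deleted   INTEGER DEFAULT 0
);
""")
conn.commit()
```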
Importance of is_deleted: soft delete.
When a user invokes delete blog, instead of a DELETE we run an UPDATE on is_deleted.
Key reasons: Recoverability, archival, audit.
It's also easy on the database engine (no tree re-balancing on delete).
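Here's a hedged sketch of what the delete path looks like (function names are illustrative; it reuses the SQLite connection from the schema sketch above):

```python
def delete_blog(conn, blog_id):
    # Soft delete: flip the flag instead of removing the row,
    # keeping the blog recoverable and auditable.
    conn.execute("UPDATE blogs SET is_deleted = 1 WHERE id = ?", (blog_id,))
    conn.commit()

def get_blog(conn, blog_id):
    # Every read path must now filter out soft-deleted rows.
    return conn.execute(
        "SELECT * FROM blogs WHERE id = ? AND is_deleted = 0", (blog_id,)
    ).fetchone()
```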
Column Type: body v/s bio
body → long text: stored as a reference (LONGTEXT)
The database has to make two disk calls: one to fetch the row, and another to follow the reference and fetch the long text body.
bio → short text: stored along with other columns (VARCHAR)
The database does this internally if we provide the correct data type for the columns.
Storing datetime in DB: published_at
Datetime as datetime
Serialized in some format: 2024-11-30T09:01:36Z
Convenient, sub-optimal, heavy on size and index.
Datetime as epoch integer
Seconds since 1st January 1970 (UTC): 1725623471
Efficient, optimal, light weight.
Note
: Always capture epoch time in UTC for all regions, so that you can render it in each region's local time zone.
Datetime as custom format
YYYYMMDD: 20241130
Real use case: Redbus shifted from using datetime to int because they only needed dates, reducing the space consumed per date to just 4 bytes!
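Here's a quick Python sketch of both approaches (the values in the comments are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Epoch integer: capture the current moment as seconds since 1970, UTC.
published_at = int(datetime.now(timezone.utc).timestamp())  # e.g. 1725623471

# Render it later in the reader's local time zone.
ist = datetime.fromtimestamp(published_at, tz=ZoneInfo("Asia/Kolkata"))
print(ist.isoformat())  # e.g. 2024-09-06T17:21:11+05:30

# Redbus-style custom format: just the date, packed into a 4-byte int.
yyyymmdd = int(datetime.now(timezone.utc).strftime("%Y%m%d"))  # e.g. 20240906
```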
Caching
Caching is anything that reduces response times by saving the result of a heavy computation.
Note
: Cache is not only RAM based.
Typical use: reduce disk I/O or network I/O or compute.
Caches are just glorified hash tables (with some advanced data structures).
Most common use case: an application-level cache in RAM, saving repeated DB computations.
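As a sketch, this is the classic cache-aside pattern at the application level, with an in-process dict standing in for Redis/Memcached (the TTL and key format are illustrative assumptions):

```python
import time

cache = {}  # stand-in for Redis/Memcached: key -> (value, expires_at)
TTL_SECONDS = 60

def get_user(conn, user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]  # cache hit: no DB round trip
    row = conn.execute(
        "SELECT id, name, bio FROM users WHERE id = ?", (user_id,)
    ).fetchone()  # cache miss: go to the database
    cache[key] = (row, time.time() + TTL_SECONDS)
    return row
```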
Caching at different levels
Database views (materialised)
Centralised remote cache
A cache stored on a dedicated server that's built on top of a key/value NoSQL store like Redis or Memcached.
Disk of API Server
Local disk I/O is still faster than a network call to the centralised remote cache, and the disk has more storage capacity than the API server's main memory (RAM). Use it when the DB changes infrequently and you're comfortable with data inconsistencies, i.e. it's okay to sometimes serve stale data.
Main memory (RAM) of API Server
limited storage
API server may crash → you also lose the cached data
inconsistency
Load balancer (API Gateway)
CDN (cache response)
Browser (local storage)
Use case: personalised recommendations
Rankings are updated once a day and a batch of 50 recommendations is stored in the browser cache. On each page load we pick 10 random recommendations out of the 50 and render them. The UX is different on every visit, but we run the ranking computation only once per day.
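In the browser this would be JavaScript against localStorage; here's the same logic sketched in Python, with storage as a stand-in for the browser's key/value store and fetch_rankings as a hypothetical call to the ranking service:

```python
import json
import random
import time

ONE_DAY = 86400

def recommendations_for_page(storage, user_id, fetch_rankings):
    key = f"recs:{user_id}"
    raw = storage.get(key)
    entry = json.loads(raw) if raw else None
    if entry is None or entry["cached_at"] + ONE_DAY < time.time():
        # Run the heavy ranking computation once a day, cache 50 results.
        entry = {"cached_at": time.time(), "recs": fetch_rankings(user_id)[:50]}
        storage[key] = json.dumps(entry)
    # Each page load renders a different random 10 out of the cached 50.
    return random.sample(entry["recs"], k=min(10, len(entry["recs"])))
```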
Scaling
Ability to handle a large number of concurrent requests. There are two scaling strategies:
Vertical Scaling
Hulk: Make infra bulky, add more CPU, RAM, Disk.
Easy to manage
Risk of downtime
Vertical scaling is limited by the capabilities of the physical hardware on which the system is running.
Horizontal Scaling
Minions: Add more machines
Linear amplification
Fault tolerance
Complex architecture
Network partitioning
Good scaling plan: First scale vertically up to a point and then scale horizontally.
Note
: No premature optimisations.
For our medium.com, we scale:
vertically first
then horizontally
★ Horizontal scaling ≈ ∞ scaling, but there is a catch!!
Cascading failures become the root cause of most major outages. To prevent them, ensure your stateful components (DB, cache, and other dependent services) can handle that many concurrent requests. Hence, whenever you scale, always do it bottom-up!
For our medium.com, we scale DB first and then the API server.
Scaling the database
Vertical scaling
Just go to the cloud console and update configurations.
Read Replicas
For our use case, there will always be more people reading blogs on the platform than publishing them → more read requests than writes. So if we can scale reads, we'll have effectively scaled the database. To do that, we use read replicas.
All write requests go to the master and are then asynchronously replicated to the replicas. For asynchronous replication, always remember: the replicas pull the changes from the master.
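A minimal sketch of routing at the application level (the connection objects are stand-ins; in practice a proxy or the database driver often handles this):

```python
import random

class RoutingConnection:
    """Send writes to the master, spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def execute_write(self, sql, params=()):
        return self.master.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Replication is asynchronous, so a replica may briefly lag the master.
        return random.choice(self.replicas).execute(sql, params)
```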
Sharding
Let's say there is so much write traffic that a single master can't handle it alone. In that case we create multiple master nodes, partitioned on some key (e.g. an id), each storing a mutually exclusive subset of the data. Each master node can have its own replicas to handle reads. You also need a routing layer between the API server and the database that routes each write request to the correct master node.
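Here's a sketch of that routing layer, assuming we shard blogs by author_id. Hash-mod routing is the simplest choice; real systems often prefer consistent hashing so that adding a shard doesn't reshuffle every key:

```python
import hashlib

class ShardRouter:
    def __init__(self, shards):
        self.shards = shards  # one master connection per shard

    def shard_for(self, author_id):
        # Hash the shard key, then map it onto one of the masters.
        digest = hashlib.md5(str(author_id).encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert_blog(self, author_id, title, body):
        conn = self.shard_for(author_id)  # all of an author's blogs co-locate
        conn.execute(
            "INSERT INTO blogs (title, body, author_id) VALUES (?, ?, ?)",
            (title, body, author_id),
        )
```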
Delegation
Let’s add basic analytics to our blog!
What does not need to be done in real time, should not be done in real time.
In the context of a request, do what is essential and delegate the rest.
Core idea
: Delegate & respond.
In the profile page, to show the total number of blogs published by a user, we'll add a new column total_blogs to the users table. We'll store it pre-computed to avoid joins and runtime computation.
users table: id | name | bio | total_blogs
We delegate this update through a broker such as SQS. A broker is a buffer that holds tasks and messages.
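Here's the delegate-and-respond idea sketched with an in-process queue.Queue standing in for SQS (names are illustrative; in production the worker would be a separate process consuming from the broker):

```python
import queue

broker = queue.Queue()  # in-process stand-in for a broker like SQS

def publish_blog(conn, author_id, title, body):
    conn.execute(
        "INSERT INTO blogs (title, body, author_id) VALUES (?, ?, ?)",
        (title, body, author_id),
    )
    broker.put({"event": "ON_PUBLISH", "author_id": author_id})
    return "published"  # respond right away; the counter update is delegated

def counter_worker(conn):
    while True:
        msg = broker.get()  # a background worker consumes the delegated task
        conn.execute(
            "UPDATE users SET total_blogs = total_blogs + 1 WHERE id = ?",
            (msg["author_id"],),
        )
        conn.commit()
```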
Common implementations of a broker
Message queue
Example: SQS, RabbitMQ.
Message stream
Example: Kafka, Kinesis.
Kafka essentials
Kafka is a message stream that holds messages (almost forever, subject to the deletion policy). Internally, Kafka has topics (example: ON_PUBLISH). Every topic has 'n' partitions. A message is sent to a topic and, depending on the configured hash key, is placed into one of its partitions.
Within a partition, messages are ordered. There is no ordering guarantee across partitions.
Limitation of Kafka: consumers per group ≤ number of partitions, i.e. we can have 'n' types of consumers (search, analytics, backend…), but for each type (consumer group), we can only have as many active consumers as there are partitions.
Consumers can issue a commit to Kafka after they have read a particular number of messages. If a consumer crashes before committing, then when it resumes, it starts processing messages from the previous commit. So Kafka guarantees at-least-once delivery semantics.
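A hedged sketch with the kafka-python client (the servers, group name, and handler are illustrative assumptions; ON_PUBLISH is the topic from above):

```python
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)
# Keying by author pins all of that author's events to one partition,
# preserving their relative order.
producer.send("ON_PUBLISH", key=b"author-42", value={"blog_id": 7})
producer.flush()

consumer = KafkaConsumer(
    "ON_PUBLISH",
    bootstrap_servers="localhost:9092",
    group_id="analytics",          # each group gets its own copy of the stream
    enable_auto_commit=False,      # commit manually, after processing
    value_deserializer=json.loads,
)
for msg in consumer:
    handle(msg.value)              # hypothetical handler
    consumer.commit()              # crash before this => message is redelivered
```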
Concurrency
Concurrency → to get faster execution → threads & multiprocessing.
Concurrency is the ability of a system to execute multiple tasks through simultaneous execution or time-sharing (context switching), sharing resources and managing interactions.
Issues with concurrency:
communication between threads
concurrent use of shared resources
→ database
→ in-memory variables
Handling concurrency:
Locks (optimistic & pessimistic)
Mutexes and Semaphores
Go lock-free (e.g. CRDTs)
We'll cover locking in depth in the upcoming blogs.
Concurrency in our blogging platform: if two users clap the same blog at the same time, the clap count should go up by exactly 2. We protect our data through transactions & atomic instructions.
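A sketch of the lost update and the atomic fix, assuming a hypothetical claps column on the blogs table:

```python
def clap(conn, blog_id):
    # Racy read-modify-write: two concurrent clappers can both read 10
    # claps and both write back 11, losing one clap:
    #   claps = conn.execute("SELECT claps FROM blogs WHERE id = ?",
    #                        (blog_id,)).fetchone()[0]
    #   conn.execute("UPDATE blogs SET claps = ? WHERE id = ?",
    #                (claps + 1, blog_id))

    # Safe: one atomic UPDATE, the database applies the increment itself.
    conn.execute("UPDATE blogs SET claps = claps + 1 WHERE id = ?", (blog_id,))
    conn.commit()
```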
Communication
The usual communication
Short Polling
eg: continuously refreshing cricket score, continuously checking if server is ready
Disadvantages:
HTTP overhead on every poll
repeated request/response cycles
Long Polling
Let's say the client hits the server to create some instance; the server sends back the response only after the instance is created, i.e. only when data is available.
eg: response only when the ball is bowled.
The connection is re-established after a timeout, and the request is retried.
Short Polling v/s Long Polling:
short polling: server responds right away, ready or not.
long polling: server responds only when done; the connection is kept open for the entire duration.
eg: EC2 provisioning
short polling: gets status every few seconds
long polling: gets response when server is ready
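Here's a sketch of both client loops using requests (the URL and response shape are illustrative assumptions):

```python
import time
import requests

def short_poll(url):
    # Ask every few seconds; each loop is a full HTTP request/response.
    while True:
        if requests.get(url, timeout=5).json()["status"] == "ready":
            return
        time.sleep(3)

def long_poll(url):
    # The server holds the request open and replies only when data is ready.
    while True:
        try:
            resp = requests.get(url, timeout=60)
            if resp.json()["status"] == "ready":
                return
        except requests.exceptions.Timeout:
            continue  # connection re-established after timeout, request retried
```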
WebSockets (WS)
★ Server can proactively send data to the client.
Advantages:
real time data transfer
low communication overhead
Applications:
real time communication
stock market ticker
live experiences
multi-player games
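A minimal client sketch with the websockets library (the URL is an illustrative assumption). Note there's no polling loop: messages simply arrive whenever the server pushes them:

```python
import asyncio
import websockets

async def listen():
    # One long-lived connection; the server pushes ticks proactively.
    async with websockets.connect("wss://example.com/ticker") as ws:
        async for message in ws:
            print(message)

asyncio.run(listen())
```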
Server-sent events (SSE)
The server pushes a one-way stream of events to the client over a single long-lived HTTP connection.
Applications:
stock market ticker
deployment logs streaming
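And a sketch of consuming an SSE stream over plain HTTP streaming (the endpoint is an illustrative assumption):

```python
import requests

with requests.get("https://example.com/deploy/logs", stream=True) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data:"):  # SSE frames arrive as "data: <payload>" lines
            print(line[len(b"data:"):].strip().decode())
```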
Communication in our blog
Real time interactions:
twitter: like count updates without refresh.
medium: when one reader claps an article, other readers should see the count update in real time.
instagram: live interaction.
That's all for now folks. See you in the next blog!