Note:
Since this is my first blog related to system design, we’ll initially keep things simple and not put a lot of focus on scaling. Instead of directly designing the most optimal solution, we will learn how to approach any problem from first principles.
Approaches to System Design
Spiral Approach
Decide the core and build your system around it. The core is use-case specific, e.g. database, communication, etc.
Tip:
Go with the spiral approach when designing the system is a no-brainer, i.e. for simple systems.
Incremental Building
a. Start with a day zero architecture.
b. See how each component would behave:
- under load
- at scale
c. Identify the bottleneck
d. Re-architect
e. Repeat from step b
Points to remember:
Understand the core property and access pattern.
Affinity towards a tech comes second.
Build an intuition towards building systems.
Storage
The requirement is to render online/offline status for every user in the UI. That means for each user we need to know whether the user is online or not. Each user can be represented using an integer (user_id) and the status can be represented using a boolean (status).
Access pattern: This is a key-value use case.
Interfacing API
If our API only returns the status for one user at a time, given their user_id, the frontend would have to make multiple API calls to show the status for multiple users. That would be costly: the system would be overburdened and the experience would be poor.
To avoid this, we can expose a batch endpoint that accepts a bunch of user_ids and returns the status for all of them. This reduces the number of network calls the frontend has to make to the backend, while ensuring all the heavy lifting is done by the backend and the frontend gets the exact same information with fewer calls.
GET /status/users?ids=u1,u2,u3,u4
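To make this concrete, here is a minimal sketch of such a batch endpoint in Python. It assumes Flask purely for illustration, and get_status is a placeholder for whatever lookup we settle on below.

from flask import Flask, request, jsonify

app = Flask(__name__)

def get_status(user_id):
    # Placeholder lookup; in the real system this would hit the datastore.
    return "offline"

@app.route("/status/users")
def batch_status():
    # ?ids=u1,u2,u3,u4 -> ["u1", "u2", "u3", "u4"]
    ids = [u for u in request.args.get("ids", "").split(",") if u]
    # One response carries the status of every requested user, so the
    # frontend makes a single network call instead of one per user.
    return jsonify({user_id: get_status(user_id) for user_id in ids})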
Updating the database
Pull based model
Our API servers cannot “pull“ from the client because we cannot proactively talk to the client. (*unless there is a persistent connection)
Push based model
Users push their status periodically. Every user periodically sends “heartbeat“ to the service.
GET /heartbeat: the authenticated user will be marked as "alive"
Now, do we need this heartbeat call to be made periodically, or is just one call fine? For example, what if, instead of sending heartbeats periodically, the user sends one heartbeat when they come online and one when they log out? In that case, if the user’s machine crashes before the logout heartbeat is sent, the storage will show the user as online indefinitely. So even though we can know when a user comes online, we have a fundamental limitation that prevents us from knowing when the user goes offline. Hence, the user has to periodically send heartbeat calls saying that they are alive.
So, when is a user offline? When we don’t receive “heartbeat“ for long enough.
How long is long enough? That’s subjective. Depending on the system we are building, we will need to configure this time period based on the business logic.
Earlier we thought we needed to store the user_id (int) and status (bool), but now, instead of storing status (bool), we can simply store the time at which we received the last heartbeat (last_hb).
We can draw a rough schema: a pulse table in the database with the fields user_id and last_hb. When we receive the heartbeat from user u1:
UPDATE pulse
SET last_hb = NOW()
WHERE user_id = u1;
user_id | last_hb (epoch seconds)
u1      | 1000
u2      | 1050
u3      | 1060
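Putting the heartbeat endpoint and this update together, here is a rough sketch in Python. It uses SQLite and epoch seconds only for illustration, and it upserts because the UPDATE above assumes the row already exists; record_heartbeat would be called from the GET /heartbeat handler for the authenticated user.

import sqlite3
import time

conn = sqlite3.connect("pulse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS pulse (user_id TEXT PRIMARY KEY, last_hb INTEGER)"
)

def record_heartbeat(user_id):
    # Insert the row on the first heartbeat, update last_hb on every later one.
    conn.execute(
        "INSERT INTO pulse (user_id, last_hb) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET last_hb = excluded.last_hb",
        (user_id, int(time.time())),
    )
    conn.commit()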
Get status API
GET /status/<user_id>
This should also work if multiple user_ids are provided.
User is offline:
- if no entry is present in the database for the user, or
- if an entry is present and entry.last_hb < NOW() - 30 sec
Otherwise, the user is online.
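The same rule as a small Python sketch, assuming last_hb is the epoch-seconds value from the pulse table (or None when the user has no entry):

import time

LIVENESS_WINDOW_SEC = 30  # how long we wait before declaring a user offline

def is_online(last_hb):
    # No entry in the database -> offline.
    if last_hb is None:
        return False
    # Entry present but stale -> offline; fresh enough -> online.
    return last_hb >= time.time() - LIVENESS_WINDOW_SEC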
Let’s estimate the scale
Each entry in the pulse table has 2 columns:
user_id: int (4B)
last_hb: int (4B)
Therefore, size of each entry = 8B
For each user, we need one entry in the table =>
100 users → 100 entries
1K users → 1K entries
1M users → 1M entries
1B users → 1B entries
Total storage required for 1B users, i.e. 1 billion entries = 1 billion * 8 bytes = 8 GB
Now, 8 GB is not that big a storage to worry about; our phones themselves carry 128 GB!
This system does not require massive storage. Still, can we do better on storage?
Requirement: all the end user wants to know is whether a user is online or offline.
What if absence → offline?
Idea: if user entry is not present in the database, we can say user is offline.
So, let’s expire (delete) the entries after 30 seconds.
If we delete entries, we save a bunch of space by not storing data of inactive users.
Total entries = Active users
If there are 1B users in total and 100K of them are active, then total entries = 100K => total size = 100K * 8 B = 800 KB
How to auto delete?
Approach 1: Write a CRON job that deletes expired entries.
- not a robust solution (it adds one more moving part to our design; we have to manage uptime, telemetry… for this new CRON service)
- we need to handle edge cases in the business logic
- never re-invent the wheel
Approach 2: Can we not offload this to our datastore?
Database with Key-Value access + expiration
—> Redis
—> DynamoDB
Upon receiving a heartbeat, we update the entry in Redis/DynamoDB with ttl = 30s
Every heartbeat moves the expiration time forward!
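A minimal sketch with Redis (assuming the redis-py client; the key naming is illustrative): each heartbeat rewrites the key with a fresh 30-second TTL, and the status check becomes a simple existence test.

import redis

r = redis.Redis(host="localhost", port=6379)

def record_heartbeat(user_id):
    # SET with ex=30 (re)creates the key and resets its TTL, so every
    # heartbeat pushes the expiry another 30 seconds into the future.
    r.set(f"pulse:{user_id}", 1, ex=30)

def is_online(user_id):
    # If the key has expired (or never existed), the user is offline.
    return r.exists(f"pulse:{user_id}") == 1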
Which database would you pick and why? (*cost estimation)
Redis | DynamoDB
vendor lock-in (cost/security issue) | persistence
feature extensibility | managed service (depends on team size and expertise)
time sensitivity (better performance, not because Redis is an in-memory DB, but because Redis can hold persistent connections while DynamoDB is stateless) |
Note:
In real world, websockets are used in such systems, but since this is my first blog related to system design, we’ll keep things simple.
How is our DB doing?
Let’s say, heartbeat is sent every 10 seconds.
So, one user in 1 minute sends 6 heartbeats.
If there are 1M active users, our system will get 6M req/min.
Each heartbeat request results in 1 DB call, so our DB needs to handle 6M updates/min.
We’ve already concluded that storage is not a problem, but computation is!
How to compute better?
Before we jump into sharding/partitioning we need to highlight another point: connection pooling.
Here storage is not a concern; computation is. The load comes from the number of queries being fired, where each query does an update on the database. Every heartbeat request that comes in makes one database call.
When the user makes a request to the API server, there is an HTTP request over a TCP connection.
When the API server makes a request to the database to write something, there is another TCP connection. For the API server to talk to the database, a TCP connection needs to be established, which requires the infamous three-way handshake; terminating it requires a four-way tear-down. That makes 7 network trips for every single DB call. This is why establishing a DB connection between the API server and the database is an expensive process. For every DB call we are just doing one update on a row (a micro-write) indexed by the primary key. So where is the time going? Most of the time spent in the DB call is in establishing the TCP connection and then tearing it down.
To decrease the load on the database, we add connection pools. The idea is to keep multiple pre-established connections between the API server and the database (configurable as, say, min: 3, max: 100). The connections are kept in a blocking queue. When a query comes in and a connection is free, that connection is used to perform the micro-write on the database; once done, the connection is returned to be used by other queries. If there are no free connections in the blocking queue, the query can either wait for a connection to be freed up, or a new TCP connection can be established between the API server and the database and added to the queue, up to the configured maximum number of connections.
A connection pool is not a service; it is just a library running on the API server that handles the logic for establishing connections. It is then the responsibility of the API server to terminate any unnecessary inactive open connections based on the ttl configured for the connections.
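To illustrate, most drivers and ORMs expose pooling as configuration rather than something you write yourself. Here is a hedged sketch with SQLAlchemy; the connection URL and the pool numbers simply mirror the min: 3 / max: 100 idea above and are not recommendations.

from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://user:pass@db-host/presence",  # illustrative URL
    pool_size=3,       # connections kept ready in the pool
    max_overflow=97,   # extra connections allowed under load (~100 total)
    pool_timeout=5,    # seconds a query waits for a free connection
    pool_recycle=300,  # connections older than this (seconds) are re-established
)

def record_heartbeat(user_id):
    # Borrow a pooled connection, do the micro-write, return it to the pool;
    # no TCP handshake/teardown on the hot path.
    with engine.begin() as conn:
        conn.execute(
            text("UPDATE pulse SET last_hb = NOW() WHERE user_id = :uid"),
            {"uid": user_id},
        )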
The size of the database we need depends on the number of concurrent queries we fire at it. The most common bottleneck of a database is the maximum number of concurrent requests it can handle; if you want to support more concurrency than that, you either vertically scale the database or go for sharding.
Similar systems
- Failure detection in a distributed system.
That's all for now folks. See you in the next blog!