From predicting driver locations in real-time to syncing files across devices, from counting ad impressions to mapping social connections, modern applications demand solutions to computational problems that seem deceptively simple at first glance. How do you find all drivers within 5 kilometers when you have millions scattered across a city? How do you track unique viewers on a viral video without storing every single user ID? How do you handle billions of follow relationships without drowning in the complexity of graph databases? These aren't just academic exercises, they're real challenges faced by companies like Uber, Dropbox, YouTube and Twitter and they've driven some of the most elegant engineering innovations of the past decade. In this exploration, we'll dive into five fascinating system design patterns that power the applications we use every day: raycasting for geofencing, HyperLogLog for probabilistic counting, file chunking for efficient synchronization, geospatial indexing for proximity search and bidirectional edge storage for social graphs. Each pattern reveals how creative thinking, mathematical insight and pragmatic engineering combine to build systems that work at massive scale.

Raycasting

Raycasting is a computational geometry technique used to determine whether a point lies inside or outside a polygon. The algorithm works by casting an imaginary ray from the test point in any direction (typically horizontal) and counting how many times it intersects the polygon's edges. If the ray crosses an odd number of edges, the point is inside; if even, it's outside.

This elegant solution has found applications far beyond basic geometry checks. Location-based services use raycasting to determine if users are within specific geographic boundaries, triggering contextual reminders or notifications. Navigation systems employ it to verify whether vehicles are on designated routes or within restricted zones. Ride-sharing apps like Uber and Lyft use raycasting to match riders with drivers in specific service areas, while games like Pokémon Go rely on it to determine when players enter zones where certain creatures can be caught or special events occur.

The algorithm's efficiency, running in O(n) time for a polygon with n vertices, makes it practical for real-time applications where rapid spatial queries are essential.

Impressions Counting

Impression counting is a fundamental metric tracking system used across major platforms like LinkedIn, YouTube, Google Search, Reddit, Google AdSense, Instagram and TikTok. The system tracks views, impressions and user engagement, but faces a key challenge: not everyone who sees content actually engages with it, making true engagement measurement difficult.

The Core Problem

Given a stream of view events over time, count the distinct users within a specified time window. Since impressions occur continuously and user IDs (typically 4-byte integers) can repeat, the system must efficiently track unique visitors while handling massive scale. For example, with 1 million ad impressions per minute, processing would require 4MB of data per ad, or 240MB per hour, prohibitively expensive when multiplied across thousands of customers.

Solution: HyperLogLog with Cardinality Estimation

The system leverages Redis's HyperLogLog data structure, which provides probabilistic cardinality estimation using only 12KB of memory per set, a 99.85% space reduction compared to naive approaches. Rather than storing complete user sets, HyperLogLog approximates unique counts with minimal error.

The architecture works as follows: Kafka ingests view events, a counting service processes them through a rules engine (filtering unwanted events like self-views or those under 5 minutes watch time), then stores user IDs in Redis HyperLogLog structures keyed by post ID and timestamp. For queries, the analytics engine merges relevant HLLs using Redis's PFMERGE command and returns the cardinality via PFCOUNT.

To optimize costs at scale, the system employs a tiered storage strategy, recent data (30 minutes to 2 hours) stays in Redis for real-time queries, while older data periodically migrates to cheaper database storage. Missing keys are fetched from the database on demand and the HLL filter mechanism ensures granular time-based queries remain efficient even across distributed storage layers.

Remote File Sync

Building a distributed file synchronization system like Dropbox presents unique technical challenges around efficient uploads, downloads and change propagation across multiple devices. When a user modifies a file on their Windows machine, that change must propagate reliably to their Android phone, Mac laptop and cloud storage, all while handling network failures, partial transfers and concurrent edits.

Challenge 1: Resumable Uploads and Downloads

Large file transfers over unreliable networks demand a chunking strategy that enables parallel uploads and graceful recovery from failures. Files are divided into fixed-size blocks (typically 4MB), with each block hashed to generate a unique identifier. For example, a 14MB video file (video.avi) splits into four blocks with hashes h1, h2, h3, h4. These blocks are stored in blob storage (S3) using paths like s3://my-dropbox/<account-id>/<block-hash>, making storage content-addressable and deduplication automatic.

When a client commits a file, it first contacts the metaserver with the proposed blocklist. The metaserver queries the Blocks DB to identify which blocks already exist in the system. The client then uploads only the missing blocks to the block server in parallel, waiting for acknowledgment before moving to the next batch. Once all blocks are uploaded, the client re-commits the file and the metaserver records the complete file metadata including the namespace ID (account), relative path, blocklist and a monotonically increasing version ID.

Challenge 2: Change Notification and Synchronization

The critical insight for efficient sync is the version ID field in the File Metadata DB. Every update to any file within a namespace increments this counter, creating a unified way to sequentialize changes. Each client maintains a local cursor representing the last version it has synced. When a client wants to check for updates, it simply asks: "I'm at version 2, what's new?" The metaserver can then forward all changes from version 3 onward, ensuring clients receive updates in correct chronological order.

Multi-Version Support

The version-based model naturally enables powerful features like file history and rollback. Since each modification creates a new version entry with its own blocklist, reconstructing any historical version simply requires fetching the appropriate blocks from storage. Version 2 of video.avi might have blocks h1, h2, h3', h4, where only block 3 changed. Because unmodified blocks (h1, h2, h4) remain on the block server, historical versions can be reconstructed without duplicating unchanged data.

This architecture, combining content-addressable storage, versioned metadata and differential synchronization, powers not just Dropbox but similar systems like Google Drive, Slack, Teams and even WhatsApp, anywhere efficient file updates and multi-device sync are essential.

Merkle Trees for Efficient Synchronization

For large directory structures with thousands of files, comparing version numbers for every file individually becomes prohibitively expensive. Merkle trees provide an elegant solution by organizing file hashes into a hierarchical tree structure where each parent node's hash is computed from its children's hashes. When a client needs to sync a directory, it can first compare the root hash with the server, if they match, the entire directory is identical and no further work is needed. If they differ, the client recursively compares subtree hashes to pinpoint exactly which subdirectories contain changes, dramatically reducing the amount of metadata that needs to be transferred. This allows Dropbox-like systems to efficiently answer "what changed?" across massive file hierarchies with just a few hash comparisons, rather than comparing metadata for every single file. The same Merkle tree approach is used in Git for efficient repository synchronization and in blockchain systems for verifying data integrity.

GeoSpatial Search

The ability to efficiently answer "what's near me?" has enabled an entire generation of location-based services. From ride-sharing apps like Uber and Ola to food delivery platforms like Swiggy and Zomato, from social features like Facebook's "Nearby Friends" to mapping services and vehicle rentals, all rely on solving a fundamental challenge: geo-proximity search.

The Problem: Finding Spatial Neighbors

The core problem is deceptively simple: given your current location (latitude, longitude), find all people, drivers, restaurants, or points of interest within a k-kilometer radius. However, naive approaches fail at scale. Computing distances between your location and millions of other points using the haversine formula is computationally prohibitive for real-time queries.

The Challenge: Why Geo Proximity Is Hard

Traditional database indexes work well for one-dimensional data, finding all records where a timestamp falls within a range, for example. But location data is inherently two-dimensional (latitude and longitude). Standard B-tree or sorted indexes cannot efficiently answer queries like "find all points within 5km of (12.9716, 77.5946)" because there's no natural way to order two-dimensional coordinates that preserves spatial proximity.

The Solution: Geohash and Divide-and-Conquer

Geohash elegantly solves this by converting two-dimensional coordinates into a single-dimensional string through a space-filling curve technique. The algorithm recursively divides the world into a grid using a divide-and-conquer approach. Starting with the entire globe (Left: 0, Right: 1, Top: 0, Bottom: 1), it splits the space in half both horizontally and vertically, zooming in with each iteration and assigning more bits to the resulting hash.

Rather than using binary strings (which aren't human-readable), Geohash uses base-32 encoding with characters like c, f, g, u, v, y, z representing different regions. A location in Bangalore might have a geohash of "tdz1v" where each character represents increasing precision. The key insight: nearby points share common geohash prefixes. If two locations have geohashes "tdz1vt" and "tdz1vx", they're guaranteed to be close because they share the prefix "tdz1v".

Storage and Querying

Modern geospatial databases like Redis (with geospatial commands), Elasticsearch and MongoDB provide built-in support for these operations. You simply insert or update latitude/longitude pairs with a query like:

SELECT * FROM people 
WHERE substr(geohash, 5) LIKE 'tdz1v%'

This finds all people in Bangalore whose geohash starts with "tdz1v". For even faster lookups, you can use Tries (prefix trees) for in-memory searches or leverage database indexes on geohash strings. The prefix-matching property makes range queries exceptionally efficient, checking if two points are close becomes as simple as comparing their geohash strings character by character.

Real-World Architecture: The Matching Algorithm

In production systems like ride-sharing apps, the architecture typically involves multiple components. A location ingestion service continuously receives GPS updates from drivers, storing their geohashes in Redis with asynchronous replication to followers for read scalability. When a rider requests a ride, the matcher service queries Redis master nodes to find nearby drivers, leveraging the geohash prefix to narrow the search space dramatically.

The system prioritizes availability and low latency over strict consistency, it's acceptable if a driver's location is slightly stale, as locations don't change drastically in seconds. This allows the system to handle 600,000 writes per minute and 200,000 reads per minute while remaining highly available and horizontally scalable by simply adding more Redis nodes.

Key Design Decisions

Redis serves as the primary datastore due to its in-memory speed, native geo-query support and multi-master, multi-replica setup for sharding data geographically. Data is sharded across masters, with each master handling specific regions (peak load areas mixed with low-traffic regions to balance load). This was initially done through manual allocation via an administrative service, as native Redis sharding lacks business context about traffic patterns.

Challenge: The EVAL Command and Read Replicas

An interesting engineering challenge arose with Redis's EVAL command, which is used to filter drivers based on constraints (vehicle type, rating, availability). Standard Redis replication doesn't execute EVAL commands on read replicas, they only execute on masters. Gojek's team raised this issue with the Redis maintainers and proposed the EVAL_RO (read-only EVAL) command, which they contributed back to the go-redis client library. This allowed constraint-based filtering to happen on replicas, dramatically improving read scalability.

Addressing Hot Shards

Query splitting addresses hot-shard problems to some extent. Instead of sending one large GEOSEARCH query to a single node, the system breaks queries into smaller geographic regions and fires them in parallel across multiple Redis nodes. Even if one node is slow, others respond quickly and partial results can be returned if SLA is breached. This provides fault tolerance and better latency distribution.

Broader Applications

The same divide-and-conquer strategy using geohashes powers anomaly detection systems through the Isolation Forest algorithm, which operates very similarly to geohash's recursive partitioning approach. This technique has become fundamental to modern spatial computing, enabling the location-aware applications that have transformed how we navigate, commute and discover the world around us.

User Affinity Service

Social media platforms fundamentally revolve around connections, the ability to follow others and build networks of relationships. Modeling these follower/following relationships at scale, however, presents significant architectural challenges that have forced platforms to rethink traditional database approaches.

The Problem: Graph Relationships at Scale

At first glance, a graph database seems like the natural choice for modeling social connections. Platforms like Koo (India's social network) initially considered this approach, storing users as nodes and follow relationships as directed edges. However, several practical concerns make graph databases less appealing for this use case: they require specialized expertise to operate, managing and scaling graph databases can be operationally painful, running them at scale becomes expensive and perhaps most critically, implementing efficient pagination over graph results is notoriously tricky.

The question becomes: can we achieve better performance and operational simplicity using a relational database instead?

Social graphs demand several key characteristics. The system must use a relational database for operational familiarity and cost-effectiveness. Pagination should be extremely efficient, performance must remain constant whether you're viewing page 1 or page 10,000 of someone's followers. Counting followers and following should be accurate and fast, enabling real-time display of these metrics on user profiles. The system must quickly answer "Does A follow B?" for rendering follow buttons correctly. Write operations must be fast to handle the high volume of follow/unfollow actions. Finally, the architecture must scale massively to support millions of users and billions of connections.

Why B+ Trees Power Relational Databases?

The efficiency of relational databases for this use case stems largely from their underlying index structure: B+ trees. Unlike binary search trees or hash tables, B+ trees are specifically optimized for disk-based storage systems. Each node in a B+ tree can contain dozens or hundreds of keys (not just two children), which dramatically reduces tree height, a tree with millions of records might only be 3-4 levels deep. This matters because each level traversal typically requires a disk read and minimizing disk I/O is critical for database performance.

B+ trees store all actual data in leaf nodes, which are linked together as a sorted linked list. This design makes range queries exceptionally efficient, finding all followers with IDs between 1000 and 5000 requires traversing to the first relevant leaf, then simply following the linked list until the range ends. Sequential scans are fast because leaf nodes are stored contiguously on disk. Hash indexes, by contrast, are excellent for exact-match lookups but completely fail at range queries or sorted retrievals.

For social graph queries that need sorting (ORDER BY position) or pagination (retrieving records 100-110), B+ trees provide exactly the right performance characteristics. The data is already sorted at the leaf level, making these operations nearly free compared to alternative data structures.

The Solution: Sharding on Source with State Columns

Rather than treating the social graph as a traditional graph, the winning approach stores it as a simple edges table with a clever twist. For each follow relationship, create two entries with an explicit state column indicating the relationship type.

When user A follows user B, the system creates two rows: one with src=A, dest=B, state=FOLLOWS and another with src=B, dest=A, state=FOLLOWED_BY. This bidirectional storage pattern may seem redundant, but it's the key to efficient queries.

The critical insight is sharding the dataset by the src column. This means all relationships where a particular user is the source live on the same database shard. With this design, two previously expensive queries become trivial:

Followers of B: SELECT * FROM edges WHERE src=B AND state='FOLLOWED_BY'
People B follows: SELECT * FROM edges WHERE src=B AND state='FOLLOWS'

Both queries are answered from a single shard with no cross-partition merging required. The src column serves as the natural partitioning key, ensuring each user's entire relationship graph resides on one node.

Twitter's FlockDB: Production-Scale Implementation

Twitter formalized this approach in FlockDB, their custom distributed graph storage system built on relational databases. FlockDB represents the social graph as a set of edges, where each edge has integer IDs for source and destination nodes, an int64 position field for cursor-based pagination and sorting and an int8 state field for relationship types like positive (active follow), negative (blocked), or archived (soft-deleted).

For fan-following graphs, the position field stores timestamps, enabling chronological sorting of followers. When an edge is deleted, the system changes the state to restoration-possible rather than physically removing the row, allowing for equality, range and sorting queries all served from a single primary index.

The schema uses a composite primary key of (source_id, state, position) with a unique index on (source_id, destination_id, state). Because data is partitioned by source node, any query about a particular user's connections requires no cross-partition execution, every relevant edge is already co-located on the same shard.

Pagination works through position-based cursors rather than limit/offset, meaning retrieving any page of results is equally fast regardless of how deep you paginate. This is a dramatic improvement over traditional approaches where LIMIT 10000 OFFSET 990000 would be prohibitively slow. The B+ tree index on (source_id, state, position) allows the database to jump directly to the correct position in the sorted sequence, retrieve exactly the needed records and move on, all with minimal disk I/O.

The architecture demonstrates that sometimes the best way to scale a graph isn't to use a graph database at all, it's to carefully denormalize relationships into a relational model optimized for your specific query patterns, leveraging the mature, battle-tested performance characteristics of B+ tree indexes that power traditional databases.

That's all for now folks. See you in the next blog!

Building Algorithmic Systems

Raycasting

Impressions Counting

The Core Problem

Solution: HyperLogLog with Cardinality Estimation

Remote File Sync

Challenge 1: Resumable Uploads and Downloads

Challenge 2: Change Notification and Synchronization

Multi-Version Support

Merkle Trees for Efficient Synchronization

GeoSpatial Search

The Problem: Finding Spatial Neighbors

The Challenge: Why Geo Proximity Is Hard

The Solution: Geohash and Divide-and-Conquer

Storage and Querying

Real-World Architecture: The Matching Algorithm

Key Design Decisions

Challenge: The EVAL Command and Read Replicas

Addressing Hot Shards

Broader Applications

User Affinity Service

The Problem: Graph Relationships at Scale

Why B+ Trees Power Relational Databases?

The Solution: Sharding on Source with State Columns

Twitter's FlockDB: Production-Scale Implementation

Comments

System Design

Designing online/offline indicator

More from this blog

System Design Deep Dives

Building High Throughput Systems

Building Storage Engines

Building Social Networks

Command Palette

Raycasting

Impressions Counting

The Core Problem

Solution: HyperLogLog with Cardinality Estimation

Remote File Sync

Challenge 1: Resumable Uploads and Downloads

Challenge 2: Change Notification and Synchronization

Multi-Version Support

Merkle Trees for Efficient Synchronization

GeoSpatial Search

The Problem: Finding Spatial Neighbors

The Challenge: Why Geo Proximity Is Hard

The Solution: Geohash and Divide-and-Conquer

Storage and Querying

Real-World Architecture: The Matching Algorithm

Key Design Decisions

Challenge: The EVAL Command and Read Replicas

Addressing Hot Shards

Broader Applications

User Affinity Service

The Problem: Graph Relationships at Scale

Requirements: What Makes Social Graphs Different

Why B+ Trees Power Relational Databases?

The Solution: Sharding on Source with State Columns

Twitter's FlockDB: Production-Scale Implementation

Comments

System Design

Designing online/offline indicator

More from this blog