Building Social Networks

Architectural Patterns & Design Decisions at Massive Scale

Modern social media platforms handle billions of interactions daily while maintaining sub-second response times globally. This post explores six critical system design challenges, from CDN fundamentals and photo uploads at scale to hashtag counters, dynamic image services, photo tagging and real-time messaging indicators, breaking down the architectural patterns and design decisions that make these systems work at massive scale.

How CDN Works

A Content Delivery Network (CDN) acts as a transparent intermediary layer between your users and your origin server. Understanding this flow is crucial to leveraging CDNs effectively.

The Basic Architecture

Let's walk through a concrete example. Suppose your origin server is hosted at https://rubansahoo.com, and you have a resource like a profile image at https://rubansahoo.com/dashboard/profile.jpeg. This could be any type of content: JSON responses, images, HTML pages, videos, or binary data.

When you set up a CDN, you're assigned a CDN domain, for instance https://a.mycdn.net. During configuration, you map this CDN domain to your origin server https://rubansahoo.com.

The Request Flow

Here's what happens when a user requests content through the CDN:

  1. User Request: The user's browser sends a request to https://a.mycdn.net/dashboard/profile.jpeg

  2. CDN Lookup: The CDN receives the request and identifies that a.mycdn.net is configured to use https://rubansahoo.com as its origin

  3. Cache Check: The CDN checks if it has a cached copy of /dashboard/profile.jpeg

  4. Origin Fetch (on cache miss): If the content isn't cached, the CDN forwards the request to https://rubansahoo.com/dashboard/profile.jpeg

  5. Cache and Serve: The CDN receives the response from the origin, stores it in its cache, and returns it to the user

  6. Subsequent Requests: Future requests for the same resource are served directly from the CDN's cache without touching the origin server

The Push vs. Pull Model

A common misconception about CDNs is that you need to actively "push" your content to them. Push-style CDNs do exist, but most setups, including the one described here, use a pull model:

  • You don't upload files to the CDN

  • Instead, the CDN pulls content from your origin server on-demand when users first request it

  • Content remains cached until it expires or is invalidated

When you need to update content, you invalidate (or "purge") specific paths in the CDN cache. This marks those cached entries as stale. The next time a user requests that path, it triggers a cache miss, forcing the CDN to fetch the fresh version from your origin server. This new version is then cached and served to subsequent users.

This pull-based approach simplifies deployment workflows since you only need to update content on your origin server and then invalidate the relevant CDN cache entries, rather than managing separate upload processes to multiple CDN locations.
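To make the pull flow and the purge step concrete, here is a toy, in-memory model of a single CDN edge node. It is a sketch under simplifying assumptions, not any CDN's real API: `fetch_from_origin` stands in for the HTTP request to https://rubansahoo.com, entries expire after a fixed TTL, and `purge` simply drops an entry so the next request misses.

```python
import time
from typing import Callable, Dict, Tuple


class PullThroughCache:
    """Toy model of one CDN edge node in pull mode (hypothetical, not a CDN API)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin  # stands in for GET <origin><path>
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the origin is not contacted
        # Cache miss or expired entry: pull from the origin and cache the response
        body = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self.store[path] = (body, now + self.ttl)
        return body

    def purge(self, path: str) -> None:
        """Invalidate one path so the next request is forced back to the origin."""
        self.store.pop(path, None)


# Usage: the origin is just a dict here
origin = {"/dashboard/profile.jpeg": b"v1"}
edge = PullThroughCache(lambda path: origin[path], ttl_seconds=3600)
edge.get("/dashboard/profile.jpeg")        # miss: pulled from the origin
edge.get("/dashboard/profile.jpeg")        # hit: served from the edge
origin["/dashboard/profile.jpeg"] = b"v2"  # content updated at the origin only
edge.purge("/dashboard/profile.jpeg")      # invalidate; the next get() re-pulls v2
```

Nothing is ever uploaded to the edge: updating the origin and purging the path is the whole deployment step.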

Designing and Implementing Photo Upload at Scale

Let's design a system capable of handling 5 million photo uploads per day, exploring the architecture decisions, tradeoffs, and optimisations needed for a production-grade photo platform.

Requirements

  • Handle 5M photo uploads per day

  • Ensure efficient storage and retrieval

  • Maintain privacy and security

  • Optimise for different devices and network conditions

  • Design for extensibility
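Before choosing components, a quick back-of-the-envelope estimate shows where the pressure is. The average photo size (5 MB, the same figure used in the bandwidth discussion below) and the 3x peak-to-average factor are assumptions, not measured values.

```python
# Back-of-the-envelope sizing for 5M uploads/day.
# ASSUMPTIONS: 5 MB average stored photo, peak traffic 3x the daily average.
UPLOADS_PER_DAY = 5_000_000
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
AVG_PHOTO_MB = 5                 # assumed
PEAK_FACTOR = 3                  # assumed

avg_uploads_per_sec = UPLOADS_PER_DAY / SECONDS_PER_DAY
peak_uploads_per_sec = avg_uploads_per_sec * PEAK_FACTOR
daily_ingest_tb = UPLOADS_PER_DAY * AVG_PHOTO_MB / 1_000_000  # decimal TB
yearly_ingest_pb = daily_ingest_tb * 365 / 1_000              # decimal PB

print(f"average: {avg_uploads_per_sec:.1f} uploads/s")
print(f"peak (x{PEAK_FACTOR}): {peak_uploads_per_sec:.0f} uploads/s")
print(f"ingest: {daily_ingest_tb:.0f} TB/day, ~{yearly_ingest_pb:.1f} PB/year")
```

Roughly 58 uploads per second on average is a modest request rate; the harder problem is moving and storing about 25 TB of image data per day (roughly 9 PB per year), which is why the design below keeps image bytes away from the API servers.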

Key Design Considerations

When building a photo upload system at this scale, we need to think about storage infrastructure, data flow architecture, separation of concerns, privacy mechanisms, and performance optimisations.

1. The Fundamental Architecture

Storage Choice: S3/GCS

For storing images, object storage services like AWS S3 or Google Cloud Storage are the natural choice. They provide:

  • Virtually unlimited scalability

  • Built-in redundancy and durability

  • Cost-effective storage for large binary objects

  • Simple HTTP-based access patterns

The Bandwidth Problem

A common mistake is routing photo uploads through your API servers. Here's why this is problematic:

When a user uploads a 5MB photo through your API server, which then forwards it to S3, you're consuming bandwidth twice:

  1. User → API Server (ingress)

  2. API Server → S3 (egress)

This doubles the bandwidth your infrastructure handles for every upload, can add to your network costs, and creates an unnecessary bottleneck at your API layer. Your application servers should handle business logic, not act as data proxies.

Metadata Storage: Sharded Relational Database

While images live in object storage, metadata (post information, user associations, captions) belongs in a relational database, sharded by user ID for horizontal scalability.

Serving Images Efficiently

Images should be served directly from S3 through a CDN layer, which provides:

  • Geographic distribution and edge caching

  • Reduced latency for users worldwide

  • Built-in DDoS protection and security

  • Bandwidth cost optimisation

2. Uploading Photos with Pre-Signed URLs

The solution to the bandwidth problem is pre-signed URLs, which allow direct client-to-storage uploads.

Here's the flow:

  1. User requests upload: User A calls the Image Service to initiate an upload

  2. Generate unique ID: Image Service generates a random image ID (e.g., 1729_4275)

  3. Create pre-signed URL: Image Service signs a temporary, upload-only URL for the path below. With the AWS SDKs this signing happens locally using the service's credentials, so no extra round trip to S3 is needed:

     s3://insta-photos/<user_id>/<random_photo_id>.jpg

  4. Direct upload: User A uploads the photo directly to S3 using the signed URL

  5. Photo stored: The file is now stored at the designated S3 path

This approach eliminates the API server from the upload path entirely, cutting bandwidth consumption in half.
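The shape of a pre-signed URL is easy to miss, so here is a minimal sketch of the idea: an HMAC over the method, object key and expiry. This is a hypothetical signing scheme for illustration only; real S3 pre-signed URLs use AWS Signature Version 4, which you would normally produce with an SDK call such as boto3's `generate_presigned_url("put_object", ...)`. The signing key, the storage host and the 5-minute expiry are assumptions.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional

# HYPOTHETICAL values for illustration; S3 itself uses AWS SigV4, not this scheme.
SIGNING_KEY = b"image-service-signing-key"
STORAGE_BASE = "https://storage.example.com"
BUCKET = "insta-photos"


def _sign(method: str, key: str, expires: int) -> str:
    message = f"{method}\n{BUCKET}/{key}\n{expires}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def prepare_upload(user_id: str, ttl_seconds: int = 300,
                   now: Optional[float] = None) -> Dict[str, object]:
    """Image Service: pick a random image ID and sign a short-lived PUT URL."""
    image_id = secrets.token_hex(8)    # random, hard-to-guess ID
    key = f"{user_id}/{image_id}.jpg"  # <user_id>/<random_photo_id>.jpg
    expires = int((time.time() if now is None else now) + ttl_seconds)
    sig = _sign("PUT", key, expires)
    url = f"{STORAGE_BASE}/{BUCKET}/{key}?expires={expires}&sig={sig}"
    return {"image_id": image_id, "key": key, "expires": expires,
            "sig": sig, "upload_url": url}


def storage_accepts_put(key: str, expires: int, sig: str,
                        now: Optional[float] = None) -> bool:
    """Storage side: allow the PUT only for this exact key and only before expiry."""
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(sig.encode(), _sign("PUT", key, expires).encode())
```

Because the key already contains the user ID, the signed URL only lets the client write into its own prefix, and editing the path or the expiry invalidates the signature.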

3. Publishing Photos

After uploading, the user creates a post entry:

The user retains the random image_id received from the Image Service and uses it when creating a post through the Posts Service.

Posts Table Schema:

id          → generated from ID service
user_id     → author of the post
image_id    → random image ID from Image Service
caption     → post caption
created_at  → timestamp

Why Store Only the Image ID?

Instead of storing the full CDN URL like https://instacdn.net/A/1729_4275.jpg, we store just the image_id. The complete S3 path can be computed at runtime:

s3://insta-photos/<user_id>/<image_id>.jpg

This approach stores the essential data rather than derived data, providing flexibility to change storage patterns or CDN configurations without database migrations.

Bonus: The Posts Service can validate that the image_id belongs to the requesting user_id, preventing users from posting photos they didn't upload.

This pattern is used by platforms like LinkedIn and many CDN providers.
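A minimal sketch of both ideas, computing URLs from stored IDs and checking ownership at post creation. `object_exists` is a hypothetical callable standing in for an existence check against storage (for example an S3 HEAD request); because the key embeds the uploader's user ID, finding the object under the requesting user's prefix is what proves ownership.

```python
from dataclasses import dataclass
from typing import Callable

CDN_BASE = "https://instacdn.net"  # CDN host used in the examples above
BUCKET = "insta-photos"


def s3_key(user_id: str, image_id: str) -> str:
    """Derive the object key at runtime; only image_id is persisted on the post."""
    return f"{user_id}/{image_id}.jpg"


def cdn_url(user_id: str, image_id: str) -> str:
    return f"{CDN_BASE}/{s3_key(user_id, image_id)}"


@dataclass
class Post:
    id: int
    user_id: str
    image_id: str
    caption: str


def create_post(post_id: int, user_id: str, image_id: str, caption: str,
                object_exists: Callable[[str], bool]) -> Post:
    """Posts Service: reject image IDs that were not uploaded under this user's prefix."""
    if not object_exists(f"s3://{BUCKET}/{s3_key(user_id, image_id)}"):
        raise PermissionError("image_id was not uploaded by this user")
    return Post(post_id, user_id, image_id, caption)
```

If the CDN host or key layout ever changes, only `CDN_BASE` and `s3_key` change; no stored rows need rewriting.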

4. Privacy Considerations

Are Private Photos Really Private?

When using CDNs, the answer is nuanced. Instagram's approach uses time-limited signed URLs:

https://instacdn.net/ul/1729_4275.jpg?ig_cache_key=...&oh=...&oe=...&nc_sid=...

How it works:

  1. When rendering private photos, the backend generates a URL with cryptographic signatures

  2. The signature includes an expiration timestamp

  3. When the browser requests the image via an <img> tag, the CDN validates:

    • Signature authenticity

    • Expiration time

  4. If valid and not expired, the CDN serves the image; otherwise, it returns an error

The limitation: These URLs are short-lived (typically minutes to hours), but during that window, anyone with the URL can access the photo. This is pseudo-privacy, not true privacy.
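The edge-side check can be sketched as follows. The `exp`/`sig` parameter names, the shared secret and the HMAC-SHA256 construction are assumptions for illustration; they are not Instagram's actual `oh`/`oe` scheme, only the same idea of a signature bound to an expiry time.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

CDN_SECRET = b"shared-between-backend-and-cdn"  # hypothetical shared key


def _mac(path: str, expires_at: int) -> str:
    return hmac.new(CDN_SECRET, f"{path}:{expires_at}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, base: str = "https://instacdn.net") -> str:
    """Backend: emit an <img> URL that is valid until expires_at (unix seconds)."""
    return f"{base}{path}?{urlencode({'exp': expires_at, 'sig': _mac(path, expires_at)})}"


def edge_allows(url: str, now: int) -> bool:
    """CDN edge: serve only if the signature is authentic and not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["exp"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    authentic = hmac.compare_digest(sig.encode(), _mac(parts.path, expires_at).encode())
    return authentic and now <= expires_at
```

The check says nothing about who is asking: a copied URL works for anyone until `exp` passes, which is exactly the pseudo-privacy limitation described above.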

Achieving True Privacy

For genuinely private photos, you cannot serve images from a shared, publicly cacheable CDN layer. Instead:

  • All requests must route through your API servers

  • Every request must check user permissions against your database

  • This significantly increases infrastructure costs and complexity

  • Most platforms accept pseudo-privacy as a reasonable tradeoff

5. Overall Data Flow

Let's trace a complete journey:

Upload Phase:

  1. User A initiates upload via Image Service

  2. Image Service returns pre-signed URL and random image_id

  3. User A uploads photo directly to S3

Post Creation:

  1. User A creates post via Posts Service with image_id and caption

  2. Posts Service creates entry in posts table

Consumption Phase:

  1. User B requests User A's profile: GET /users/A/posts

  2. Posts Service returns:

   [
     {
       "id": "...",
       "user_id": "A",
       "caption": "...",
       "url": "https://instacdn.net/A/1234.png"
     }
   ]
  3. Frontend renders images using <img> tags

  4. Browser fetches photos from CDN

  5. CDN serves cached (or freshly fetched) images

6. Image Optimisation at Scale

Users access your platform from vastly different contexts:

  • Geographic locations (different network latencies)

  • Network conditions (5G vs. 3G vs. WiFi)

  • Device capabilities (flagship phones vs. budget devices)

  • Screen sizes (tablets vs. phones)

Sending a 5MB high-resolution photo to every user is wasteful and creates poor user experience for those on slower connections.

Dynamic Image Transformation

Instead of pre-generating multiple resolutions, modern CDNs offer on-the-fly image transformation:

https://instacdn.net/ul/1729_4275.jpg?w=360

How it works:

  1. CDN receives request with transformation parameters (w=360 for 360px width)

  2. If transformed version is cached, serve it immediately

  3. If not cached:

    • Fetch original from S3

    • Apply transformation (resize to 360px width)

    • Cache the transformed version

    • Serve to user

  4. Subsequent requests for the same transformation are served from cache

This approach provides:

  • No upfront processing costs

  • Flexibility in dimensions (in practice often limited to an allowlist of sizes to bound the number of cached variants)

  • Automatic caching of popular sizes

  • Reduced storage overhead (only cache what's actually requested)
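A sketch of the edge logic, assuming the cache key is the path plus the requested width. To stay self-contained it only computes output dimensions (aspect ratio preserved and no upscaling, both assumptions); a real implementation would resize the pixels with a library such as libvips or Pillow. `fetch_original` is a hypothetical callable returning the original's width and height.

```python
from typing import Callable, Dict, Tuple


def target_size(orig_w: int, orig_h: int, requested_w: int) -> Tuple[int, int]:
    """Scale to the requested width, preserving aspect ratio and never upscaling."""
    w = max(1, min(requested_w, orig_w))
    h = max(1, round(orig_h * w / orig_w))
    return w, h


class TransformCache:
    """Edge-side variant cache keyed by (path, width)."""

    def __init__(self, fetch_original: Callable[[str], Tuple[int, int]]) -> None:
        self.fetch_original = fetch_original  # hypothetical: returns (width, height)
        self.variants: Dict[Tuple[str, int], Tuple[int, int]] = {}
        self.transforms = 0

    def get(self, path: str, requested_w: int) -> Tuple[int, int]:
        key = (path, requested_w)  # ?w=360 is part of the cache key
        if key not in self.variants:
            orig_w, orig_h = self.fetch_original(path)  # e.g. read the original from S3
            self.variants[key] = target_size(orig_w, orig_h, requested_w)  # "transform"
            self.transforms += 1
        return self.variants[key]
```

Only the first request for a given (path, width) pays for the transformation; every later request for that pair is a cache hit.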

Why Not Build Your Own?

While you could build an image optimisation service, CDNs provide this functionality out of the box with:

  • Global edge locations

  • Battle-tested transformation engines

  • Automatic format optimisation (WebP, AVIF)

  • Responsive image support

This is a clear case where leveraging existing infrastructure beats building custom solutions.

Conclusion

Building a photo upload system at scale requires careful consideration of bandwidth costs, storage patterns, privacy tradeoffs, and user experience optimisation. The pre-signed URL pattern, metadata-driven URL generation, and CDN-based transformations form the foundation of modern photo platforms serving billions of images daily.

Designing a Concurrent Hashtag Counter at Scale

Building a hashtag service that handles millions of hashtags and provides real-time counts is a fascinating distributed systems problem. Let's design a system that delivers the best user experience while handling massive write volumes and maintaining fast read response times.

Requirements

  • Support millions of hashtags

  • Handle high-volume concurrent updates (posts with hashtags)

  • Provide super-fast response times for read queries

  • Track popular photos per hashtag

  • Maintain accurate counts despite distributed writes

Key Design Challenges

When building a hashtag counter at scale, we face several critical challenges:

  • Storage: Where and how to store hashtag counts and metadata

  • Counting at volume: Processing millions of hashtag updates efficiently

  • Inter-service communication: How Post Service communicates with Hashtag Service

  • Atomicity: Ensuring counts remain accurate under concurrent updates

  • Partial updates: Handling failures gracefully without losing data

  • Read/Write optimisation: Balancing fast reads with high write throughput

Architecture Overview

The system consists of several key components working together:

  1. Event-Driven Architecture with Kafka

    Kafka acts as the glue connecting our services. When the Post Service processes a new photo upload, it doesn't directly call the Hashtag Service. Instead:

    1. Post Service publishes events to a Kafka topic (e.g., post-created)

    2. Hashtag Extraction service consumes these events and extracts hashtags from captions

    3. Extracted hashtags are published to another Kafka topic: post-hashtag

    4. Multiple downstream consumers process these events

Why Kafka?

  1. Decoupling: Post Service and Hashtag Service don't need to know about each other

  2. Durability: Events are persisted, allowing replay if processors fail

  3. Scalability: Multiple consumers can process events in parallel

  4. Back-pressure handling: Services consume at their own pace

  2. The Adapter Pattern for Service Integration

    The Adapter Pattern is crucial here. The Hashtag Extraction service acts as an adapter between:

    • Upstream: Post Service (which knows about posts and photos)

    • Downstream: Hashtag Service (which only cares about hashtag events)

This abstraction means:

  • Post Service doesn't need hashtag-specific logic

  • Hashtag Service can be updated independently

  • Additional consumers can tap into the same event stream
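A minimal sketch of the adapter step, turning one post-created event into post-hashtag events. The event field names are hypothetical and the regex is deliberately simple (letters, digits and underscores after `#`). It also applies the case normalisation listed later under Data Quality and de-duplicates tags within a caption; the deterministic `event_id` is a design choice that lets downstream consumers drop redeliveries.

```python
import re
from typing import Dict, List

HASHTAG_RE = re.compile(r"#(\w+)")  # simplistic: word characters after '#'


def extract_hashtags(caption: str) -> List[str]:
    """Normalise case (#Sunset, #SUNSET -> sunset) and de-duplicate per caption."""
    tags = (match.casefold() for match in HASHTAG_RE.findall(caption))
    return list(dict.fromkeys(tags))  # keeps first-seen order


def adapt_post_created(event: Dict) -> List[Dict]:
    """Adapter: post-created event -> post-hashtag events (hypothetical fields).

    Downstream consumers only see what they need: which post mentioned which tag.
    """
    return [
        {
            "event_id": f"{event['post_id']}:{tag}",  # deterministic, safe to dedupe
            "post_id": event["post_id"],
            "tag": tag,
        }
        for tag in extract_hashtags(event.get("caption", ""))
    ]
```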

  3. Counting Servers: Effective Batching and Counting

    The naive approach of incrementing a database counter for every single hashtag mention would overwhelm the database. Instead, we use Counting Servers that batch updates:

    How it works:

    1. Consume events: Counting Servers read from the post-hashtag Kafka topic

    2. In-memory aggregation: Maintain in-memory counters for hashtags over a time window (e.g., 10 seconds)

    3. Batch flush: Periodically flush aggregated counts to the database

Example:

    Incoming events (over 10 seconds):
    - #sunset: 500 mentions
    - #travel: 300 mentions
    - #photography: 450 mentions

    Batch update to DB:
    UPDATE hashtags SET count = count + 500 WHERE tag = 'sunset';
    UPDATE hashtags SET count = count + 300 WHERE tag = 'travel';
    UPDATE hashtags SET count = count + 450 WHERE tag = 'photography';

This reduces database writes from potentially millions per second to thousands per second, dramatically improving throughput.
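The batching idea fits in a few lines. In this sketch `consume` would be fed from the post-hashtag topic and `flush` called on a timer (say, every 10 seconds); the dict-based `apply_deltas` stands in for the `UPDATE ... SET count = count + ...` statements above. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class CountingServer:
    """Aggregates hashtag mentions in memory and emits one delta per tag per window."""

    def __init__(self) -> None:
        self.pending: Counter = Counter()

    def consume(self, tags: Iterable[str]) -> None:
        # Called with tags read from the post-hashtag Kafka topic
        self.pending.update(tags)

    def flush(self) -> List[Tuple[str, int]]:
        """Return this window's (tag, delta) pairs and start a fresh window."""
        deltas = sorted(self.pending.items())
        self.pending = Counter()
        return deltas


def apply_deltas(db: Dict[str, int], deltas: List[Tuple[str, int]]) -> None:
    # Stand-in for one UPDATE (or upsert) per tag: count = count + delta
    for tag, delta in deltas:
        db[tag] = db.get(tag, 0) + delta
```

One caveat this sketch glosses over: the in-memory window is lost if a Counting Server crashes, so Kafka offsets should only be committed after a flush has been written to the database. That gives at-least-once delivery, which is why the idempotency measures described under Fault Tolerance matter.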

Note on throughput: the Counting Servers themselves don't need extreme throughput, because Kafka absorbs traffic spikes and buffers events upstream, letting each server consume at its own pace. Batching then keeps the resulting database write load manageable.

  4. Popularity Service: Tracking Top Photos

    The Popularity Service is responsible for identifying the most engaging photos for each hashtag. It:

    1. Consumes events from its own Kafka topic (separate from hashtag counting)

    2. Uses its own logic to determine "top photos" (likes, comments, engagement, recency)

    3. Publishes results indicating which photos are trending for specific hashtags

    4. These results are consumed by Workers, which update the hashtag metadata

This separation of concerns means:

  • Counting and ranking are independent operations

  • Different teams can optimise each service separately

  • The algorithm for "popularity" can evolve without affecting counting
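As an illustration of "its own logic", here is a sketch that ranks candidate photos for one hashtag. The scoring formula (comments weighted double, engagement halved every 24 hours) and the default of 9 photos are purely hypothetical; `heapq.nlargest` keeps the selection O(n log k) instead of sorting every candidate.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PhotoStats:
    photo_id: str
    likes: int
    comments: int
    age_hours: float


def score(p: PhotoStats, half_life_hours: float = 24.0) -> float:
    """HYPOTHETICAL score: weighted engagement with exponential time decay."""
    engagement = p.likes + 2 * p.comments
    return engagement * math.pow(0.5, p.age_hours / half_life_hours)


def top_photos(photos: Iterable[PhotoStats], k: int = 9) -> List[str]:
    """Top-k photo IDs by score, highest first."""
    return [p.photo_id for p in heapq.nlargest(k, photos, key=score)]
```

Swapping the formula only touches `score`, which is the independence from counting that the separation above is meant to buy.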

  5. Workers: Maintaining Hashtag Metadata

    Workers are responsible for updating the broader hashtag metadata beyond just counts:

    • Top photos for each hashtag (from Popularity Service)

    • Related hashtags (computed periodically)

    • Trending status flags

    • Metadata updates (description, category)

Workers consume from multiple sources and update the partitioned database that backs the Hashtag API.

  6. Storage Strategy: Partitioned Databases

    The system uses partitioned databases for different data access patterns:

    Primary Database (for counts):

    • Stores hashtag counts and core metadata

    • Partitioned by hashtag (e.g., hash-based sharding)

    • Optimised for writes (from Counting Servers)

Secondary Database (for reads):

  • Denormalised data optimised for API responses

  • May include cached top photos, related tags

  • Partitioned for query patterns (e.g., by popularity tier)

Cache Layer:

  • Redis/Memcached sits in front of databases

  • Caches popular hashtag data (top 1000 hashtags, trending tags)

  • TTL-based expiration with background refresh

  • Dramatically reduces database load for read queries
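Hash-based sharding needs a hash that is stable across processes and machines. A minimal sketch with a hypothetical shard count of 16: Python's built-in `hash()` is randomised per process for strings, so it would route the same tag to different shards on different servers, whereas a digest such as SHA-256 gives the same answer everywhere.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(tag: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard index for a hashtag, identical on every process and machine."""
    digest = hashlib.sha256(tag.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding remaps most tags when the shard count changes; consistent hashing is the usual remedy if you expect to reshard. On the Kafka side, publishing post-hashtag events with the tag as the message key has a similar effect: the default partitioner sends every event for one tag to the same partition, so each Counting Server owns a stable subset of hashtags.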

Read and Write Path Optimisations

Write Path (Ingestion):

  1. Kafka buffering: Absorbs traffic spikes, provides durability

  2. Batch processing: Counting Servers aggregate before DB writes

  3. Asynchronous updates: No synchronous blocking operations

  4. Partitioned writes: Each Counting Server handles a subset of hashtags

Read Path (Queries):

  1. CDN for API responses: Cache GET /hashtag/<tag> responses at edge

  2. Application cache: Redis stores hot hashtag data

  3. Database replicas: Read queries hit read replicas, not primary

  4. Denormalised data: Avoid joins; data is pre-aggregated

Result: Hashtag counts and top photos can load in under 100 ms for most users, because most reads never reach the primary database.

Handling the GET /hashtag/<tag> Request

When a user queries GET /hashtag/sunset:

  1. CDN check: If response is cached at edge, return immediately (~10-50ms)

  2. Load Balancer: Routes to Hashtag API server

  3. Cache check: API checks Redis for hashtag data

  4. Database query (on cache miss): Fetch from partitioned DB

  5. Response construction: Return count, top photos, related tags

  6. Cache population: Store result in Redis and CDN
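Steps 3, 4 and 6 are the classic cache-aside pattern. A sketch with a dict standing in for Redis (where you would use `SET` with an expiry, or `SETEX`) and a hypothetical `load_from_db` callable for the partitioned database; the 30-second TTL is an assumption that bounds how stale a count can be.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HashtagReader:
    """Cache-aside read path: check the cache, fall back to the DB, then populate."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db  # hypothetical partitioned-DB query
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: Dict[str, Tuple[dict, float]] = {}  # stand-in for Redis

    def get(self, tag: str) -> Optional[dict]:
        now = self.clock()
        hit = self.cache.get(tag)
        if hit is not None and hit[1] > now:
            return hit[0]  # cache hit
        row = self.load_from_db(tag)  # cache miss: query the database
        if row is not None:
            self.cache[tag] = (row, now + self.ttl)  # like SETEX in Redis
        return row
```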

System Properties

Consistency Model

  • Eventual consistency: Counts may lag by seconds (acceptable for social media)

  • Read-your-writes where needed: a user's own posts are reflected immediately in their own view

Fault Tolerance

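  • Kafka replication: Events survive individual broker failures (given an appropriate replication factor and producer acknowledgement settings)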

  • Consumer groups: Automatic partition rebalancing if a Counting Server fails

  • Idempotent processing: Duplicate events don't double-count (use event IDs)
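A minimal sketch of idempotent counting with event IDs (for example `post_id:tag` IDs that a hashtag extractor could emit). The in-memory `seen` set is a simplification: in production the record of processed IDs must be bounded and must be updated in the same transaction as the counts, or a crash between the two reintroduces double counting.

```python
from collections import Counter
from typing import Dict, Iterable, Set


class IdempotentCounter:
    """Applies each event at most once, even if Kafka redelivers it."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()  # simplification: unbounded and in-memory
        self.counts: Counter = Counter()

    def apply(self, events: Iterable[Dict[str, str]]) -> None:
        for event in events:
            if event["event_id"] in self.seen:
                continue  # duplicate delivery: skip it
            self.seen.add(event["event_id"])
            self.counts[event["tag"]] += 1
```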

Scalability

  • Horizontal scaling: Add more Counting Servers or Workers as needed

  • Kafka partitions: Increase parallelism by adding partitions

  • Database sharding: Distribute hashtags across shards

Key Takeaways

  1. Kafka as a Glue

    Kafka decouples services, provides durability, and enables multiple consumers to process the same event stream for different purposes (counting, popularity tracking, analytics).

  2. Adapter Pattern

    The Hashtag Extraction service adapts Post Service events into a format suitable for downstream consumers, enabling loose coupling and independent evolution.

  3. Effective Batching and Counting

    Counting Servers aggregate updates in memory before writing to the database, reducing write load from millions to thousands of operations per second.

  4. Read and Write Path Optimisations

    • Write path: Kafka buffering, batch processing, asynchronous updates

    • Read path: Multi-layer caching (CDN, Redis, DB replicas), denormalised data

This architecture handles the demanding requirements of a hashtag service at social media scale while maintaining sub-100ms response times for users worldwide.

Real-World Considerations

Monitoring and Observability

  • Track Kafka consumer lag (are Counting Servers keeping up?)

  • Monitor cache hit rates (is Redis serving most reads?)

  • Alert on count anomalies (sudden spikes may indicate spam)

Data Quality

  • Implement spam detection for hashtag abuse

  • Normalise hashtags (#Sunset, #sunset, #SUNSET → #sunset)

  • Filter inappropriate hashtags before indexing

Cost Optimisation

  • Archive old hashtag-post associations to cheaper storage

  • Use tiered caching (hot tags in memory, warm tags in Redis, cold tags in DB)

  • Compress Kafka messages for network efficiency

This design demonstrates how careful architectural choices around messaging, batching, caching, and data partitioning enable building services that handle billions of events while delivering exceptional user experience.

Designing Gravatar and Dynamic OG Images

Let's explore how to build systems that serve images dynamically, from simple profile pictures to sophisticated social media preview cards. We'll design a Gravatar-like service and understand how platforms like GitHub generate Open Graph images on demand.

Understanding Image Serving Fundamentals

Before diving into Gravatar, let's understand how web servers serve images.

Traditional Static File Serving

Consider a typical website structure:

site/
├── static/
│   ├── img/
│   │   ├── ruban.jpg
│   │   └── logo.jpg
│   ├── js/
│   └── css/

When a user requests https://rubansahoo.com/static/img/ruban.jpg:

  1. Server receives the request

  2. Parses the URL path /static/img/ruban.jpg

  3. Reads the file from disk at that path

  4. Sends the file bytes in the HTTP response

  5. Browser renders the bytes as an image

This works well for small sites, but has limitations at scale.

Proxying S3 for Image Storage

Instead of serving from disk, we can serve from S3 while maintaining the same URL structure:

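from flask import Response  # `app`, `s3` (a boto3 client) and BUCKET defined elsewhere

@app.route('/raw/<path:key>')  # `path:` converter lets the key contain slashes
def raw_handler(key):
    obj = s3.get_object(Bucket=BUCKET, Key=key)  # fetch the object from S3
    return Response(obj['Body'].read(),
                    mimetype=obj.get('ContentType', 'application/octet-stream'))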

Now when a user requests http://localhost:5000/raw/users/ruban.jpg:

  1. Server receives the request with path /raw/users/ruban.jpg

  2. Extracts the S3 key: users/ruban.jpg

  3. Fetches the file from S3 bucket

  4. Returns the bytes directly to the client

We've built an S3 proxy! This pattern provides:

  • Effectively unlimited storage (S3 scales without capacity planning)

  • No disk space concerns on application servers

  • Centralised image management

  • Easy backup and replication

Real-World Example: GitHub's Open Graph Images

GitHub demonstrates the power of dynamic image generation brilliantly. When you share a repository link on social media, GitHub generates a custom Open Graph (OG) image on-the-fly that includes:

  • Repository name

  • Repository description

  • Star count

  • Fork count

  • Language badges

  • Owner avatar

These aren't pre-generated static images; they're created on demand from the current repository metadata. This keeps social previews up to date without storing millions of images.
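A toy version of on-demand card generation renders an SVG from whatever metadata is current at request time. Everything here is hypothetical: the field names, the 1200×630 canvas (a commonly used Open Graph size) and the layout. It does not describe GitHub's actual pipeline, and a production service would typically rasterise the result to PNG, since social platforms generally expect raster images for `og:image`.

```python
from xml.sax.saxutils import escape


def render_og_card(repo: dict) -> str:
    """Render a preview card from current repository metadata (hypothetical fields)."""
    name = escape(repo["full_name"])
    desc = escape(repo.get("description") or "")
    language = escape(repo.get("language") or "")
    stats = f"★ {repo['stars']:,}   Forks {repo['forks']:,}   {language}"
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630">'
        '<rect width="100%" height="100%" fill="#ffffff"/>'
        f'<text x="80" y="200" font-size="64" font-weight="bold">{name}</text>'
        f'<text x="80" y="300" font-size="32" fill="#555555">{desc}</text>'
        f'<text x="80" y="560" font-size="28">{stats}</text>'
        "</svg>"
    )
```

Because the card is rebuilt from live data on each cache miss, the star count is only as stale as the CDN's cache TTL.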

Designing Gravatar: A Profile Picture Service

Gravatar provides a universal profile picture system: one URL that works across the entire web.

What is Gravatar?

Gravatar gives you a single embeddable URL for your profile picture:

<img src="https://gravatar.com/0eafd172">

The URL uses a hash of your email instead of the email itself:

hash("ruban1work@gmail.com") = 0eafd172

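Why hash the email? Privacy: the hash keeps Personally Identifiable Information (PII) out of the URL while still providing a unique, stable identifier. It is not anonymisation, though: anyone who already knows or can guess an address can hash it and check for a match.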
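A sketch of the hashing step. Gravatar's own documentation describes trimming and lower-casing the address before hashing (historically with MD5; SHA-256 is assumed here); the `0eafd172` above is a shortened illustration, not the output of this function. The second helper shows why hashing is not anonymisation: given a list of candidate addresses, anyone can find the match.

```python
import hashlib
from typing import Iterable, Optional


def gravatar_hash(email: str) -> str:
    """Trim and lower-case the address, then hash it (SHA-256 assumed)."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_owner(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hashing is not anonymisation: a list of likely addresses reveals the owner."""
    return next((c for c in candidates if gravatar_hash(c) == target_hash), None)
```

Normalising first matters: without it, " Ruban1Work@Gmail.com " and "ruban1work@gmail.com" would map to different URLs for the same person.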

Requirements

  • Users can upload multiple photos

  • Users can mark one photo as active

  • The active photo should be returned when requesting the user's Gravatar URL

  • Fast, globally distributed delivery

Database Schema

Users Table:

id          | email                  | hash      | active_photo_id
------------|------------------------|-----------|----------------
729         | ruban1work@gmail.com   | 0eafd172  | 7abe

Photos Table:

id          | user_id
------------|--------
7abe        | 729
8abe        | 729
cdae        | 729
e7215       | 729

Key design decisions:

  • hash is indexed for fast lookups

  • active_photo_id is a foreign key to the photos table

  • Users can have multiple photos but only one is active
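The schema and the two queries the service needs can be sketched with SQLite. Column types, the unique index on `hash` and the ownership check in `set_active` are assumptions layered on the tables above; the check stops a user from activating a photo ID that belongs to someone else.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        email           TEXT NOT NULL UNIQUE,
        hash            TEXT NOT NULL,
        active_photo_id TEXT REFERENCES photos(id)  -- NULL until a photo is chosen
    );
    CREATE UNIQUE INDEX idx_users_hash ON users (hash);  -- fast lookup by hash
    CREATE TABLE photos (
        id      TEXT PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    CREATE INDEX idx_photos_user ON photos (user_id);
    """
)

# Sample data from the tables above
conn.execute("INSERT INTO users (id, email, hash) VALUES (729, 'ruban1work@gmail.com', '0eafd172')")
conn.executemany("INSERT INTO photos (id, user_id) VALUES (?, 729)",
                 [("7abe",), ("8abe",), ("cdae",), ("e7215",)])
conn.commit()


def set_active(email_hash: str, photo_id: str) -> None:
    """Mark a photo active, but only if it belongs to the same user."""
    cur = conn.execute(
        """
        UPDATE users SET active_photo_id = ?
        WHERE hash = ?
          AND EXISTS (SELECT 1 FROM photos WHERE photos.id = ? AND photos.user_id = users.id)
        """,
        (photo_id, email_hash, photo_id),
    )
    if cur.rowcount != 1:
        conn.rollback()
        raise ValueError("unknown hash, or photo not owned by this user")
    conn.commit()


def active_s3_key(email_hash: str) -> Optional[str]:
    """What the serving endpoint needs: the {user_id}/{photo_id} key, or None."""
    row = conn.execute(
        "SELECT id, active_photo_id FROM users WHERE hash = ?", (email_hash,)
    ).fetchone()
    if row is None or row[1] is None:
        return None
    return f"{row[0]}/{row[1]}"
```

After `set_active("0eafd172", "8abe")`, `active_s3_key("0eafd172")` yields `729/8abe`, the same key layout the upload flow writes to.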

Architecture Flow

Let's trace through the complete lifecycle of a Gravatar photo.

  1. Photo Upload Flow

    Step 1: Prepare for Upload

    User requests permission to upload a new photo:

     POST /upload/prepare
    

    Photo Upload Service:

    1. Generates a random photo ID (e.g., 8abe)

    2. Creates a pre-signed S3 URL for the path:

       s3://gravatar-images/{user_id}/{random_photo_id}

       For example: s3://gravatar-images/729/8abe

    3. Returns the signed URL to the user

Step 2: Direct Upload to S3

User uploads the photo directly to S3 using the pre-signed URL (no bandwidth through API servers).

Step 3: Register Photo in Gravatar

User makes a POST request to register the uploaded photo:

    POST /photos
    {
      "photo_id": "8abe",
      "user_id": 729
    }

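This creates an entry in the photos table. In practice the server should take user_id from the authenticated session rather than trusting the request body, and can confirm that the object exists in S3 before inserting the row.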

  2. Marking a Photo as Active

    To change which photo appears as the Gravatar:

     UPDATE users
     SET active_photo_id = '8abe'
     WHERE hash = '0eafd172';
    

    This single update changes which photo will be served globally.

  3. Serving the Active Photo

    When someone embeds your Gravatar in their website:

     <img src="https://gravatar.com/0eafd172">
    

    But wait! We want gravatar.com to serve images, yet our API is at api.gravatar.com. How do we bridge this?

Introducing CDN as the Public Interface

CDN Configuration:

gravatar.com → ORIGIN: api.gravatar.com/photos

Now the flow becomes:

Request: https://gravatar.com/0eafd172

CDN forwards to: https://api.gravatar.com/photos/0eafd172

API Server Logic:

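from flask import Response, abort  # `app`, `db` and `s3` (a boto3 client) defined elsewhere

@app.route('/photos/<email_hash>')
def get_gravatar(email_hash):
    # Look up the user and their active photo via the indexed hash column
    # (`db.query` stands in for your data-access helper; assumed to return one row or None)
    result = db.query(
        """
        SELECT id, active_photo_id
        FROM users
        WHERE hash = ?
        """,
        (email_hash,),
    )
    if result is None or result['active_photo_id'] is None:
        abort(404)  # unknown hash, or no active photo yet

    user_id = result['id']
    photo_id = result['active_photo_id']

    # Construct the S3 key: {user_id}/{photo_id}
    s3_key = f"{user_id}/{photo_id}"

    # Fetch from S3 and return the bytes with the stored content type
    obj = s3.get_object(Bucket="gravatar-images", Key=s3_key)
    return Response(obj['Body'].read(),
                    mimetype=obj.get('ContentType', 'image/jpeg'))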

Response flows back through CDN to user, and the CDN caches it for future requests.

Cache Invalidation: Keeping Photos Fresh

When a user marks a new photo as active, we need to invalidate the CDN cache so that viewers see the updated photo promptly, rather than waiting for the cached copy to expire.

Asynchronous Invalidation Flow:

  1. User marks photo cdae as active

  2. API server updates the database

  3. API server publishes event to message broker (Kafka/RabbitMQ):

     {  
         "event": "photo_updated",
         "user_hash": "0eafd172",
         "new_photo_id": "cdae"
     }
    
  4. Worker consumes the event

  5. Worker calls CDN API to invalidate:

     PURGE https://gravatar.com/0eafd172
    
  6. Next request fetches fresh data from origin

This approach:

  • Doesn't block the user's request

  • Ensures eventual consistency

  • Handles CDN invalidation failures gracefully (can retry)
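A sketch of the worker. `purge` is a hypothetical callable wrapping your CDN provider's purge API, and `KNOWN_WIDTHS` is an assumed list of the sizes your pages request. The variant list matters once `?w=` transformations (covered in the next section) are in play: each width is a separate cache entry, so purging only https://gravatar.com/0eafd172 would leave resized copies of the old photo cached. Many CDNs also support purging by tag (Fastly's surrogate keys, for example), which avoids enumerating variants.

```python
import time
from typing import Callable, List

# HYPOTHETICAL: the widths your pages request. Each ?w= variant is a separate
# cache entry, so purging only the bare URL would leave resized copies behind.
KNOWN_WIDTHS = (32, 80, 256, 512)


def urls_to_purge(user_hash: str, base: str = "https://gravatar.com") -> List[str]:
    url = f"{base}/{user_hash}"
    return [url] + [f"{url}?w={w}" for w in KNOWN_WIDTHS]


def handle_photo_updated(event: dict, purge: Callable[[str], None],
                         max_attempts: int = 5, base_delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> None:
    """Worker: purge every cached variant, retrying with exponential backoff."""
    for url in urls_to_purge(event["user_hash"]):
        for attempt in range(max_attempts):
            try:
                purge(url)  # e.g. a call to the CDN provider's purge API
                break
            except OSError:  # network-style failures (ConnectionError, timeouts, ...)
                if attempt == max_attempts - 1:
                    raise  # give up: let the broker redeliver or dead-letter the event
                sleep(base_delay * 2 ** attempt)
```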

On-Demand Image Optimisation

Real-world Gravatar usage varies wildly:

  • Email clients might need 32×32px thumbnails

  • Profile pages might display 256×256px

  • High-DPI displays might request 512×512px

Storing pre-generated versions for every possible size is impractical. Instead, we use URL-driven transformations.

URL-Driven Transformations

<img src="https://gravatar.com/0eafd172?w=32">

The ?w=32 query parameter tells the CDN: "I want this image at 32px width."

How CDNs Handle Image Transformations

Modern CDNs (like Cloudflare, Fastly, Cloudinary) provide this feature out-of-the-box:

  1. Request received: https://gravatar.com/0eafd172?w=240

  2. Cache check: Is the 240px version cached?

    • If yes: Return immediately

    • If no: Continue to step 3

  3. Fetch original: Request https://api.gravatar.com/photos/0eafd172 from origin

  4. Transform: Resize image to 240px width (maintaining aspect ratio)

  5. Cache transformed version: Store the 240px variant

  6. Return response: Send transformed image to user

Subsequent requests for the same transformation are served directly from cache.

Image Transformation Characteristics

CPU Intensive:

  • Resizing a 1024×1024 image to 32×32 requires significant processing

  • Cannot be done asynchronously (user is waiting for the response)

  • Requires powerful servers with sufficient CPU

Scaling Considerations:

  • Need large server instances to handle transformation workload

  • Need many servers to handle concurrent transformation requests

  • First request for a new size/image combination is slower (cache miss)

Popular Image Processing Libraries:

  • ImageMagick: Industry-standard command-line tool

  • libvips: Faster, more memory-efficient alternative

  • Sharp (Node.js): High-performance wrapper around libvips

  • Pillow (Python): Popular library for image manipulation

Why Use CDN Image Optimisation?

Building your own image transformation service is possible but complex:

  • Need to handle multiple image formats (JPEG, PNG, WebP, AVIF)

  • Need to implement caching strategies

  • Need to manage geographic distribution

  • Need to handle security (preventing resource exhaustion attacks)

CDNs provide this functionality out-of-the-box with:

  • Global edge locations

  • Battle-tested transformation engines

  • Automatic format optimisation

  • Security and rate limiting
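If you do build your own transformation service, one common guard against the resource-exhaustion risk listed above is to accept only a fixed set of widths and snap everything else onto it. A minimal sketch; the allowlist and the default of 64 px are hypothetical choices. Bounding the variants also bounds CPU work per image and keeps the number of cached copies you might need to purge small.

```python
import bisect
from typing import Optional

# HYPOTHETICAL allowlist; any other request is rounded up to the next allowed size.
ALLOWED_WIDTHS = (32, 64, 128, 256, 512, 1024)


def normalise_width(raw: Optional[str], default: int = 64) -> int:
    """Map an untrusted ?w= value onto a small, fixed set of variants."""
    try:
        w = int(raw) if raw is not None else default
    except ValueError:
        w = default  # non-numeric input falls back to the default size
    w = max(1, w)
    i = bisect.bisect_left(ALLOWED_WIDTHS, w)  # first allowed width >= w
    return ALLOWED_WIDTHS[min(i, len(ALLOWED_WIDTHS) - 1)]  # cap at the largest
```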

The Complete Gravatar System

Putting it all together:

Upload Path:

User → Photo Upload Service (pre-signed URL) → S3
     → API (register photo) → Database

Serving Path:

Browser → CDN (gravatar.com)
       → API (api.gravatar.com/photos) → Database (get active photo)
                                       → S3 (fetch image)
       ← CDN (cache and transform) ← API
       ← Browser

Update Path:

User → API (mark photo active) → Database
     → Message Broker → Worker → CDN (invalidate cache)

Key Architectural Benefits

  1. Speed: CDN edge caching typically delivers sub-100ms response times globally

  2. Scale: S3 and CDN handle storage and delivery at any scale

  3. Flexibility: URL-driven transformations support any image size

  4. Efficiency: Pre-signed URLs eliminate bandwidth through API servers

  5. Security: Email hashing keeps raw email addresses out of public URLs

  6. Simplicity: Users manage photos through a simple API

This architecture demonstrates how combining object storage, CDN capabilities, and smart caching strategies creates a robust, globally scalable image service that powers millions of websites.

Designing Photo Tagging: The Product Manager Hat

Building a photo tagging feature requires more than just writing code. A senior engineer approaches system design by wearing three distinct hats throughout the process.

The Three Hats of a Senior Engineer

  1. Product Manager: Understanding user needs, defining requirements, asking the right questions

  2. Tech Architect: Designing scalable systems, choosing technologies, planning for extensibility

  3. Software Engineer: Implementing robust, maintainable code

For the photo tagging feature, let's focus on the first and most critical step: wearing the Product Manager hat.

Wearing the Product Manager Hat: Asking the Right Questions

Before writing any code or designing schemas, a senior engineer asks critical questions to stakeholders and senior management. The answers to these questions fundamentally shape the technical design.

  1. Authorisation: Who Can Tag?

    Key questions:

    • Can anyone tag anyone in any photo?

    • Can only the photo owner create tags?

    • Do tagged users need to approve tags before they appear publicly?

    • Can users prevent themselves from being tagged by certain people?

Why it matters: This defines your authorisation model, privacy settings architecture, and approval workflow complexity.

  2. Limits: Maximum Tags Per Photo

    Key questions:

    • What's the maximum number of people that can be tagged in a single photo?

    • Are there rate limits for tagging operations to prevent spam?

Why it matters: Impacts database design, UI rendering performance, and spam prevention strategies. A typical answer is a fixed cap, for example 20 tags per photo.

  3. Notifications and Throttling

    Key questions:

    • Should users be notified immediately when tagged?

    • Should we batch notifications (tagged in 5 photos → 1 notification)?

    • How do we handle notification spam for popular users (celebrities tagged in 1000 photos)?

Why it matters: Determines notification service integration complexity, throttling mechanisms, and user experience design. Poor notification strategy leads to user fatigue and feature abandonment.

  4. Self-Removal: Can Users Untag Themselves?

    Key questions:

    • Can tagged users remove themselves from photos?

    • Should the photo owner be notified when someone removes their tag?

    • Can users block being tagged by specific people?

Why it matters: Critical for user privacy controls and social dynamics. Most platforms allow self-removal without notifying the photo owner to avoid social friction.

  5. Face Recognition and Tag Suggestions

    Key questions:

    • Should we use ML to suggest tags based on face recognition?

    • What's the expected latency/SLA from the ML team (1 minute? 1 hour?)?

    • How do we handle false positives and user trust?

Why it matters: Determines ML service integration complexity, performance expectations, and privacy concerns. Typical answer: Asynchronous processing with 1-5 minute SLA, suggestions shown to photo owner for manual confirmation.

  6. Profile and Activity Integration

    Key questions:

    • Should tagged photos appear on user profiles?

    • Should there be a "Photos of You" section?

    • How frequently will users query "all photos I'm tagged in"?

Why it matters: Query patterns dictate database indexing strategy. If "Photos of You" is a primary use case, you need efficient indexes on user_id, not just post_id.

  7. Feed Integration

    Key questions:

    • Should tagging trigger feed updates?

    • Should tagged users' followers see the photo in their feed?

    • How does this interact with privacy settings (private photos, blocked users)?

Why it matters: Impacts feed generation algorithm complexity, privacy model enforcement, and determines whether you need event-driven architecture for extensibility.
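As one possible answer to the batching question ("tagged in 5 photos → 1 notification"), here is a minimal sketch in which tag events for the same user that arrive within a fixed window collapse into a single notification. The `TagEvent` type, the 60-second window, and the notification wording are hypothetical choices for illustration, not stakeholder requirements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEvent:
    tagged_user: str  # user who was tagged
    photo_id: str     # photo they were tagged in
    ts: float         # event time in seconds


def batch_notifications(events: list[TagEvent], window_s: float = 60.0) -> list[str]:
    """Collapse tag events per user into one notification per time window.

    Hypothetical policy: the first event for a user opens a window of
    `window_s` seconds; later events inside that window join the same batch.
    """
    by_user: dict[str, list[TagEvent]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.ts):
        by_user[ev.tagged_user].append(ev)

    notifications = []
    for user, evs in by_user.items():
        batch = [evs[0]]
        for ev in evs[1:]:
            if ev.ts - batch[0].ts <= window_s:
                batch.append(ev)  # still inside the current window
            else:
                notifications.append(_render(user, batch))
                batch = [ev]      # start a new window
        notifications.append(_render(user, batch))
    return notifications


def _render(user: str, batch: list[TagEvent]) -> str:
    photos = {ev.photo_id for ev in batch}
    if len(photos) == 1:
        return f"{user}: you were tagged in a photo"
    return f"{user}: you were tagged in {len(photos)} photos"
```

A fixed window also caps how many notifications a heavily tagged user (the celebrity case) can receive per minute, regardless of how many tags arrive.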

Why These Questions Matter

Each answer shapes critical technical decisions:

  • Authorisation questions → RBAC service integration, approval workflow design

  • Limits questions → Database constraints, validation logic, spam prevention

  • Notifications questions → Message broker (Kafka) for batching, throttling algorithms

  • Self-removal questions → Database schema (status column: pending/approved/removed)

  • Face recognition questions → Async ML service integration, relative positioning storage

  • Profile questions → Multiple database indexes, query optimisation strategy

  • Feed questions → Event-driven architecture with Kafka, multiple service consumers
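To show how several of these answers could surface in the data model, here is a minimal in-memory sketch. It assumes the typical limit of 20 tags per photo, a pending/approved/removed status column, tag positions stored as fractions of image width and height (one reading of "relative positioning"), and a user_id index for "Photos of You". `TagStore`, `Tag`, and `to_pixels` are hypothetical names standing in for database tables and queries.

```python
from dataclasses import dataclass
from enum import Enum

MAX_TAGS_PER_PHOTO = 20  # assumed answer to the "Limits" question


class TagStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REMOVED = "removed"


@dataclass
class Tag:
    photo_id: str
    user_id: str
    x: float  # relative position: fraction of image width (0..1)
    y: float  # relative position: fraction of image height (0..1)
    status: TagStatus = TagStatus.PENDING


class TagStore:
    """In-memory stand-in for a post_tags table indexed by photo and by user."""

    def __init__(self) -> None:
        self.by_photo: dict[str, dict[str, Tag]] = {}  # photo_id -> user_id -> Tag
        self.by_user: dict[str, set[str]] = {}         # user_id -> approved photo_ids

    def add_tag(self, photo_id: str, user_id: str, x: float, y: float) -> Tag:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("tag position must be relative (0..1)")
        tags = self.by_photo.setdefault(photo_id, {})
        existing = tags.get(user_id)
        if existing is not None and existing.status is not TagStatus.REMOVED:
            return existing  # already tagged in this photo
        active = sum(1 for t in tags.values() if t.status is not TagStatus.REMOVED)
        if active >= MAX_TAGS_PER_PHOTO:
            raise ValueError("tag limit reached for this photo")
        tag = Tag(photo_id, user_id, x, y)  # starts PENDING until approved
        tags[user_id] = tag
        return tag

    def approve(self, photo_id: str, user_id: str) -> None:
        tag = self.by_photo[photo_id][user_id]
        tag.status = TagStatus.APPROVED
        self.by_user.setdefault(user_id, set()).add(photo_id)

    def remove(self, photo_id: str, user_id: str) -> None:
        """Self-removal: keep the row, mark it REMOVED, drop it from the user index."""
        self.by_photo[photo_id][user_id].status = TagStatus.REMOVED
        self.by_user.get(user_id, set()).discard(photo_id)

    def photos_of(self, user_id: str) -> list[str]:
        """'Photos of You' query served from the user_id index."""
        return sorted(self.by_user.get(user_id, set()))


def to_pixels(tag: Tag, width: int, height: int) -> tuple[int, int]:
    """Relative coordinates survive resizing: scale to whatever size is rendered."""
    return round(tag.x * width), round(tag.y * height)
```

Keeping removed tags as rows (rather than deleting them) preserves history and lets the system avoid silently re-showing a tag the user removed.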

The Senior Engineer's Mindset

A junior engineer might start with "Let's create a post_tags table with post_id and user_id."

A senior engineer starts with "What problem are we actually solving for users, and what are the business constraints?" The technical design naturally follows from understanding requirements deeply.

By asking these questions upfront, you avoid costly redesigns later when stakeholders say "Oh, we also need approval workflows" or "Users should be able to remove tags themselves."

Next steps: With requirements clarified, you'd move to wearing the Tech Architect hat (designing the system with Kafka, relative positioning, service architecture) and finally the Software Engineer hat (implementation details). But it all starts with asking the right questions.

Designing a Newly Unread Message Indicator

When building a messaging platform, one of the most critical UX features is the "unread message indicator": the small badge telling users they have new messages. There is a subtle distinction, though, that significantly affects system design: newly unread vs. total unread messages.

Understanding the Requirement

The Problem: We need to inform users about the presence of new messages they haven't seen yet, not just messages they haven't acknowledged.

Key Insight: A user might have 100 unread messages, but they're only from 3 different people who recently sent messages. What matters to the user is: "How many people have sent me messages I haven't seen yet?"

This is fundamentally different from counting total unread messages.

Example Scenario

User B has:
- 45 unread messages from User A
- 30 unread messages from User C  
- 25 unread messages from User D

Newly Unread Count = 3 (not 100!)

The badge should show "3", indicating 3 different people have sent messages, not 100 individual messages.

Requirements

  • Near real-time updates: Badge must update within seconds of receiving a new message

  • Accurate count: Track unique senders, not total messages

  • High availability: This is a critical user-facing feature

  • Scalable: Must handle millions of concurrent users

The Core Formula

newly unread count = number of unique users from whom messages were received and are still unread
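The formula can be checked against the example scenario with a short sketch: build User B's inbox (45 unread from A, 30 from C, 25 from D, plus some already-read messages from E) and compare the total unread count with the number of distinct unread senders. Representing the inbox as `(sender, is_read)` tuples is an assumption made purely for illustration.

```python
def newly_unread_count(inbox: list[tuple[str, bool]]) -> int:
    """Number of distinct senders with at least one unread message."""
    return len({sender for sender, is_read in inbox if not is_read})


def total_unread(inbox: list[tuple[str, bool]]) -> int:
    """Number of individual unread messages."""
    return sum(1 for _, is_read in inbox if not is_read)


inbox_b = (
    [("A", False)] * 45
    + [("C", False)] * 30
    + [("D", False)] * 25
    + [("E", True)] * 10  # already read: must not affect either count
)
print(total_unread(inbox_b))        # 100
print(newly_unread_count(inbox_b))  # 3
```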

System Design

Problem 1: When Does a Message Become "Unread"?

A message is "newly unread" when it's not delivered to the recipient. But how do we know if a message isn't delivered?

Solution: WebSocket Connection Status

The messaging service uses WebSockets (WS) for real-time message delivery. WebSockets provide a crucial piece of information: whether a user is currently connected.

User A connected via WebSocket → Messages delivered in real-time
User B not connected (offline) → Messages are "undelivered"

When the messaging service attempts to send a message but finds the recipient offline, it publishes an ON_MSG_UNSENT event.

Architecture: Event-Driven Design

User A (sender) ──WS──> Messaging Service ──> Partitioned Chat DB
                              │
                              │ (User B offline?)
                              │
                              ▼
                    ON_MSG_UNSENT event
                              │
                              ▼
                      Offline Service

Event Structure (src is the sender; dest is the offline recipient):

{
  "event": "ON_MSG_UNSENT",
  "data": {
    "src": "A",
    "dest": "B",
    "msg": "...",
    "msg_id": "12345",
    "timestamp": "2025-01-15T10:30:00Z"
  }
}
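The delivery decision can be sketched as follows, assuming an in-memory registry of live WebSocket connections and a plain list standing in for the event stream (in production this would be a broker topic, e.g. Kafka). `MessagingService`, `connections`, and `event_bus` are hypothetical names; the payload mirrors the ON_MSG_UNSENT structure above, and persistence to the chat DB is omitted.

```python
import json
from datetime import datetime, timezone


class MessagingService:
    def __init__(self) -> None:
        # user_id -> messages pushed over that user's (simulated) WebSocket
        self.connections: dict[str, list[dict]] = {}
        # stand-in for a topic on a message broker
        self.event_bus: list[str] = []

    def connect(self, user_id: str) -> None:
        self.connections.setdefault(user_id, [])

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def send(self, src: str, dest: str, msg: str, msg_id: str) -> bool:
        """Deliver in real time if dest is connected, else publish ON_MSG_UNSENT."""
        payload = {
            "src": src,
            "dest": dest,
            "msg": msg,
            "msg_id": msg_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        if dest in self.connections:
            self.connections[dest].append(payload)  # real-time WebSocket delivery
            return True
        self.event_bus.append(json.dumps({"event": "ON_MSG_UNSENT", "data": payload}))
        return False
```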

The Key Insight: Unique Sender Tracking

We need to track: Which unique users have sent unread messages to User B?

This is a set membership problem, not a counting problem. The natural solution is Redis Sets.

Architecture: Read and Write Paths

Write Path: Updating Newly Unread Count

When an ON_MSG_UNSENT event is published:

  1. Offline Service (workers) consumes the event

  2. Workers update Auxiliary Redis with the unique sender

  3. Status Update Workers handle database persistence asynchronously

Redis Operation:

SADD user:B:unread_from "A"
# Returns: 1 (if A wasn't already in the set)
# Returns: 0 (if A was already in the set)

The Redis SET handles uniqueness automatically: adding User A multiple times still leaves a single "A" member in the set.

Get the count:

SCARD user:B:unread_from
# Returns: 3 (if A, C, and D have sent unread messages)
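To make these return values concrete without a running Redis server, here is a tiny in-memory stand-in that mimics the documented semantics: SADD returns the number of members actually added, SREM the number removed, and SCARD the set size (0 for a missing key). `FakeRedisSets` is a hypothetical helper, not the redis-py client, although redis-py's `sadd(name, *values)`, `srem(name, *values)`, and `scard(name)` calls have the same shape.

```python
class FakeRedisSets:
    """In-memory imitation of Redis SADD / SREM / SCARD semantics."""

    def __init__(self) -> None:
        self._data: dict[str, set[str]] = {}

    def sadd(self, key: str, *members: str) -> int:
        s = self._data.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before  # only genuinely new members count

    def srem(self, key: str, *members: str) -> int:
        s = self._data.get(key, set())
        removed = sum(1 for m in set(members) if m in s)
        s.difference_update(members)
        if not s:
            self._data.pop(key, None)  # Redis deletes a set once it is empty
        return removed

    def scard(self, key: str) -> int:
        return len(self._data.get(key, ()))


r = FakeRedisSets()
r.sadd("user:B:unread_from", "A")       # 1: A is new
r.sadd("user:B:unread_from", "A")       # 0: A already present
r.sadd("user:B:unread_from", "C", "D")  # 2
r.scard("user:B:unread_from")           # 3
```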

Read Path: Displaying the Badge

When User B's app needs to display the badge:

API Call:

GET /api/users/B/status

Status Check API Flow:

User B → Status Check API → Redis Cluster
                           → Partitioned Chat DB (fallback)

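The API can batch these reads into a single network round trip using a Redis pipeline:

SCARD user:B:unread_from
GET user:B:online_status
GET user:B:last_seen

Pipelining is a client-side technique rather than a Redis command: the client sends the commands together and reads the replies back in order.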

Response:

{
  "newly_unread_count": 3,
  "online_status": "offline",
  "last_seen": "2025-01-15T10:30:00Z"
}
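A minimal sketch of the Status Check API's read path, including the database fallback described under Redis failure. It assumes a cache lookup that raises `ConnectionError` when Redis is unavailable, and `db_unread_senders` is a hypothetical function standing in for a query against the partitioned chat DB. Reporting `"unknown"` presence on the fallback path is also an assumption.

```python
from typing import Callable


def get_status(user_id: str,
               cache_get: Callable[[str], dict],
               db_unread_senders: Callable[[str], set[str]]) -> dict:
    """Build the /api/users/<id>/status response, falling back to the DB."""
    try:
        cached = cache_get(user_id)  # one pipelined round trip to Redis
        return {
            "newly_unread_count": cached["unread_count"],
            "online_status": cached.get("online_status", "offline"),
            "last_seen": cached.get("last_seen"),
        }
    except ConnectionError:
        # Slower path: derive the count from persisted delivery status.
        return {
            "newly_unread_count": len(db_unread_senders(user_id)),
            "online_status": "unknown",  # presence lives only in Redis (assumed)
            "last_seen": None,
        }
```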

Alternatively, User B's app can call a simpler endpoint:

GET /api/users/B/clean_status

This returns a lean response focused just on the unread count.

The Auxiliary Database Pattern

High-Level Pattern:

Whenever a brittle component in your infrastructure (typically the database) is performing many operations that produce no state change, consider adding an auxiliary datastore in front of it to absorb those operations and reduce the load on the primary database. The primary stays healthy, and the service stays available.

Why This Matters Here:

Without Auxiliary Redis:

  • Every ON_MSG_UNSENT event would query the chat database

  • "Does User B already have unread messages from User A?"

  • If yes, no database update needed (wasted query)

  • Under high load (millions of messages/second), this can overwhelm the database

With Auxiliary Redis:

  • Redis SET automatically handles deduplication

  • SADD operations are O(1) and incredibly fast

  • Only meaningful state changes are written to the main database

  • Main database handles persistent storage asynchronously

  • Redis acts as a high-speed buffer for real-time operations
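The effect of the auxiliary layer can be shown with a small simulation: 100 ON_MSG_UNSENT events from three senders pass through the dedup sets, and only events that change state (the equivalent of SADD returning 1) are forwarded to the database write queue. Python sets stand in for Redis, and `db_write_queue` is a hypothetical name for the queue consumed by the Status Update Workers.

```python
def process_unsent_events(
    events: list[tuple[str, str]],
) -> tuple[int, list[tuple[str, str]]]:
    """Return (events_seen, db_writes) for (src, dest) ON_MSG_UNSENT events."""
    unread_from: dict[str, set[str]] = {}        # stand-in for user:<dest>:unread_from
    db_write_queue: list[tuple[str, str]] = []
    for src, dest in events:
        members = unread_from.setdefault(dest, set())
        if src not in members:                   # equivalent to SADD returning 1
            members.add(src)
            db_write_queue.append((dest, src))   # only real state changes reach the DB
    return len(events), db_write_queue


events = [("A", "B")] * 45 + [("C", "B")] * 30 + [("D", "B")] * 25
seen, writes = process_unsent_events(events)
print(seen, len(writes))  # 100 3
```

In this scenario, 97 of the 100 events would have been wasted database lookups without the auxiliary layer.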

Data Consistency: Redis and Database

Dual Storage Strategy:

  1. Redis (Auxiliary): Source of truth for real-time reads

    • Fast reads (sub-millisecond)

    • Handles deduplication automatically

    • In-memory, so may need persistence (AOF/RDB)

  2. Partitioned Chat DB: Source of truth for persistence

    • Stores message delivery status

    • Enables historical queries

    • Recovery mechanism if Redis fails

Consistency Model:

  • Redis updated immediately (strong consistency for reads)

  • Database updated asynchronously via workers (eventual consistency)

  • Flags such as B-C: True and B-D: True record which recipient-sender pairs currently exist in auxiliary Redis (here, B has unread messages from C and from D)

Clearing the Badge: When Messages Are Read

When User B comes online and reads messages from User A:

WebSocket Event:

{
  "event": "MESSAGES_READ",
  "data": {
    "user_id": "B",
    "sender_id": "A",
    "read_up_to_msg_id": "12350"
  }
}

Redis Operation:

SREM user:B:unread_from "A"
# Removes A from the set

Updated count:

SCARD user:B:unread_from
# Returns: 2 (now only C and D remain)

The badge updates in near real-time via WebSocket push to User B's client.
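Clearing the badge and pushing the new count can be sketched as below. It assumes each user may have several connected devices, each represented by a list that collects pushed frames, and it simplifies by treating "read up to a message" as clearing all of that sender's unread messages. `handle_messages_read`, `devices`, and the `BADGE_UPDATE` frame name are hypothetical; a real system would also forward `read_up_to_msg_id` to the workers that persist read status.

```python
def handle_messages_read(event: dict,
                         unread_from: dict[str, set[str]],
                         devices: dict[str, list[list[dict]]]) -> int:
    """Apply a MESSAGES_READ event and push the new badge count to all devices."""
    data = event["data"]
    user, sender = data["user_id"], data["sender_id"]
    members = unread_from.get(user, set())
    members.discard(sender)                       # SREM user:<user>:unread_from <sender>
    count = len(members)                          # SCARD user:<user>:unread_from
    frame = {"event": "BADGE_UPDATE", "newly_unread_count": count}
    for outbox in devices.get(user, []):          # every connected device of the user
        outbox.append(frame)
    return count
```

Broadcasting to every connected device is what keeps a desktop badge in step when the messages were read on a phone.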

Scalability Considerations

Redis Cluster Sharding

user:A:unread_from → Redis Node 1
user:B:unread_from → Redis Node 2
user:C:unread_from → Redis Node 3

Hash-based sharding distributes load across Redis cluster nodes.
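The node assignments above are illustrative. In Redis Cluster specifically, every key maps to one of 16384 hash slots via CRC16 (XMODEM variant) of the key modulo 16384, and slots are distributed across nodes. The sketch below reproduces that calculation, including hash tags: if a key contains a non-empty `{...}`, only the text inside the braces is hashed. Braced keys such as `user:{B}:unread_from` are a hypothetical variation on the key names used earlier; they would keep all of B's keys in one slot, so the status pipeline could go to a single node.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hashed_part(key: str) -> str:
    """Hash-tag rule: hash only the text inside the first {...} if it is non-empty."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # at least one character between the braces
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Slot number in [0, 16384) for a key."""
    return crc16_xmodem(hashed_part(key).encode()) % 16384
```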

Database Partitioning

The main chat database is partitioned (likely by user_id or conversation_id) to handle write throughput.

Worker Scaling

Status Update Workers can be scaled horizontally:

  • Consume from partitioned Kafka topics

  • Each worker handles a subset of users

  • Idempotent operations (safe to retry)
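A sketch of how Status Update Workers could split the work and stay safe under retries, assuming events are keyed by recipient so all of one user's events land on the same partition. The MD5-based `partition_for` is a hypothetical stand-in (Kafka's default partitioner uses murmur2), chosen only because Python's built-in `hash()` for strings is randomized per process. Idempotency here comes from recording state as set membership rather than incrementing a counter.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical topic partition count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the recipient id -> partition number."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def apply_unsent(db: dict[str, set[str]], event: dict) -> None:
    """Idempotent: replaying the same event leaves the stored state unchanged."""
    data = event["data"]
    db.setdefault(data["dest"], set()).add(data["src"])


def apply_counter(db: dict[str, int], event: dict) -> None:
    """NOT idempotent: a retried event is counted twice."""
    dest = event["data"]["dest"]
    db[dest] = db.get(dest, 0) + 1
```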

Notification System Integration

The Notification System consumes the same ON_MSG_UNSENT events to send push notifications, email alerts, etc. This demonstrates the power of event-driven architecture: one event, multiple consumers, each handling its own concern.
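The "one event, multiple consumers" idea can be sketched with an in-process publish/subscribe bus in which each subscriber stands in for an independent consumer (unread tracking, push notifications). `EventBus`, `track_unread`, and `send_push` are hypothetical names; with Kafka, each service would instead read the topic through its own consumer group, so every group sees every event.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._subs[event_name].append(handler)

    def publish(self, event: dict) -> None:
        # Every subscriber of this event type receives the event.
        for handler in self._subs[event["event"]]:
            handler(event)


unread_from: dict[str, set[str]] = {}
push_queue: list[str] = []


def track_unread(event: dict) -> None:
    d = event["data"]
    unread_from.setdefault(d["dest"], set()).add(d["src"])


def send_push(event: dict) -> None:
    d = event["data"]
    push_queue.append(f"push to {d['dest']}: new message from {d['src']}")


bus = EventBus()
bus.subscribe("ON_MSG_UNSENT", track_unread)
bus.subscribe("ON_MSG_UNSENT", send_push)
bus.publish({"event": "ON_MSG_UNSENT", "data": {"src": "A", "dest": "B"}})
```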

Edge Cases and Handling

  1. User Comes Online Mid-Event Processing

    Scenario: User B comes online while the Offline Service is still processing ON_MSG_UNSENT events addressed to them.

    Solution:

    • Check WebSocket connection status before adding to Redis

    • Race conditions are acceptable (eventual consistency)

    • Worst case: the badge appears briefly, then clears when the messages are marked read

  2. Redis Failure

    Fallback:

    • API falls back to querying the partitioned chat database

    • Slower but ensures availability

    • Redis recovery rebuilds state from database

  3. Multiple Devices

    Scenario: User B reads messages on phone, but desktop still shows unread badge.

    Solution:

    • WebSocket broadcasts MESSAGES_READ event to all User B's connected devices

    • Each device updates its local badge immediately

    • Eventual consistency across devices
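Edge cases 1 and 2 can be sketched together, assuming `is_connected` reports current WebSocket presence and that the chat DB can return undelivered (recipient, sender) pairs. Both callables, `on_unsent`, and `rebuild_unread_sets` are hypothetical names; a real rebuild would scan the database in batches rather than load everything at once.

```python
from typing import Callable, Iterable


def on_unsent(event: dict,
              unread_from: dict[str, set[str]],
              is_connected: Callable[[str], bool]) -> bool:
    """Offline Service worker: skip the badge update if the user came back online."""
    d = event["data"]
    if is_connected(d["dest"]):
        return False  # the reconnected client is assumed to fetch pending messages itself
    unread_from.setdefault(d["dest"], set()).add(d["src"])  # SADD equivalent
    return True


def rebuild_unread_sets(undelivered: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Recover the Redis sets from persisted (recipient, sender) rows after a failure."""
    rebuilt: dict[str, set[str]] = {}
    for recipient, sender in undelivered:
        rebuilt.setdefault(recipient, set()).add(sender)
    return rebuilt
```

Because sets deduplicate, replaying duplicate rows during recovery produces the same state as a clean rebuild.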

Performance Metrics

Expected Performance:

  • Write latency: <10ms (Redis SADD operation)

  • Read latency: <5ms (Redis SCARD operation)

  • Badge update latency: <500ms end-to-end (event → Redis → WebSocket push)

  • Throughput: Millions of messages/second with Redis Cluster

Key Takeaways

  1. Auxiliary Database Pattern: Use Redis to shield your main database from high-frequency operations with low state-change ratios

  2. SET Data Structure: Redis SETs are perfect for tracking unique senders (automatic deduplication)

  3. Event-Driven Architecture: ON_MSG_UNSENT event enables multiple consumers (unread tracking, notifications, analytics)

  4. Separation of Concerns:

    • Messaging Service: Delivers messages

    • Offline Service: Tracks unread state

    • Status Check API: Serves badge counts

    • Status Update Workers: Persist unread state to the main database

  5. Near Real-Time UX: WebSocket + Redis enables sub-second badge updates globally

This design demonstrates how choosing the right data structure (Redis SET) and architectural pattern (auxiliary database) can turn a complex problem into an elegant, scalable solution.

That's all for now, folks. See you in the next blog!

System Design

Part 5 of 9

In this series, we'll start learning the system design fundamental concepts that will help us become better software engineers. We'll understand real life case studies where certain decisions are more optimal than others, given the specific context.
