Building Social Networks

Modern social media platforms handle billions of interactions daily while maintaining sub-second response times globally. This blog explores six critical system design challenges: from CDN fundamentals and photo uploads at scale, to hashtag counters, dynamic image services, photo tagging and real-time messaging indicators, breaking down the architectural patterns and design decisions that make these systems work at massive scale.

How CDN Works

A Content Delivery Network (CDN) acts as a transparent intermediary layer between your users and your origin server. Understanding this flow is crucial to leveraging CDNs effectively.

The Basic Architecture

Let's walk through a concrete example. Suppose your origin server is hosted at https://rubansahoo.com, and you have a resource like a profile image at https://rubansahoo.com/dashboard/profile.jpeg. This could be any type of content: JSON responses, images, HTML pages, videos, or binary data.

When you set up a CDN, you're assigned a CDN domain, for instance https://a.mycdn.net. During configuration, you map this CDN domain to your origin server https://rubansahoo.com.

The Request Flow

Here's what happens when a user requests content through the CDN:

User Request: The user's browser sends a request to https://a.mycdn.net/dashboard/profile.jpeg
CDN Lookup: The CDN receives the request and identifies that a.mycdn.net is configured to use https://rubansahoo.com as its origin
Cache Check: The CDN checks if it has a cached copy of /dashboard/profile.jpeg
Origin Fetch (on cache miss): If the content isn't cached, the CDN forwards the request to https://rubansahoo.com/dashboard/profile.jpeg
Cache and Serve: The CDN receives the response from the origin, stores it in its cache, and returns it to the user
Subsequent Requests: Future requests for the same resource are served directly from the CDN's cache without touching the origin server

The Push vs. Pull Model

A common misconception about CDNs is that you need to actively "push" your content to them. In reality, CDNs typically operate on a pull model:

You don't upload files to the CDN
Instead, the CDN pulls content from your origin server on-demand when users first request it
Content remains cached until it expires or is invalidated

When you need to update content, you invalidate (or "purge") specific paths in the CDN cache. This marks those cached entries as stale. The next time a user requests that path, it triggers a cache miss, forcing the CDN to fetch the fresh version from your origin server. This new version is then cached and served to subsequent users.

This pull-based approach simplifies deployment workflows since you only need to update content on your origin server and then invalidate the relevant CDN cache entries, rather than managing separate upload processes to multiple CDN locations.
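The request flow and the purge behaviour described above can be sketched as a tiny in-memory pull-through cache. This is only an illustration: `PullThroughCache`, `fetch_from_origin`, and the dictionary standing in for the origin are hypothetical names, and a real CDN adds TTLs, richer cache keys, and many edge locations.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of one CDN edge: fetch on miss, serve from cache on hit."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times the origin was contacted

    def get(self, path: str) -> bytes:
        if path not in self._cache:          # cache miss
            self.origin_hits += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]             # cache hit (or freshly cached)

    def purge(self, path: str) -> None:
        # Invalidation: drop the entry so the next request is a miss.
        self._cache.pop(path, None)


# Hypothetical origin content, keyed by path.
origin = {"/dashboard/profile.jpeg": b"v1"}
cdn = PullThroughCache(lambda path: origin[path])

first = cdn.get("/dashboard/profile.jpeg")    # miss: pulled from origin
second = cdn.get("/dashboard/profile.jpeg")   # hit: origin not contacted

origin["/dashboard/profile.jpeg"] = b"v2"     # deploy new content to origin
stale = cdn.get("/dashboard/profile.jpeg")    # still the cached v1
cdn.purge("/dashboard/profile.jpeg")
fresh = cdn.get("/dashboard/profile.jpeg")    # miss again: pulls v2
```

Note that updating the origin alone changes nothing for users until the cached entry expires or is purged, which is why the purge step matters.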

Designing and Implementing Photo Upload at Scale

Let's design a system capable of handling 5 million photo uploads per day, exploring the architecture decisions, tradeoffs, and optimisations needed for a production-grade photo platform.

Requirements

Handle 5M photo uploads per day
Ensure efficient storage and retrieval
Maintain privacy and security
Optimise for different devices and network conditions
Design for extensibility

Key Design Considerations

When building a photo upload system at this scale, we need to think about storage infrastructure, data flow architecture, separation of concerns, privacy mechanisms, and performance optimisations.

1. The Fundamental Architecture

Storage Choice: S3/GCS

For storing images, object storage services like AWS S3 or Google Cloud Storage are the natural choice. They provide:

Virtually unlimited scalability
Built-in redundancy and durability
Cost-effective storage for large binary objects
Simple HTTP-based access patterns

The Bandwidth Problem

A common mistake is routing photo uploads through your API servers. Here's why this is problematic:

When a user uploads a 5MB photo through your API server, which then forwards it to S3, you're consuming bandwidth twice:

User → API Server (ingress)
API Server → S3 (egress)

This doubles your network costs and creates an unnecessary bottleneck at your API layer. Your application servers should handle business logic, not act as data proxies.
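To put rough numbers on this, here is a back-of-the-envelope calculation. It assumes every photo is 5 MB (the example size used above) and uses decimal units; real traffic will be burstier than the daily average suggests.

```python
UPLOADS_PER_DAY = 5_000_000      # from the requirements
AVG_PHOTO_MB = 5                 # assumption: every photo is 5 MB
SECONDS_PER_DAY = 24 * 60 * 60

avg_uploads_per_second = UPLOADS_PER_DAY / SECONDS_PER_DAY
photo_tb_per_day = UPLOADS_PER_DAY * AVG_PHOTO_MB / 1_000_000  # MB -> TB (decimal)

# Proxying: photo bytes enter the API tier once and leave it once.
proxied_api_tb_per_day = 2 * photo_tb_per_day
# Direct upload: photo bytes never touch the API tier.
direct_api_tb_per_day = 0.0

print(f"{avg_uploads_per_second:.1f} uploads/s on average")
print(f"{photo_tb_per_day:.0f} TB/day of photo data")
print(f"{proxied_api_tb_per_day:.0f} TB/day through API servers when proxying")
```

Under these assumptions the API tier would move about 50 TB per day purely as a relay, traffic that disappears from that tier once clients upload directly to storage.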

Metadata Storage: Sharded Relational Database

While images live in object storage, metadata (post information, user associations, captions) belongs in a relational database, sharded by user ID for horizontal scalability.

Serving Images Efficiently

Images should be served directly from S3 through a CDN layer, which provides:

Geographic distribution and edge caching
Reduced latency for users worldwide
Built-in DDoS protection and security
Bandwidth cost optimisation

2. Uploading Photos with Pre-Signed URLs

The solution to the bandwidth problem is pre-signed URLs, which allow direct client-to-storage uploads.

Here's the flow:

User requests upload: User A calls the Image Service to initiate an upload
Generate unique ID: Image Service generates a random image ID (e.g., 1729_4275)
Create pre-signed URL: Image Service requests S3 to create a temporary signed URL for the path:

   s3://insta-photos/<user_id>/<random_photo_id>.jpg

Direct upload: User A uploads the photo directly to S3 using the signed URL
Photo stored: The file is now stored at the designated S3 path

This approach removes the API server from the upload data path: the photo bytes cross the network once instead of twice, and the API server only handles the small request that issues the signed URL.
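A minimal sketch of the Image Service side of this flow follows. The function names, the ID format, and the `presign` callable are assumptions made for illustration; the signing itself is delegated to whatever storage SDK you use, and a fake signer is used here so the example runs without credentials.

```python
import secrets
from typing import Callable, Dict

BUCKET = "insta-photos"  # bucket name taken from the path above


def new_image_id() -> str:
    # Random, hard-to-guess ID; the exact format is an assumption.
    return secrets.token_hex(8)


def object_key(user_id: str, image_id: str) -> str:
    return f"{user_id}/{image_id}.jpg"


def prepare_upload(
    user_id: str,
    presign: Callable[[str, int], str],
    expires_in: int = 300,
) -> Dict[str, str]:
    """Return the image ID plus a short-lived URL the client can PUT to."""
    image_id = new_image_id()
    upload_url = presign(object_key(user_id, image_id), expires_in)
    return {"image_id": image_id, "upload_url": upload_url}


# With boto3 installed and credentials configured, `presign` could be:
#   s3 = boto3.client("s3")
#   presign = lambda key, ttl: s3.generate_presigned_url(
#       "put_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=ttl)


def fake_presign(key: str, ttl: int) -> str:
    # Stand-in for tests; does not produce a real signature.
    return f"https://{BUCKET}.s3.example/{key}?expires_in={ttl}"


ticket = prepare_upload("A", fake_presign)
```

The client keeps `ticket["image_id"]` for the later post-creation call and sends the photo bytes straight to `ticket["upload_url"]`.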

3. Publishing Photos

After uploading, the user creates a post entry:

The user retains the random image_id received from the Image Service and uses it when creating a post through the Posts Service.

Posts Table Schema:

id          → generated from ID service
user_id     → author of the post
image_id    → random image ID from Image Service
caption     → post caption
created_at  → timestamp

Why Store Only the Image ID?

Instead of storing the full CDN URL like https://instacdn.net/A/1729_4275.jpg, we store just the image_id. The complete S3 path can be computed at runtime:

s3://insta-photos/<user_id>/<image_id>.jpg

This approach stores the essential data rather than derived data, providing flexibility to change storage patterns or CDN configurations without database migrations.
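As a sketch of what "computed at runtime" might look like, the helpers below derive both the storage path and the public URL from a posts-table row. The CDN base URL and the function names are assumptions chosen to match the examples in this section.

```python
CDN_BASE = "https://instacdn.net"   # assumption: matches the example URL
BUCKET = "insta-photos"


def s3_path(user_id: str, image_id: str) -> str:
    return f"s3://{BUCKET}/{user_id}/{image_id}.jpg"


def cdn_url(user_id: str, image_id: str) -> str:
    return f"{CDN_BASE}/{user_id}/{image_id}.jpg"


def render_post(row: dict) -> dict:
    """Turn a posts-table row into an API response with a derived URL."""
    return {
        "id": row["id"],
        "user_id": row["user_id"],
        "caption": row["caption"],
        "url": cdn_url(row["user_id"], row["image_id"]),
    }


post = render_post(
    {"id": "p1", "user_id": "A", "image_id": "1729_4275", "caption": "hi"}
)
```

Switching CDN domains or path layouts then means changing `CDN_BASE` or `cdn_url`, not rewriting stored rows.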

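Bonus: Because the storage path embeds the user_id, the Posts Service can validate that the image_id belongs to the requesting user (for example, by checking that an object exists under that user's path), preventing users from posting photos they didn't upload.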

This pattern is used by platforms like LinkedIn and many CDN providers.

4. Privacy Considerations

Are Private Photos Really Private?

When using CDNs, the answer is nuanced. Instagram's approach uses time-limited signed URLs:

https://instacdn.net/ul/1729_4275.jpg?ig_cache_key=...&oh=...&oe=...&nc_sid=...

How it works:

When rendering private photos, the backend generates a URL with cryptographic signatures
The signature includes an expiration timestamp
When the browser requests the image via an <img> tag, the CDN validates:
- Signature authenticity
- Expiration time
If valid and not expired, the CDN serves the image; otherwise, it returns an error

The limitation: These URLs are short-lived (typically minutes to hours), but during that window, anyone with the URL can access the photo. This is pseudo-privacy, not true privacy.
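The mechanism can be illustrated with a generic HMAC-based scheme. This is not Instagram's or any particular CDN's actual format: the `expires` and `sig` parameter names, the shared secret, and the message layout are all assumptions for the sketch.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by backend and CDN


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def is_valid(url: str, now: Optional[int] = None) -> bool:
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        provided = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires)
    # Constant-time signature comparison, then the expiry check.
    return hmac.compare_digest(provided, expected) and now <= expires


url = sign_url("https://instacdn.net", "/ul/1729_4275.jpg", ttl_seconds=600, now=1_000)
```

Notice that `is_valid` never asks who is making the request: possession of the URL is the only credential, which is exactly the pseudo-privacy limitation described above.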

Achieving True Privacy

For genuine private photo platforms, you cannot use CDN caching. Instead:

All requests must route through your API servers
Every request must check user permissions against your database
This significantly increases infrastructure costs and complexity
Most platforms accept pseudo-privacy as a reasonable tradeoff

5. Overall Data Flow

Let's trace a complete journey:

Upload Phase:

User A initiates upload via Image Service
Image Service returns pre-signed URL and random image_id
User A uploads photo directly to S3

Post Creation:

User A creates post via Posts Service with image_id and caption
Posts Service creates entry in posts table

Consumption Phase:

User B requests User A's profile: GET /users/A/posts
Posts Service returns:

   [
     {
       "id": "...",
       "user_id": "A",
       "caption": "...",
       "url": "https://instacdn.net/A/1729_4275.jpg"
     }
   ]

Frontend renders images using <img> tags
Browser fetches photos from CDN
CDN serves cached (or freshly fetched) images

6. Image Optimisation at Scale

Users access your platform from vastly different contexts:

Geographic locations (different network latencies)
Network conditions (5G vs. 3G vs. WiFi)
Device capabilities (flagship phones vs. budget devices)
Screen sizes (tablets vs. phones)

Sending a 5MB high-resolution photo to every user is wasteful and creates poor user experience for those on slower connections.

Dynamic Image Transformation

Instead of pre-generating multiple resolutions, modern CDNs offer on-the-fly image transformation:

https://instacdn.net/ul/1729_4275.jpg?w=360

How it works:

CDN receives request with transformation parameters (w=360 for 360px width)
If transformed version is cached, serve it immediately
If not cached:
- Fetch original from S3
- Apply transformation (resize to 360px width)
- Cache the transformed version
- Serve to user
Subsequent requests for the same transformation are served from cache

This approach provides:

No upfront processing costs
Infinite flexibility in dimensions
Automatic caching of popular sizes
Reduced storage overhead (only cache what's actually requested)
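The key idea is that each (image, width) pair is its own cache entry. The sketch below models that with injected `fetch_original` and `resize` callables; both names are hypothetical, and the fake resize only tags the bytes so the example stays dependency-free.

```python
from typing import Callable, Dict, Tuple


class TransformingCache:
    """Caches one entry per (path, width) variant."""

    def __init__(
        self,
        fetch_original: Callable[[str], bytes],
        resize: Callable[[bytes, int], bytes],
    ) -> None:
        self._fetch_original = fetch_original
        self._resize = resize
        self._variants: Dict[Tuple[str, int], bytes] = {}
        self.transforms = 0  # number of resize operations performed

    def get(self, path: str, width: int) -> bytes:
        key = (path, width)
        if key not in self._variants:
            original = self._fetch_original(path)
            self._variants[key] = self._resize(original, width)
            self.transforms += 1
        return self._variants[key]


# Fake "resize" that just tags the bytes; a real one would decode and scale.
def fake_resize(data: bytes, width: int) -> bytes:
    return data + f"@{width}".encode()


cache = TransformingCache(lambda path: b"original", fake_resize)
a = cache.get("/ul/1729_4275.jpg", 360)
b = cache.get("/ul/1729_4275.jpg", 360)   # served from cache
c = cache.get("/ul/1729_4275.jpg", 720)   # new variant, new transform
```

Only sizes that are actually requested ever get computed or stored, which is where the storage saving comes from.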

Why Not Build Your Own?

While you could build an image optimisation service, CDNs provide this functionality out of the box with:

Global edge locations
Battle-tested transformation engines
Automatic format optimisation (WebP, AVIF)
Responsive image support

This is a clear case where leveraging existing infrastructure beats building custom solutions.

Conclusion

Building a photo upload system at scale requires careful consideration of bandwidth costs, storage patterns, privacy tradeoffs, and user experience optimisation. The pre-signed URL pattern, metadata-driven URL generation, and CDN-based transformations form the foundation of modern photo platforms serving billions of images daily.

Designing a Concurrent Hashtag Counter at Scale

Building a hashtag service that handles millions of hashtags and provides real-time counts is a fascinating distributed systems problem. Let's design a system that delivers the best user experience while handling massive write volumes and maintaining fast read response times.

Requirements

Support millions of hashtags
Handle high-volume concurrent updates (posts with hashtags)
Provide super-fast response times for read queries
Track popular photos per hashtag
Maintain accurate counts despite distributed writes

Key Design Challenges

When building a hashtag counter at scale, we face several critical challenges:

Storage: Where and how to store hashtag counts and metadata
Counting at volume: Processing millions of hashtag updates efficiently
Inter-service communication: How Post Service communicates with Hashtag Service
Atomicity: Ensuring counts remain accurate under concurrent updates
Partial updates: Handling failures gracefully without losing data
Read/Write optimisation: Balancing fast reads with high write throughput

Architecture Overview

The system consists of several key components working together:

Event-Driven Architecture with Kafka

Kafka acts as the glue connecting our services. When the Post Service processes a new photo upload, it doesn't directly call the Hashtag Service. Instead:
1. Post Service publishes events to a Kafka topic (e.g., post-created)
2. Hashtag Extraction service consumes these events and extracts hashtags from captions
3. Extracted hashtags are published to another Kafka topic: post-hashtag
4. Multiple downstream consumers process these events

Why Kafka?

Decoupling: Post Service and Hashtag Service don't need to know about each other
Durability: Events are persisted, allowing replay if processors fail
Scalability: Multiple consumers can process events in parallel
Back-pressure handling: Services consume at their own pace

The Adapter Pattern for Service Integration

The Adapter Pattern is crucial here. The Hashtag Extraction service acts as an adapter between:
- Upstream: Post Service (which knows about posts and photos)
- Downstream: Hashtag Service (which only cares about hashtag events)

This abstraction means:

Post Service doesn't need hashtag-specific logic
Hashtag Service can be updated independently
Additional consumers can tap into the same event stream
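A rough sketch of the adapter's core logic is shown below. The event field names (`post_id`, `caption`, `event_id`) and the simple `#\w+` pattern are assumptions; production extraction usually needs more careful Unicode and punctuation handling.

```python
import re
from typing import Dict, List

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> List[str]:
    """Return unique, lower-cased hashtags in order of first appearance."""
    seen: Dict[str, None] = {}
    for tag in HASHTAG_RE.findall(caption):
        seen.setdefault(tag.lower(), None)
    return list(seen)


def adapt(post_event: dict) -> List[dict]:
    """Map one post-created event to zero or more post-hashtag events."""
    return [
        {
            "event_id": f"{post_event['post_id']}:{tag}",
            "post_id": post_event["post_id"],
            "hashtag": tag,
        }
        for tag in extract_hashtags(post_event["caption"])
    ]


events = adapt({"post_id": "p1", "caption": "Golden hour #Sunset #travel #sunset"})
```

Deriving `event_id` from the post and tag gives downstream consumers a stable key for de-duplication.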

Counting Servers: Effective Batching and Counting

The naive approach of incrementing a database counter for every single hashtag mention would overwhelm the database. Instead, we use Counting Servers that batch updates:

How it works:
1. Consume events: Counting Servers read from the post-hashtag Kafka topic
2. In-memory aggregation: Maintain in-memory counters for hashtags over a time window (e.g., 10 seconds)
3. Batch flush: Periodically flush aggregated counts to the database

Example:

    Incoming events (over 10 seconds):
    - #sunset: 500 mentions
    - #travel: 300 mentions
    - #photography: 450 mentions

    Batch update to DB:
    UPDATE hashtags SET count = count + 500 WHERE tag = 'sunset';
    UPDATE hashtags SET count = count + 300 WHERE tag = 'travel';
    UPDATE hashtags SET count = count + 450 WHERE tag = 'photography';

This reduces database writes from potentially millions per second to thousands per second, dramatically improving throughput.
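The aggregation step can be sketched with a `Counter`. The class and helper names are hypothetical, and the timer that triggers the flush (every 10 seconds in the example above) is left out to keep the sketch short.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class CountingBuffer:
    """Aggregates hashtag mentions in memory between flushes."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()

    def add(self, hashtags: Iterable[str]) -> None:
        self._pending.update(hashtags)

    def flush(self) -> Dict[str, int]:
        """Return the accumulated deltas and reset the buffer."""
        deltas = dict(self._pending)
        self._pending.clear()
        return deltas


def to_sql_params(deltas: Dict[str, int]) -> List[Tuple[int, str]]:
    # Parameters for: UPDATE hashtags SET count = count + ? WHERE tag = ?
    return [(delta, tag) for tag, delta in sorted(deltas.items())]


buffer = CountingBuffer()
buffer.add(["sunset", "travel", "sunset"])
buffer.add(["sunset"])
batch = to_sql_params(buffer.flush())   # one row per tag, not per mention
empty = buffer.flush()                  # nothing left after a flush
```

Four mentions collapse into two database writes here; at production volumes the same idea turns many thousands of events into one update per distinct tag per window.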

Note on throughput: Counting Servers themselves do not need extreme throughput. Kafka absorbs bursts and buffers events upstream, so each server can consume at its own pace, and batching keeps the resulting database write load manageable.

Popularity Service: Tracking Top Photos

The Popularity Service is responsible for identifying the most engaging photos for each hashtag. It:
1. Consumes events from its own Kafka topic (separate from hashtag counting)
2. Uses its own logic to determine "top photos" (likes, comments, engagement, recency)
3. Publishes results indicating which photos are trending for specific hashtags
4. These results are consumed by Workers who update the system

This separation of concerns means:

Counting and ranking are independent operations
Different teams can optimise each service separately
The algorithm for "popularity" can evolve without affecting counting

Workers: Maintaining Hashtag Metadata

Workers are responsible for updating the broader hashtag metadata beyond just counts:
- Top photos for each hashtag (from Popularity Service)
- Related hashtags (computed periodically)
- Trending status flags
- Metadata updates (description, category)

Workers consume from multiple sources and update the partitioned database that backs the Hashtag API.

Storage Strategy: Partitioned Databases

The system uses partitioned databases for different data access patterns:

Primary Database (for counts):
- Stores hashtag counts and core metadata
- Partitioned by hashtag (e.g., hash-based sharding)
- Optimised for writes (from Counting Servers)

Secondary Database (for reads):

Denormalised data optimised for API responses
May include cached top photos, related tags
Partitioned for query patterns (e.g., by popularity tier)

Cache Layer:

Redis/Memcached sits in front of databases
Caches popular hashtag data (top 1000 hashtags, trending tags)
TTL-based expiration with background refresh
Dramatically reduces database load for read queries

Read and Write Path Optimisations

Write Path (Ingestion):

Kafka buffering: Absorbs traffic spikes, provides durability
Batch processing: Counting Servers aggregate before DB writes
Asynchronous updates: No synchronous blocking operations
Partitioned writes: Each Counting Server handles a subset of hashtags

Read Path (Queries):

CDN for API responses: Cache GET /hashtag/<tag> responses at edge
Application cache: Redis stores hot hashtag data
Database replicas: Read queries hit read replicas, not primary
Denormalised data: Avoid joins; data is pre-aggregated

Result: Users see hashtag counts and top photos load in <100ms globally.

Handling the `GET /hashtag/<tag>` Request

When a user queries GET /hashtag/sunset:

CDN check: If response is cached at edge, return immediately (~10-50ms)
Load Balancer: Routes to Hashtag API server
Cache check: API checks Redis for hashtag data
Database query (on cache miss): Fetch from partitioned DB
Response construction: Return count, top photos, related tags
Cache population: Store result in Redis and CDN
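The cache check, database fallback, and cache population steps follow the familiar cache-aside pattern. In this sketch a small TTL dictionary stands in for Redis, and `load_from_db` is a hypothetical callable for the partitioned database; the CDN layer is omitted.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Dictionary stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None
        return entry[1]

    def set(self, key: str, value: dict) -> None:
        self._data[key] = (self._clock() + self._ttl, value)


def get_hashtag(tag: str, cache: TTLCache, load_from_db: Callable[[str], dict]) -> dict:
    cached = cache.get(tag)
    if cached is not None:
        return cached
    value = load_from_db(tag)      # cache miss: go to the database
    cache.set(tag, value)          # populate for later readers
    return value


db_calls = []

def fake_db(tag: str) -> dict:
    db_calls.append(tag)
    return {"tag": tag, "count": 500}

now = [0.0]                                  # controllable clock for the demo
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
r1 = get_hashtag("sunset", cache, fake_db)   # miss
r2 = get_hashtag("sunset", cache, fake_db)   # hit
now[0] = 31.0                                # entry has expired
r3 = get_hashtag("sunset", cache, fake_db)   # miss again
```

The TTL bounds how stale a served count can be, which fits the eventual-consistency model this design accepts.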

System Properties

Consistency Model

Eventual consistency: Counts may lag by seconds (acceptable for social media)
Strong consistency where needed: User's own posts immediately reflect in their view

Fault Tolerance

Kafka replication: Events survive individual broker failures
Consumer groups: Automatic partition rebalancing if a Counting Server fails
Idempotent processing: Duplicate events don't double-count (use event IDs)
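One simple way to get idempotent counting is to remember which event IDs have already been applied. The sketch below keeps them in an in-memory set, which is an assumption made for brevity; a real consumer would need a bounded or persistent store that survives restarts.

```python
from collections import Counter
from typing import Iterable, Set


def count_once(events: Iterable[dict], seen_ids: Set[str], counts: Counter) -> None:
    """Apply each event at most once, keyed by its event_id."""
    for event in events:
        if event["event_id"] in seen_ids:
            continue                      # duplicate delivery: skip
        seen_ids.add(event["event_id"])
        counts[event["hashtag"]] += 1


seen: Set[str] = set()
counts: Counter = Counter()
delivery = [
    {"event_id": "p1:sunset", "hashtag": "sunset"},
    {"event_id": "p2:sunset", "hashtag": "sunset"},
]
count_once(delivery, seen, counts)
count_once(delivery, seen, counts)        # redelivered after a consumer restart
```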

Scalability

Horizontal scaling: Add more Counting Servers or Workers as needed
Kafka partitions: Increase parallelism by adding partitions
Database sharding: Distribute hashtags across shards
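Hash-based sharding needs a hash that is stable across processes and machines. The sketch below assumes a fixed shard count of 16 and uses SHA-256; both choices are illustrative, and schemes such as consistent hashing are often preferred when the shard count changes over time.

```python
import hashlib

NUM_SHARDS = 16  # assumption: fixed shard count for illustration


def shard_for(tag: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard index for a hashtag.

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process and so could route the same tag differently on each server.
    """
    digest = hashlib.sha256(tag.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


sunset_shard = shard_for("sunset")
```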

Key Takeaways

Kafka as a Glue

Kafka decouples services, provides durability, and enables multiple consumers to process the same event stream for different purposes (counting, popularity tracking, analytics).

Adapter Pattern

The Hashtag Extraction service adapts Post Service events into a format suitable for downstream consumers, enabling loose coupling and independent evolution.

Effective Batching and Counting

Counting Servers aggregate updates in memory before writing to the database, reducing write load from millions to thousands of operations per second.

Read and Write Path Optimisations

- Write path: Kafka buffering, batch processing, asynchronous updates
- Read path: Multi-layer caching (CDN, Redis, DB replicas), denormalised data

This architecture handles the demanding requirements of a hashtag service at social media scale while maintaining sub-100ms response times for users worldwide.

Real-World Considerations

Monitoring and Observability

Track Kafka consumer lag (are Counting Servers keeping up?)
Monitor cache hit rates (is Redis serving most reads?)
Alert on count anomalies (sudden spikes may indicate spam)

Data Quality

Implement spam detection for hashtag abuse
Normalise hashtags (#Sunset, #sunset, #SUNSET → #sunset)
Filter inappropriate hashtags before indexing

Cost Optimisation

Archive old hashtag-post associations to cheaper storage
Use tiered caching (hot tags in memory, warm tags in Redis, cold tags in DB)
Compress Kafka messages for network efficiency

This design demonstrates how careful architectural choices around messaging, batching, caching, and data partitioning enable building services that handle billions of events while delivering exceptional user experience.

Designing Gravatar and Dynamic OG Images

Let's explore how to build systems that serve images dynamically, from simple profile pictures to sophisticated social media preview cards. We'll design a Gravatar-like service and understand how platforms like GitHub generate Open Graph images on demand.

Understanding Image Serving Fundamentals

Before diving into Gravatar, let's understand how web servers serve images.

Traditional Static File Serving

Consider a typical website structure:

site/
├── static/
│   ├── img/
│   │   ├── ruban.jpg
│   │   └── logo.jpg
│   ├── js/
│   └── css/

When a user requests https://rubansahoo.com/static/img/ruban.jpg:

Server receives the request
Parses the URL path /static/img/ruban.jpg
Reads the file from disk at that path
Sends the file bytes in the HTTP response
Browser renders the bytes as an image

This works well for small sites, but has limitations at scale.

Proxying S3 for Image Storage

Instead of serving from disk, we can serve from S3 while maintaining the same URL structure:

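@app.route('/raw/<path:path>')  # "path:" converter lets the key contain slashes
def raw_handler(path):
    # s3 and BUCKET stand in for your S3 client and bucket name
    raw_bytes = s3.read(path, BUCKET)
    return raw_bytes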

Now when a user requests http://localhost:5000/raw/users/ruban.jpg:

Server receives the request with path /raw/users/ruban.jpg
Extracts the S3 key: users/ruban.jpg
Fetches the file from S3 bucket
Returns the bytes directly to the client

We've built an S3 proxy! This pattern provides:

Virtually unlimited storage (S3 scales with demand)
No disk space concerns on application servers
Centralised image management
Easy backup and replication

Real-World Example: GitHub's Open Graph Images

GitHub demonstrates the power of dynamic image generation brilliantly. When you share a repository link on social media, GitHub generates a custom Open Graph (OG) image on-the-fly that includes:

Repository name
Repository description
Star count
Fork count
Language badges
Owner avatar

These aren't pre-generated static images; they're created on demand based on the current repository metadata. This ensures social previews are always up-to-date without storing millions of images.

Designing Gravatar: A Profile Picture Service

Gravatar provides a universal profile picture system: one URL that works across the entire web.

What is Gravatar?

Gravatar gives you a single embeddable URL for your profile picture:

<img src="https://gravatar.com/0eafd172">

The URL uses a hash of your email instead of the email itself:

hash("ruban1work@gmail.com") = 0eafd172

Why hash the email? Privacy! The hash avoids exposing Personally Identifiable Information (PII) in the URL while still providing a unique, stable identifier.
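A minimal sketch of deriving such an identifier is shown below. The choice of SHA-256 and the trim-and-lowercase normalisation are assumptions for this design, and the 8-character value in the text is illustrative shorthand for the much longer real digest.

```python
import hashlib


def email_hash(email: str) -> str:
    """Hash a normalised email so equivalent spellings share one ID."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


h1 = email_hash("ruban1work@gmail.com")
h2 = email_hash("  Ruban1Work@Gmail.com ")
```

Normalising before hashing matters: without it, the same address typed with different casing would map to different profile pictures.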

Requirements

Users can upload multiple photos
Users can mark one photo as active
The active photo should be returned when requesting the user's Gravatar URL
Fast, globally distributed delivery

Database Schema

Users Table:

id          | email                  | hash      | active_photo_id
------------|------------------------|-----------|----------------
729         | ruban1work@gmail.com   | 0eafd172  | 7abe

Photos Table:

id          | user_id
------------|--------
7abe        | 729
8abe        | 729
cdae        | 729
e7215       | 729

Key design decisions:

hash is indexed for fast lookups
active_photo_id is a foreign key to the photos table
Users can have multiple photos but only one is active
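The two tables, and the single-row update that switches the active photo (covered later in this section), can be tried out with an in-memory SQLite database. The column types, the unique index name, and the use of SQLite are assumptions made so the sketch is runnable; a production system would use its own database and migrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        email           TEXT NOT NULL UNIQUE,
        hash            TEXT NOT NULL,
        active_photo_id TEXT REFERENCES photos(id)
    );
    CREATE UNIQUE INDEX idx_users_hash ON users(hash);

    CREATE TABLE photos (
        id      TEXT PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    """
)

conn.execute(
    "INSERT INTO users (id, email, hash) VALUES (?, ?, ?)",
    (729, "ruban1work@gmail.com", "0eafd172"),
)
conn.executemany(
    "INSERT INTO photos (id, user_id) VALUES (?, ?)",
    [("7abe", 729), ("8abe", 729)],
)

# Marking a photo active is a single-row update keyed by the indexed hash.
conn.execute(
    "UPDATE users SET active_photo_id = ? WHERE hash = ?", ("8abe", "0eafd172")
)

row = conn.execute(
    "SELECT id, active_photo_id FROM users WHERE hash = ?", ("0eafd172",)
).fetchone()
```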

Architecture Flow

Let's trace through the complete lifecycle of a Gravatar photo.

Photo Upload Flow

Step 1: Prepare for Upload

User requests permission to upload a new photo:
```
 POST /upload/prepare
```
Photo Upload Service:
1. Generates a random photo ID (e.g., 8abe)
2. Creates a pre-signed S3 URL for path:
```
 s3://gravatar-images/{user_id}/{random_photo_id}
 e.g. s3://gravatar-images/729/8abe
```
3. Returns the signed URL to the user

Step 2: Direct Upload to S3

User uploads the photo directly to S3 using the pre-signed URL (no bandwidth through API servers).

Step 3: Register Photo in Gravatar

User makes a POST request to register the uploaded photo:

    POST /photos
    {
      "photo_id": "8abe",
      "user_id": 729
    }

This creates an entry in the photos table.

Marking a Photo as Active

To change which photo appears as the Gravatar:
```
 UPDATE users
 SET active_photo_id = '8abe'
 WHERE hash = '0eafd172';
```
This single update changes which photo will be served globally.

Serving the Active Photo

When someone embeds your Gravatar in their website:
```
 <img src="https://gravatar.com/0eafd172">
```
But wait! We want gravatar.com to serve images, yet our API is at api.gravatar.com. How do we bridge this?

Introducing CDN as the Public Interface

CDN Configuration:

gravatar.com → ORIGIN: api.gravatar.com/photos

Now the flow becomes:

Request: https://gravatar.com/0eafd172

CDN forwards to: https://api.gravatar.com/photos/0eafd172

API Server Logic:

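from flask import abort

@app.route('/photos/<email_hash>')
def get_gravatar(email_hash):
    # Look up the user and their active photo by the indexed hash.
    # (The parameter is named email_hash to avoid shadowing Python's hash().)
    result = db.query(
        """
        SELECT active_photo_id, id
        FROM users
        WHERE hash = ?
        """,
        email_hash
    )

    # Unknown hash, or no active photo chosen yet
    # (assuming db.query returns None when no row matches)
    if result is None or result['active_photo_id'] is None:
        abort(404)

    user_id = result['id']
    photo_id = result['active_photo_id']

    # Construct S3 path
    s3_path = f"{user_id}/{photo_id}"

    # Fetch from S3
    image_bytes = s3.read(s3_path, bucket="gravatar-images")

    return image_bytes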

Response flows back through CDN to user, and the CDN caches it for future requests.

Cache Invalidation: Keeping Photos Fresh

When a user marks a new photo as active, we need to invalidate the CDN cache so that viewers see the updated photo soon afterwards rather than waiting for the cached copy to expire.

Asynchronous Invalidation Flow:

User marks photo cdae as active
API server updates the database

API server publishes event to message broker (Kafka/RabbitMQ):

 {  
     "event": "photo_updated",
     "user_hash": "0eafd172",
     "new_photo_id": "cdae"
 }

Worker consumes the event
Worker calls CDN API to invalidate:
```
 PURGE https://gravatar.com/0eafd172
```
Next request fetches fresh data from origin

This approach:

Doesn't block the user's request
Ensures eventual consistency
Handles CDN invalidation failures gracefully (can retry)
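The worker's retry behaviour can be sketched as follows. A standard-library queue stands in for the message broker, `purge` is a hypothetical callable wrapping the CDN's invalidation API, and the retry limit of three is an arbitrary choice; a real worker would also wait between attempts.

```python
import queue
from typing import Callable, List


def run_invalidation_worker(
    events: "queue.Queue[dict]",
    purge: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Drain the queue, purging each URL; return URLs that never succeeded."""
    failed: List[str] = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return failed
        url = f"https://gravatar.com/{event['user_hash']}"
        for attempt in range(1, max_attempts + 1):
            try:
                purge(url)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    failed.append(url)   # e.g. hand off to a dead-letter queue


# Fake CDN client that fails on the first call only.
calls: List[str] = []

def flaky_purge(url: str) -> None:
    calls.append(url)
    if len(calls) == 1:
        raise ConnectionError("CDN API temporarily unavailable")

q: "queue.Queue[dict]" = queue.Queue()
q.put({"event": "photo_updated", "user_hash": "0eafd172", "new_photo_id": "cdae"})
failed = run_invalidation_worker(q, flaky_purge)
```

Because the purge happens outside the user's request, a transient CDN failure costs a retry rather than an error shown to the user.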

On-Demand Image Optimisation

Real-world Gravatar usage varies wildly:

Email clients might need 32×32px thumbnails
Profile pages might display 256×256px
High-DPI displays might request 512×512px

Storing pre-generated versions for every possible size is impractical. Instead, we use URL-driven transformations.

URL-Driven Transformations

<img src="https://gravatar.com/0eafd172?w=32">

The ?w=32 query parameter tells the CDN: "I want this image at 32px width."

How CDNs Handle Image Transformations

Modern CDNs (like Cloudflare, Fastly, Cloudinary) provide this feature out-of-the-box:

Request received: https://gravatar.com/0eafd172?w=240
Cache check: Is the 240px version cached?
- If yes: Return immediately
- If no: Continue with the steps below
Fetch original: Request https://api.gravatar.com/photos/0eafd172 from origin
Transform: Resize image to 240px width (maintaining aspect ratio)
Cache transformed version: Store the 240px variant
Return response: Send transformed image to user

Subsequent requests for the same transformation are served directly from cache.
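"Maintaining aspect ratio" in the transform step is simple proportional arithmetic. The helper below is a hypothetical illustration of that calculation only; the actual pixel resampling would be done by an image library or by the CDN.

```python
from typing import Tuple


def scaled_size(orig_width: int, orig_height: int, target_width: int) -> Tuple[int, int]:
    """Dimensions after resizing to target_width while keeping aspect ratio."""
    if orig_width <= 0 or orig_height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    target_height = max(1, round(orig_height * target_width / orig_width))
    return target_width, target_height


square = scaled_size(1024, 1024, 240)      # (240, 240)
landscape = scaled_size(1600, 900, 240)    # (240, 135)
```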

Image Transformation Characteristics

CPU Intensive:

Resizing a 1024×1024 image to 32×32 requires significant processing
Cannot be done asynchronously (user is waiting for the response)
Requires powerful servers with sufficient CPU

Scaling Considerations:

Need large server instances to handle transformation workload
Need many servers to handle concurrent transformation requests
First request for a new size/image combination is slower (cache miss)

Popular Image Processing Libraries:

ImageMagick: Industry-standard command-line tool
libvips: Faster, more memory-efficient alternative
Sharp (Node.js): High-performance wrapper around libvips
Pillow (Python): Popular library for image manipulation

Why Use CDN Image Optimisation?

Building your own image transformation service is possible but complex:

Need to handle multiple image formats (JPEG, PNG, WebP, AVIF)
Need to implement caching strategies
Need to manage geographic distribution
Need to handle security (preventing resource exhaustion attacks)

CDNs provide this functionality out-of-the-box with:

Global edge locations
Battle-tested transformation engines
Automatic format optimisation
Security and rate limiting

The Complete Gravatar System

Putting it all together:

Upload Path:

User → Photo Upload Service (pre-signed URL) → S3
     → API (register photo) → Database

Serving Path:

Browser → CDN (gravatar.com)
       → API (api.gravatar.com/photos) → Database (get active photo)
                                       → S3 (fetch image)
       ← CDN (cache and transform) ← API
       ← Browser

Update Path:

User → API (mark photo active) → Database
     → Message Broker → Worker → CDN (invalidate cache)

Key Architectural Benefits

Speed: CDN edge caching provides sub-100ms response times globally
Scale: S3 and CDN handle storage and delivery at any scale
Flexibility: URL-driven transformations support any image size
Efficiency: Pre-signed URLs eliminate bandwidth through API servers
Security: Email hashing protects user privacy
Simplicity: Users manage photos through a simple API

This architecture demonstrates how combining object storage, CDN capabilities, and smart caching strategies creates a robust, globally scalable image service that powers millions of websites.

Designing Photo Tagging: The Product Manager Hat

Building a photo tagging feature requires more than just writing code. A senior engineer approaches system design by wearing three distinct hats throughout the process.

The Three Hats of a Senior Engineer

Product Manager: Understanding user needs, defining requirements, asking the right questions
Tech Architect: Designing scalable systems, choosing technologies, planning for extensibility
Software Engineer: Implementing robust, maintainable code

For the photo tagging feature, let's focus on the first and most critical step: wearing the Product Manager hat.

Wearing the Product Manager Hat: Asking the Right Questions

Before writing any code or designing schemas, a senior engineer asks critical questions to stakeholders and senior management. The answers to these questions fundamentally shape the technical design.

Authorisation: Who Can Tag?

Key questions:
- Can anyone tag anyone in any photo?
- Can only the photo owner create tags?
- Do tagged users need to approve tags before they appear publicly?
- Can users prevent themselves from being tagged by certain people?

Why it matters: This defines your authorisation model, privacy settings architecture, and approval workflow complexity.

Limits: Maximum Tags Per Photo

Key questions:
- What's the maximum number of people that can be tagged in a single photo?
- Are there rate limits for tagging operations to prevent spam?

Why it matters: Impacts database design, UI rendering performance, and spam prevention strategies. Typical answer: 20 tags per photo.

Notifications and Throttling

Key questions:
- Should users be notified immediately when tagged?
- Should we batch notifications (tagged in 5 photos → 1 notification)?
- How do we handle notification spam for popular users (celebrities tagged in 1000 photos)?

Why it matters: Determines notification service integration complexity, throttling mechanisms, and user experience design. Poor notification strategy leads to user fatigue and feature abandonment.

Self-Removal: Can Users Untag Themselves?

Key questions:
- Can tagged users remove themselves from photos?
- Should the photo owner be notified when someone removes their tag?
- Can users block being tagged by specific people?

Why it matters: Critical for user privacy controls and social dynamics. Most platforms allow self-removal without notifying the photo owner to avoid social friction.

Face Recognition and Tag Suggestions

Key questions:
- Should we use ML to suggest tags based on face recognition?
- What's the expected latency/SLA from the ML team (1 minute? 1 hour?)?
- How do we handle false positives and user trust?

Why it matters: Determines ML service integration complexity, performance expectations, and privacy concerns. Typical answer: Asynchronous processing with 1-5 minute SLA, suggestions shown to photo owner for manual confirmation.

Profile and Activity Integration

Key questions:
- Should tagged photos appear on user profiles?
- Should there be a "Photos of You" section?
- How frequently will users query "all photos I'm tagged in"?

Why it matters: Query patterns dictate database indexing strategy. If "Photos of You" is a primary use case, you need efficient indexes on user_id, not just post_id.

Feed Integration

Key questions:
- Should tagging trigger feed updates?
- Should tagged users' followers see the photo in their feed?
- How does this interact with privacy settings (private photos, blocked users)?

Why it matters: Impacts feed generation algorithm complexity, privacy model enforcement, and determines whether you need event-driven architecture for extensibility.

Why These Questions Matter

Each answer shapes critical technical decisions:

Authorisation questions → RBAC service integration, approval workflow design
Limits questions → Database constraints, validation logic, spam prevention
Notifications questions → Message broker (Kafka) for batching, throttling algorithms
Self-removal questions → Database schema (status column: pending/approved/removed)
Face recognition questions → Async ML service integration, relative positioning storage
Profile questions → Multiple database indexes, query optimisation strategy
Feed questions → Event-driven architecture with Kafka, multiple service consumers

The Senior Engineer's Mindset

A junior engineer might start with "Let's create a post_tags table with post_id and user_id."

A senior engineer starts with "What problem are we actually solving for users, and what are the business constraints?" The technical design naturally follows from understanding requirements deeply.

By asking these questions upfront, you avoid costly redesigns later when stakeholders say "Oh, we also need approval workflows" or "Users should be able to remove tags themselves."

Next steps: With requirements clarified, you'd move to wearing the Tech Architect hat (designing the system with Kafka, relative positioning, service architecture) and finally the Software Engineer hat (implementation details). But it all starts with asking the right questions.

Designing a Newly Unread Message Indicator

When building a messaging platform, one of the most critical UX features is the "unread message indicator": the small badge that tells users they have new messages. But there's a subtle distinction that significantly impacts system design: newly unread vs. total unread messages.

Understanding the Requirement

The Problem: We need to inform users about the presence of new messages they haven't seen yet, not just messages they haven't acknowledged.

Key Insight: A user might have 100 unread messages, but they're only from 3 different people who recently sent messages. What matters to the user is: "How many people have sent me messages I haven't seen yet?"

This is fundamentally different from counting total unread messages.

Example Scenario

User B has:
- 45 unread messages from User A
- 30 unread messages from User C  
- 25 unread messages from User D

Newly Unread Count = 3 (not 100!)

The badge should show "3", indicating 3 different people have sent messages, not 100 individual messages.

Requirements

Near real-time updates: Badge must update within seconds of receiving a new message
Accurate count: Track unique senders, not total messages
High availability: This is a critical user-facing feature
Scalable: Must handle millions of concurrent users

The Core Formula

# newly unread = # unique users from whom messages are received and unread
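
In code, the formula amounts to counting distinct senders rather than messages. The snippet below is a small sketch using the numbers from the example scenario; representing unread messages as `(sender, message_id)` pairs is an assumption made for illustration.

```python
def newly_unread_count(unread_messages):
    """Count distinct senders across (sender, message_id) pairs."""
    return len({sender for sender, _ in unread_messages})


# The example scenario: 45 + 30 + 25 unread messages from three senders.
unread = (
    [("A", i) for i in range(45)]
    + [("C", i) for i in range(30)]
    + [("D", i) for i in range(25)]
)
```

Here `len(unread)` is 100, while `newly_unread_count(unread)` is 3.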

System Design

Problem 1: When Does a Message Become "Unread"?

A message is "newly unread" when it's not delivered to the recipient. But how do we know if a message isn't delivered?

Solution: WebSocket Connection Status

The messaging service uses WebSockets (WS) for real-time message delivery. WebSockets provide a crucial piece of information: whether a user is currently connected.

User A connected via WebSocket → Messages delivered in real-time
User B not connected (offline) → Messages are "undelivered"

When the messaging service attempts to send a message but finds the recipient offline, it publishes an ON_MSG_UNSENT event.

Architecture: Event-Driven Design

User A (sender) ──WS──> Messaging Service ──> Partitioned Chat DB
                              │
                              │ (User B offline?)
                              │
                              ▼
                    ON_MSG_UNSENT event
                              │
                              ▼
                      Offline Service

Event Structure:

{
  "event": "ON_MSG_UNSENT",
  "data": {
    "src": "A",     // sender
    "dest": "B",    // recipient (offline)
    "msg": "...",
    "msg_id": "12345",
    "timestamp": "2025-01-15T10:30:00Z"
  }
}
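
The delivery decision can be sketched roughly as follows. This is a toy model, not the actual service: the `MessagingService` class, its in-memory `connections` dict (standing in for open WebSockets) and the `published` list (standing in for the event bus) are all assumptions, and the write to the chat database is omitted.

```python
from datetime import datetime, timezone


class MessagingService:
    """Toy stand-in: a dict of inboxes plays the role of open WebSocket connections."""

    def __init__(self):
        self.connections = {}  # user_id -> messages pushed over the "socket"
        self.published = []    # stand-in for the event bus

    def connect(self, user_id):
        self.connections[user_id] = []

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def send(self, src, dest, msg, msg_id):
        inbox = self.connections.get(dest)
        if inbox is not None:
            # Recipient is connected: deliver in real time, no event needed.
            inbox.append(msg)
            return "delivered"
        # Recipient is offline: publish ON_MSG_UNSENT for downstream consumers.
        self.published.append({
            "event": "ON_MSG_UNSENT",
            "data": {
                "src": src,
                "dest": dest,
                "msg": msg,
                "msg_id": msg_id,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        })
        return "unsent"
```

The point of the sketch is that the messaging service only decides "delivered or not"; everything about unread tracking happens in whichever services consume the event.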

The Key Insight: Unique Sender Tracking

We need to track: Which unique users have sent unread messages to User B?

This is a set membership problem, not a counting problem. The natural solution is Redis Sets.

Architecture: Read and Write Paths

Write Path: Updating Newly Unread Count

When ON_MSG_UNSENT event is published:

Offline Service (workers) consumes the event
Workers update Auxiliary Redis with the unique sender
Status Update Workers handle database persistence asynchronously

Redis Operation:

SADD user:B:unread_from "A"
# Returns: 1 (if A wasn't already in the set)
# Returns: 0 (if A was already in the set)

The Redis SET handles uniqueness automatically: adding User A multiple times still leaves exactly one entry for A in the set.

Get the count:

SCARD user:B:unread_from
# Returns: 3 (if A, C, and D have sent unread messages)

Read Path: Displaying the Badge

When User B's app needs to display the badge:

API Call:

GET /api/users/B/status

Status Check API Flow:

User B → Status Check API → Redis Cluster
                           → Partitioned Chat DB (fallback)

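The API can batch these reads into a single round trip using Redis pipelining (a client-side feature that sends several commands together, not a Redis command itself):

SCARD user:B:unread_from
GET user:B:online_status
GET user:B:last_seen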

Response:

{
  "newly_unread_count": 3,
  "online_status": "offline",
  "last_seen": "2025-01-15T10:30:00Z"
}

Alternatively, User B's app can call a simpler endpoint:

GET /api/users/B/clean_status

This returns a lean response focused just on the unread count.
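
A rough sketch of the read path, including the fallback to the chat database, might look like this. `FakeRedis`, `FakeChatDB`, `RedisUnavailable` and the `source` field in the response are hypothetical stand-ins so the example runs without a Redis server; only the key format `user:<id>:unread_from` comes from the design above.

```python
class RedisUnavailable(Exception):
    """Raised by the stand-in cache when Redis cannot be reached."""


class FakeRedis:
    def __init__(self, sets, available=True):
        self.sets = sets            # key -> set of sender ids
        self.available = available

    def scard(self, key):
        if not self.available:
            raise RedisUnavailable(key)
        return len(self.sets.get(key, set()))


class FakeChatDB:
    def __init__(self, undelivered):
        self.undelivered = undelivered  # list of (sender, recipient) pairs

    def count_unread_senders(self, user_id):
        return len({s for s, r in self.undelivered if r == user_id})


def get_status(user_id, redis, chat_db):
    """Serve the badge count from Redis, falling back to the chat DB on failure."""
    try:
        count = redis.scard(f"user:{user_id}:unread_from")
        source = "redis"
    except RedisUnavailable:
        count = chat_db.count_unread_senders(user_id)  # slower, but keeps the badge available
        source = "chat_db"
    return {"newly_unread_count": count, "source": source}
```

Both paths return the same count when the two stores agree; the fallback trades latency for availability.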

The Auxiliary Database Pattern

High-Level Pattern:

Whenever a brittle component in your infrastructure (typically the primary database) spends much of its time on operations that change no state, consider adding an auxiliary database to absorb those operations and reduce the load on the primary. The service stays up, and the primary database only handles writes that actually matter.

Why This Matters Here:

Without Auxiliary Redis:

Every ON_MSG_UNSENT event would query the chat database
"Does User B already have unread messages from User A?"
If yes, no database update needed (wasted query)
Under high load (millions of messages/second), this kills the database

With Auxiliary Redis:

Redis SET automatically handles deduplication
SADD operations are O(1) and incredibly fast
Only meaningful state changes are written to the main database
Main database handles persistent storage asynchronously
Redis acts as a high-speed buffer for real-time operations
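
The effect of the pattern can be sketched with an in-memory stand-in for Redis. `AuxiliarySet` and `handle_unsent` are hypothetical names; `sadd` mirrors the 1/0 reply that SADD gives for a single member, and the `db_write_queue` list stands in for whatever hands work to the Status Update Workers.

```python
class AuxiliarySet:
    """In-memory stand-in for Redis sets; sadd mirrors SADD's 1/0 reply for one member."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0  # already present: no state change
        members.add(member)
        return 1      # newly added: a real state change

    def scard(self, key):
        return len(self._sets.get(key, set()))


def handle_unsent(event, aux, db_write_queue):
    """Process one ON_MSG_UNSENT event, queueing a DB write only on a state change."""
    data = event["data"]
    key = f"user:{data['dest']}:unread_from"
    if aux.sadd(key, data["src"]) == 1:
        db_write_queue.append((data["dest"], data["src"]))
```

Feeding 45 events from A and 30 from C through `handle_unsent` leaves only two entries in the write queue; the other 73 events never reach the primary database.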

Data Consistency: Redis and Database

Dual Storage Strategy:

Redis (Auxiliary): Source of truth for real-time reads
- Fast reads (sub-millisecond)
- Handles deduplication automatically
- In-memory, so may need persistence (AOF/RDB)
Partitioned Chat DB: Source of truth for persistence
- Stores message delivery status
- Enables historical queries
- Recovery mechanism if Redis fails

Consistency Model:

Redis updated immediately (strong consistency for reads)
Database updated asynchronously via workers (eventual consistency)
Membership flags in auxiliary Redis (for example B-C: True, B-D: True) record whether recipient B currently has unread messages from senders C and D

Clearing the Badge: When Messages Are Read

When User B comes online and reads messages from User A:

WebSocket Event:

{
  "event": "MESSAGES_READ",
  "data": {
    "user_id": "B",
    "sender_id": "A",
    "read_up_to_msg_id": "12350"
  }
}

Redis Operation:

SREM user:B:unread_from "A"
# Removes A from the set

Updated count:

SCARD user:B:unread_from
# Returns: 2 (now only C and D remain)

The badge updates in near real-time via WebSocket push to User B's client.
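
Handling the read event could look roughly like the sketch below. The `unread_from` dict stands in for the Redis sets, and the `BADGE_UPDATE` payload is a hypothetical message shape for the WebSocket push, not something defined earlier in this design.

```python
def handle_messages_read(event, unread_from):
    """Remove the sender from the recipient's unread set and build a badge update.

    unread_from maps recipient id -> set of sender ids (stand-in for Redis sets).
    """
    data = event["data"]
    senders = unread_from.setdefault(data["user_id"], set())
    senders.discard(data["sender_id"])  # like SREM: a no-op if the sender is absent
    return {
        "event": "BADGE_UPDATE",  # hypothetical push payload
        "user_id": data["user_id"],
        "newly_unread_count": len(senders),
    }
```

Because removing an absent member is a no-op, replaying the same MESSAGES_READ event yields the same count, which keeps retries safe.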

Scalability Considerations

Redis Cluster Sharding

user:A:unread_from → Redis Node 1
user:B:unread_from → Redis Node 2
user:C:unread_from → Redis Node 3

Hash-based sharding distributes load across Redis cluster nodes.
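
For a feel of how keys map to nodes, the sketch below follows the Redis Cluster approach of hashing each key with CRC16 into one of 16384 slots, honouring `{...}` hash tags. The `node_for_key` helper and its even split of slots across three nodes are assumptions for illustration; a real cluster assigns slot ranges to nodes through its own configuration.

```python
import binascii

HASH_SLOTS = 16384


def key_hash_slot(key: str) -> int:
    """Compute a Redis Cluster style hash slot: CRC16 (XMODEM) of the key mod 16384."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            # Non-empty hash tag: only the text inside the braces is hashed.
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def node_for_key(key: str, node_count: int = 3) -> int:
    """Assumed layout: slots divided into equal contiguous ranges, one per node."""
    return key_hash_slot(key) * node_count // HASH_SLOTS
```

One consequence worth noting: keys such as `user:B:unread_from` and `user:B:last_seen` may land on different nodes, whereas tagged keys like `{user:B}:unread_from` and `{user:B}:last_seen` always share a slot.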

Database Partitioning

The main chat database is partitioned (likely by user_id or conversation_id) to handle write throughput.

Worker Scaling

Status Update Workers can be scaled horizontally:

Consume from partitioned Kafka topics
Each worker handles a subset of users
Idempotent operations (safe to retry)

Notification System Integration

The Notification System consumes the same ON_MSG_UNSENT events to send push notifications, email alerts, etc. This demonstrates the power of event-driven architecture: one event, multiple consumers, each handling their specific concern.

Edge Cases and Handling

User Comes Online Mid-Event Processing

Scenario: User B comes online while offline service is processing unread messages.

Solution:
- Check WebSocket connection status before adding to Redis
- Race conditions are acceptable (eventual consistency)
- Worst case: Badge shows briefly, then updates when messages are marked read

Redis Failure

Fallback:
- API falls back to querying the partitioned chat database
- Slower but ensures availability
- Redis recovery rebuilds state from database

Multiple Devices

Scenario: User B reads messages on phone, but desktop still shows unread badge.

Solution:
- WebSocket broadcasts MESSAGES_READ event to all User B's connected devices
- Each device updates its local badge immediately
- Eventual consistency across devices
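
A minimal sketch of the fan-out to multiple devices is shown below. `DeviceRegistry` is a hypothetical in-memory structure in which each device's list of received events stands in for a WebSocket connection.

```python
class DeviceRegistry:
    """Tracks connected devices per user; lists stand in for WebSocket connections."""

    def __init__(self):
        self._devices = {}  # user_id -> {device_id: [events pushed to that device]}

    def connect(self, user_id, device_id):
        self._devices.setdefault(user_id, {})[device_id] = []

    def disconnect(self, user_id, device_id):
        self._devices.get(user_id, {}).pop(device_id, None)

    def received(self, user_id, device_id):
        return self._devices[user_id][device_id]

    def broadcast(self, user_id, event):
        """Push an event to every connected device of a user; return how many got it."""
        devices = self._devices.get(user_id, {})
        for inbox in devices.values():
            inbox.append(event)
        return len(devices)
```

A device that is offline during the broadcast simply misses the push and picks up the current count from the status API when it reconnects, which is where the eventual consistency comes from.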

Performance Metrics

Expected Performance:

Write latency: <10ms (Redis SADD operation)
Read latency: <5ms (Redis SCARD operation)
Badge update latency: <500ms end-to-end (event → Redis → WebSocket push)
Throughput: Millions of messages/second with Redis Cluster

Key Takeaways

Auxiliary Database Pattern: Use Redis to shield your main database from high-frequency operations with low state-change ratios
SET Data Structure: Redis SETs are perfect for tracking unique senders (automatic deduplication)
Event-Driven Architecture: ON_MSG_UNSENT event enables multiple consumers (unread tracking, notifications, analytics)
Separation of Concerns:
- Messaging Service: Delivers messages
- Offline Service: Tracks unread state
- Status Check API: Serves badge counts
- Status Update Workers: Persists to main database
Near Real-Time UX: WebSocket + Redis enables sub-second badge updates globally

This design demonstrates how choosing the right data structure (Redis SET) and architectural pattern (auxiliary database) can turn a complex problem into an elegant, scalable solution.

That's all for now folks. See you in the next blog!
