4 Step Framework for System Design Interview

Cracking a System Design Interview involves showing a deep understanding of designing scalable, efficient, and reliable systems. Here's a 4 step framework to help you succeed.

Nailing a system design interview requires a structured approach and showing a deep understanding of designing scalable, efficient, and reliable systems. 

Following a well-defined framework is important to ensure you cover all the essential aspects and think through the problem completely.

In this article, we'll discuss a framework that outlines the main steps you should follow during a system design interview. 

As a candidate, it's essential to walk the interviewer through each of these steps to show your system design skills and thought process.

The key steps in the framework include:

📌 Requirements Gathering and Assumptions

📌 Database Design

📌 Capacity Estimation and Constraints

📌 High-Level System Design

Plus real-world examples and practice questions to solidify the concepts.

Let's dive into each step of the framework in detail. 👇

Requirements/Assumptions:

1. Functional Requirement: 

Functional requirements define the specific functionalities and features that a system must deliver to meet the needs of its users.

These requirements describe what the system is supposed to do, the actions it should perform, and the expected outcomes. Functional requirements are essential for shaping the design and development of a system.

Example: Build a live streaming system that provides the following features.

▶️ Viewers should be able to see an active, real-time feed of live comments on each live video while scrolling through the Facebook newsfeed.

▶️ The total count of comments should be visible on each live video.

2. Non-Functional Requirement:

Non-functional requirements define the characteristics or qualities that describe how a system should perform its functions. Unlike functional requirements that specify what the system should do, non-functional requirements focus on how well the system should do it. These requirements often address aspects related to performance, reliability, usability, security, and other quality attributes.

Example:

▶️ High availability

▶️ Fault-tolerant

▶️ Low latency

▶️ Scalability

▶️ Eventual Consistency

3. Daily Active Users (DAU):

This is the number of unique users who will use your app or system daily. Knowing this number is important because it tells you how much traffic and load your system needs to handle.

For example, if you expect 1 million DAU, only a fraction of those users will be online at the same moment, but your system still needs enough capacity to handle the peak number of concurrent users without crashing or slowing down.

4. Read-to-Write Ratio:

This ratio tells you how many times users will be reading data versus writing/updating data. If the ratio is something like 100:1, it means for every 1 write operation, there will be 100 read operations.

Knowing this ratio helps you optimise your system for reads or writes accordingly.

If it's read-heavy, you can leverage caching and other techniques to speed up read operations.
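To make the caching idea concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. The `ReadThroughCache` class and the `slow_lookup` function are hypothetical names made up for illustration; in a real system the loader would be a database query and the cache would usually be a shared service rather than an in-process dictionary.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU read-through cache in front of a slower data store."""

    def __init__(self, load_from_store, capacity=1000):
        self._load = load_from_store      # hypothetical loader, e.g. a DB query
        self._capacity = capacity
        self._items = OrderedDict()       # key -> value, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._load(key)           # only cache misses reach the store
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used key
        return value

    def invalidate(self, key):
        """Call this on writes so readers do not see stale data."""
        self._items.pop(key, None)


store_reads = []  # records every call that reaches the "database"


def slow_lookup(key):
    store_reads.append(key)
    return f"value-for-{key}"


cache = ReadThroughCache(slow_lookup, capacity=2)
for _ in range(100):
    cache.get("post:1")

print(cache.hits, cache.misses, len(store_reads))
```

With a read-heavy workload like this one, almost every read is served from memory and the underlying store is queried only on a miss.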

5. Usage Patterns:

This is all about understanding how users will interact with your system over time.

Will there be specific times of the day or days of the week when usage will spike?

Knowing these patterns allows you to plan for scaling up or down your resources to match the demand.

For example, if you expect a ton of traffic every evening, you can provision more servers during those hours to handle the load.
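As a rough illustration of planning for those spikes, the sketch below turns an hourly traffic profile into a server count per hour. Every number here (the hourly QPS values, 500 QPS per server, 30% headroom) is an assumption chosen for the example, not a benchmark.

```python
import math


def servers_needed(qps, qps_per_server, headroom_pct=30):
    """Servers required to serve `qps` while keeping some capacity spare."""
    usable_qps = qps_per_server * (100 - headroom_pct) / 100
    return max(1, math.ceil(qps / usable_qps))


# Hypothetical expected QPS at 03:00, 12:00 and 20:00.
hourly_qps = {3: 200, 12: 900, 20: 2500}

plan = {
    hour: servers_needed(qps, qps_per_server=500)
    for hour, qps in hourly_qps.items()
}
print(plan)
```

The evening peak needs several times the servers of the overnight trough, which is the kind of gap that scheduled or automatic scaling is meant to cover.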

Database

After you understand the requirements and assumptions, the next step is to think about the database.

This is where you'll store all the data for your system. Here are the things you should discuss about the database:

1. Type of Database:  Will you use a SQL database like PostgreSQL or MySQL? Or a NoSQL database like MongoDB or Cassandra?

The choice depends on factors like data structure, scalability needs, and whether you need strict consistency or can live with eventual consistency.

2. Schema: This is the blueprint of your database. It defines the tables/collections, fields, data types, and relationships between different entities.

For example, in a social media app, you might have tables for users, posts, comments, etc.

3. Entity Relationship Diagram (ERD): This is a visual representation of your database schema, showing how different entities (tables/collections) are related to each other.
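The social media example above could be sketched as a relational schema like the one below. This is one possible layout, not a recommended design: the column names are assumptions, and SQLite (from Python's standard library) is used only so the example runs without any setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    author_id  INTEGER NOT NULL REFERENCES users(id),
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    author_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL
);
-- Comments are usually fetched per post, so index that lookup.
CREATE INDEX idx_comments_post ON comments(post_id);
""")

conn.execute("INSERT INTO users (id, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO posts (id, author_id, body) VALUES (10, 1, 'hello')")
conn.execute(
    "INSERT INTO comments (post_id, author_id, body) VALUES (10, 1, 'first!')"
)

comment_count = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE post_id = ?", (10,)
).fetchone()[0]
print(comment_count)
```

The `REFERENCES` clauses are the relationships an ERD would draw as lines: one user has many posts, and one post has many comments.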

Capacity Estimation

Once you have an idea of the database, you need to estimate how much capacity your system will need in terms of traffic and storage.

This helps you provision the right amount of resources:

1. Queries Per Second (QPS): This is the number of read and write requests your system needs to handle per second.

For example, if you expect 1 million DAU and each user makes 10 requests per day, your average QPS would be around 116 (1M * 10 / 86,400 seconds in a day). Peak QPS is usually higher than the average, so leave headroom for traffic spikes.

2. Bandwidth: This is the amount of data transfer your system needs to support for reads and writes.

It depends on the size of the data being transferred and the QPS.

3. Storage: How much disk space will you need to store all the data in your database?

This depends on factors like the number of users, the amount of content (posts, images, videos), and data retention policies.

4. Memory: In addition to disk storage, you need to estimate the amount of memory (RAM) required to run your application and cache frequently accessed data for better performance.
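The four estimates above can be put together in a small back-of-the-envelope calculator. This is a sketch under stated assumptions: the payload sizes, the peak factor of 2, the one-year retention, and the rule of thumb of caching 20% of a day's read volume are all illustrative inputs, not figures from a real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    dau: int
    requests_per_user_per_day: int
    read_write_ratio: int            # reads per write, e.g. 100 means 100:1
    avg_response_bytes: int          # assumed size of one read response
    avg_write_bytes: int             # assumed size of one stored write
    retention_days: int
    peak_factor: float = 2.0         # assumed peak-to-average ratio
    hot_data_fraction: float = 0.2   # assumed share of daily reads worth caching


def estimate(w):
    requests_per_day = w.dau * w.requests_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    # Split total traffic using the ratio: 1 write for every N reads.
    writes_per_day = requests_per_day / (w.read_write_ratio + 1)
    reads_per_day = requests_per_day - writes_per_day
    read_qps = reads_per_day / SECONDS_PER_DAY
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * w.peak_factor,
        "read_qps": read_qps,
        "write_qps": writes_per_day / SECONDS_PER_DAY,
        "read_bandwidth_bytes_per_sec": read_qps * w.avg_response_bytes,
        "storage_bytes": writes_per_day * w.avg_write_bytes * w.retention_days,
        "cache_bytes": reads_per_day * w.avg_response_bytes * w.hot_data_fraction,
    }


workload = Workload(
    dau=1_000_000,
    requests_per_user_per_day=10,
    read_write_ratio=100,
    avg_response_bytes=2_000,
    avg_write_bytes=1_000,
    retention_days=365,
)
result = estimate(workload)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

In an interview, the exact numbers matter less than showing how each estimate follows from the assumptions you stated earlier.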

High Level Design (Block Diagram for services & Components)

So far, we've discussed the initial steps of gathering requirements and assumptions, selecting the appropriate database, and estimating capacity needs like storage, traffic, and memory.

The next step in the framework is to present a high-level design for the overall system architecture without getting too bogged down in the tiny details just yet. This is where the high-level design and block diagram come into play. ✔️

It provides a visual representation of the major components and services in a system, showing their interactions and relationships.

Think of it like this: you're an architect planning to build a house.

Before you start laying bricks, you'd want to have a blueprint that shows all the main rooms, how they're connected, where the doors and windows go, and so on.

The high-level design is like that blueprint for your system.

Specifically, you'll want to draw a simple diagram with blocks representing the major components or services that make up your system.

For example, if you're building a social media app, you might have blocks for the "User Service," "Post Service," "Notification Service," and so on.

Then, you'll use arrows or lines to show how these components interact with each other.

For instance, the "User Service" might talk to the "Post Service" to create a new post, or the "Notification Service" might get notified when a new post is created.

📌 By creating this block diagram, you can identify potential bottlenecks, decide which components need to be highly available or scalable, and make sure you're not missing any crucial pieces of the puzzle.

It's a way to get everyone on the same page before you dive into coding and implementation.
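The blocks and arrows from the social media example can also be sketched in code. The version below is a deliberately simplified, in-process stand-in: the class and method names are hypothetical, and in a real system each service would be a separate deployment talking over the network or a message queue.

```python
class NotificationService:
    """Collects notifications; a real one would push to devices or email."""

    def __init__(self):
        self.outbox = []

    def post_created(self, post):
        self.outbox.append(f"{post['author']} published post {post['id']}")


class PostService:
    """Owns posts and tells the Notification Service when one is created."""

    def __init__(self, notification_service):
        self._notifications = notification_service
        self._posts = {}

    def create_post(self, author, body):
        post_id = len(self._posts) + 1
        post = {"id": post_id, "author": author, "body": body}
        self._posts[post_id] = post
        self._notifications.post_created(post)  # arrow: Post -> Notification
        return post


class UserService:
    """Knows who the users are and forwards their actions."""

    def __init__(self, post_service):
        self._post_service = post_service
        self._users = {"u1": "alice"}  # hypothetical seed data

    def publish(self, user_id, body):
        username = self._users[user_id]  # raises KeyError for unknown users
        # arrow: User -> Post
        return self._post_service.create_post(username, body)


notifications = NotificationService()
posts = PostService(notifications)
users = UserService(posts)

created = users.publish("u1", "Hello, world")
print(notifications.outbox)
```

Each constructor argument corresponds to one arrow in the block diagram, which makes the dependencies between components easy to see.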

Real World Examples and Practice Questions

Let's go through some real-world examples and practice questions to solidify the concepts:

Real-World Examples:

1. Twitter:

📌 Components: User Timeline Service, Tweet Service, Search Service, Notification Service, etc.

📌 Discussion: How to handle the firehose of tweets? How to efficiently fetch and rank the user's timeline? How to implement mentions, hashtags, and trends?

2. Netflix:

📌 Components: Content Catalog, Streaming Service, Recommendation Engine, Subscription Service, etc.

📌 Discussion: How to serve high-quality video streams with low latency? How to scale the recommendation engine to handle hundreds of millions of users and a large content catalog? How to implement content licensing and digital rights management?

3. Uber:

📌 Components: Rider App, Driver App, Trip Booking Service, Map Service, Pricing Service, etc.

📌 Discussion: How to efficiently match riders and drivers based on location? How to handle surge pricing during peak hours? How to ensure data consistency between the rider and driver apps?

Practice Questions:

1. Design a URL Shortening Service:

▶️ Requirements: Ability to create short URLs, retrieve original URLs from short URLs, custom URLs, and analytics on click stats.

▶️ Key aspects: Discuss data partitioning, caching, load balancing, and system components like URL Encoding Service, Redirection Service, and Analytics Service.
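One common way to approach the encoding part of this question is to give each URL a numeric ID and write that ID in base 62. The sketch below assumes a simple in-memory counter and dictionary (`_next_id`, `_urls`) as stand-ins for a real ID generator and database; it is one possible approach, not the only one.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n):
    """Convert a non-negative integer ID into a short base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code):
    """Convert a base-62 string back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


_urls = {}      # hypothetical storage: id -> original URL
_next_id = 1000  # hypothetical starting point for the ID counter


def shorten(url):
    global _next_id
    url_id = _next_id
    _next_id += 1
    _urls[url_id] = url
    return encode_base62(url_id)


def resolve(code):
    return _urls[decode_base62(code)]


code = shorten("https://example.com/some/very/long/path")
print(code, resolve(code))
```

With 62 characters, a 7-character code covers about 3.5 trillion IDs (62 to the power of 7), which is a useful figure to have ready when discussing capacity.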

2. Design a Distributed Key-Value Store:

▶️ Requirements: Store key-value pairs, support read/write operations, high availability, data partitioning, replication.

▶️ Key aspects: Discuss data partitioning schemes, replication strategies, consistency models, caching, load balancing, and failure handling.

3. Design a Newsfeed System:

▶️ Requirements: Display a feed of recent posts/updates from users' friends/follows, support likes, comments, and sharing, and deliver real-time updates.

▶️ Key aspects: Discuss components like User Feed Service, Post Service, Notification Service, caching strategies, fan-out-on-write, and handling real-time updates.
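Fan-out-on-write means that when a user posts, the post is pushed into each follower's precomputed feed, so that reading a feed becomes a cheap lookup. Here is a minimal in-memory sketch of the idea; the `FeedService` class is hypothetical, and a real system would keep these feeds in a cache or database and would need a different strategy for accounts with very many followers.

```python
from collections import defaultdict, deque


class FeedService:
    """Keeps one precomputed, size-capped feed per user."""

    def __init__(self, max_feed_len=100):
        self.followers = defaultdict(set)  # author -> set of followers
        # Newest post first; the oldest entries fall off the right end.
        self.feeds = defaultdict(lambda: deque(maxlen=max_feed_len))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, post_id):
        # The write is expensive: one insert per follower.
        for follower in self.followers[author]:
            self.feeds[follower].appendleft(post_id)

    def get_feed(self, user, limit=10):
        # The read is cheap: the feed is already assembled.
        return list(self.feeds[user])[:limit]


feed_service = FeedService(max_feed_len=3)
feed_service.follow("bob", "alice")
feed_service.follow("carol", "alice")

for post_id in ("p1", "p2", "p3", "p4"):
    feed_service.publish("alice", post_id)

print(feed_service.get_feed("bob"))
```

The trade-off is worth stating in an interview: this design makes reads fast at the cost of extra work and storage on every write.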

4. Design a Collaborative Editing System:

▶️ Requirements: Allow multiple users to edit a document simultaneously, track changes, support concurrent updates, and version control.

▶️ Key aspects: Discuss data structures for storing document changes, operational transformation algorithms, conflict resolution, and real-time communication between clients.

These examples and practice questions cover a range of systems and challenges.

Remember, during an interview, it's important to ask clarifying questions, make reasonable assumptions, and walk through your thought process step by step while discussing the design.

Additionally, be prepared to dive deeper into specific areas based on the interviewer's prompts.