How to Answer System Design Questions

How to Answer System Design Questions was originally published on Exponent.


What’s the purpose of the system design interview?

The system design interview evaluates your ability to design a system or architecture to solve a complex problem in a semi-real-world setting.

It does not aim to test your ability to create a 100% perfect solution but instead assesses your ability to:

  • design the blueprint of the architecture,
  • analyze a complex problem,
  • discuss multiple solutions,
  • and weigh the pros and cons to reach a workable solution.

This interview, together with the manager or behavioral interview, is often used to determine the level at which you will be hired.

This lesson focuses on the framework for the system design interview.

Why use a system design interview framework?

Unlike a coding interview, a system design interview usually involves open-ended design problems.

If it takes a team of engineers years to build a solution to a problem, how can one design a complicated system within 45 minutes?

We need to allocate our time wisely and focus on the essential aspects of the design under tight time pressure. We must define the proper scope by clarifying the use cases.

An experienced candidate would not only show how they envision the design at a higher level but also demonstrate the ability to dive deep to address realistic constraints and tricky operational scenarios.

We also aim to maintain a clear communication style so that our interviewers understand why we want to spend time in certain areas and have a clear picture of our path forward.

Establishing a system design interview framework helps us:

  • better manage our time,
  • reinforce our communication with the interviewer,
  • and lead the discussion toward a productive outcome.

Once you are familiar with the framework, you can apply it every time you encounter a system design interview.

Anatomy of a system design interview

A system design interview typically consists of 5 steps:

  • Step 1: Define the problem space. Here, we understand the problem and define the scope of the design.
  • Step 2: Design the system at a high level. We lay out the most fundamental pieces of the system and illustrate how they work together to achieve the desired functionality.
  • Step 3: Deep dive into the design. Either you or your interviewer will pick an interesting component, and you will discuss its details.
  • Step 4: Identify bottlenecks and scaling opportunities. Think about the current design’s bottlenecks and how we can change the design to mitigate them and support more users.
  • Step 5: Review and wrap up. Check that the design satisfies all the requirements and potentially identify directions for further improvement.

A typical system design interview lasts about 45 minutes. A good interviewer leaves a couple of minutes at the beginning for self-introductions and a couple of minutes at the end for you to ask questions.


Therefore, we usually only have about 40 minutes for technical discussion. Here’s an example of how we can allocate the time for each of the steps:

  • Step 1: 5 minutes
  • Step 2: 10 minutes
  • Step 3: 10 minutes
  • Step 4: 10 minutes
  • Step 5: 5 minutes

The time estimate provided is approximate, so feel free to adjust it based on your interview style and the problem you’re trying to solve. The important thing is to integrate all the steps into a structured interview framework.

Step 1: Define your problem space

Time estimate: 5-10 minutes

It’s common for the problem to be ambiguous at this stage, so it’s your job to ask plenty of questions and discuss the problem space with your interviewer to understand all the system constraints.

One mistake to avoid is jumping into the design without first clarifying the problem.

It’s important to capture both functional and non-functional requirements. What are the functional requirements of the system? What’s in and out of scope?

For instance, if you’re designing a Twitter timeline, you should focus on tweet posting and timeline generation services instead of user registration or how to follow another user.

Also, consider whether you’re creating the system from scratch. Who are our clients or consumers? Do we need to integrate with parts of an existing system?

Non-functional requirements

Once you’ve agreed with your interviewer on the functional requirements, think about the non-functional requirements of the system design. These might be linked to business objectives or user experience.

Non-functional requirements include:

  • availability,
  • consistency,
  • speed,
  • security,
  • reliability,
  • maintainability,
  • and even cost.

Some questions you might ask your interviewer to understand non-functional requirements are:

  • What is the expected scale of the system?
  • How many users should our app support?
  • How many requests should our server handle? A low queries-per-second (QPS) number may mean a single-server design, while higher QPS numbers may require a distributed system with different database options.
  • Are most use cases read-only? If so, that could suggest a caching layer to speed up reading.
  • Do users typically read data shortly after someone else overwrites it? That may indicate a need for strong consistency, and the CAP theorem may be a good topic to discuss.
  • Are most of our users on mobile devices? If so, we must deal with unreliable networks and offline operations.
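If most use cases are reads, the caching layer is often implemented with the cache-aside pattern: check the cache first and query the database only on a miss. The Python sketch below is a minimal illustration under stated assumptions: LRUCache is an in-process stand-in for a shared cache tier such as Redis or Memcached, and load_profile is a hypothetical database query.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable

_MISSING = object()  # sentinel so that a cached None value still counts as a hit


class LRUCache:
    """In-process LRU cache standing in for a shared cache tier (an assumption for this sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key not in self._data:
            return _MISSING
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cache_aside_read(cache: LRUCache, key: Hashable, load_from_db: Callable[[Hashable], Any]) -> Any:
    """Try the cache first; on a miss, read from the database and populate the cache."""
    value = cache.get(key)
    if value is _MISSING:
        value = load_from_db(key)
        cache.put(key, value)
    return value


if __name__ == "__main__":
    db_reads = []

    def load_profile(user_id):  # hypothetical database query
        db_reads.append(user_id)
        return {"id": user_id}

    cache = LRUCache(capacity=1000)
    for _ in range(5):
        cache_aside_read(cache, "alice", load_profile)
    print(db_reads)  # ['alice']: only the first read reaches the database
```

With a high hit rate, only misses reach the database. The trade-off is that cached entries can be stale until they are evicted or invalidated, which ties back to the consistency question above.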

If you’ve identified many design constraints and feel that some are more important than others, focus on the most critical ones.

Make sure to explain your reasoning to your interviewer and check in with them. They may be interested in a particular aspect of your system, so listen to their hints if they nudge you in one direction.

Estimating the amount of data

To estimate the amount of data you’re dealing with, you can do some quick calculations.

For example, you can estimate the QPS, storage size, and bandwidth requirements and share them with your interviewer. This will help you choose components and give you an idea of what scaling might look like later.

You can make some assumptions about user volume and typical user behavior, but make sure to check with your interviewer if these assumptions match their expectations.

Keep in mind that these estimates might not be exact, but they should be in the right range.
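A back-of-the-envelope estimate fits in a few lines of arithmetic. The Python sketch below assumes a hypothetical Twitter-like workload. Every number passed to TrafficAssumptions is an illustrative assumption that you would confirm with your interviewer: daily active users, reads and writes per user, payload sizes, and the 2x peak factor.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass
class TrafficAssumptions:
    daily_active_users: int
    writes_per_user_per_day: float
    reads_per_user_per_day: float
    bytes_per_write: int
    bytes_per_read_response: int
    peak_to_average: float = 2.0  # hypothetical multiplier for peak-hour traffic


def estimate(a: TrafficAssumptions, retention_years: int = 5) -> dict:
    writes_per_day = a.daily_active_users * a.writes_per_user_per_day
    reads_per_day = a.daily_active_users * a.reads_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = reads_per_day / SECONDS_PER_DAY
    bytes_written_per_day = writes_per_day * a.bytes_per_write
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * a.peak_to_average,
        "storage_per_day_gb": bytes_written_per_day / 1e9,
        "storage_total_tb": bytes_written_per_day * 365 * retention_years / 1e12,
        "egress_mb_per_s": read_qps * a.bytes_per_read_response / 1e6,
    }


if __name__ == "__main__":
    twitter_like = TrafficAssumptions(
        daily_active_users=100_000_000,
        writes_per_user_per_day=2,
        reads_per_user_per_day=20,
        bytes_per_write=300,
        bytes_per_read_response=6_000,  # e.g., 20 tweets x 300 bytes per timeline page
    )
    for name, value in estimate(twitter_like).items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the sketch gives roughly 2,300 write QPS, 23,000 read QPS (about 46,000 at peak), 60 GB of new tweet data per day, about 110 TB over five years, and roughly 140 MB/s of egress. A 10:1 read/write ratio like this one is the kind of signal that points toward the caching layer discussed above.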

Step 2: Design your system at a high level

Time estimate: 5-10 minutes

Based on the constraints and features outlined in Step 1, explain how each piece of the system will work together.

Don’t get into the details too soon, or you might run out of time or design something that doesn’t work with the rest of the system.

You can start by designing APIs, which are like a contract that defines how a client can access our system’s resources or functionality using requests and responses. Think about how a client interacts with our system.

Maybe a client wants to create/delete resources, or maybe they want to read/update an existing resource.

Each requirement should translate to one or more APIs. You can choose what type of APIs you want to use (REST, SOAP, GraphQL, or RPC) and explain why. You should also consider the request’s parameters and the response type.

Once established, the APIs become the foundation of our system’s architecture and should not be changed lightly.
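Writing the contract down as types makes the request parameters and response shapes explicit. The Python sketch below models two of the Twitter APIs as dataclasses plus a Protocol. The field names, the cursor-based pagination, the 280-character and 100-item limits, and the REST paths in the comments are assumptions for illustration, not a prescribed API.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol

MAX_TWEET_LENGTH = 280  # assumed limit for this sketch
MAX_PAGE_SIZE = 100     # assumed upper bound on timeline page size


@dataclass(frozen=True)
class PostTweetRequest:
    user_id: str
    text: str

    def __post_init__(self) -> None:
        # Validate at the boundary so every implementation honors the same contract.
        if not self.text or len(self.text) > MAX_TWEET_LENGTH:
            raise ValueError(f"text must be 1..{MAX_TWEET_LENGTH} characters")


@dataclass(frozen=True)
class PostTweetResponse:
    tweet_id: str
    created_at_ms: int


@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    author_id: str
    text: str
    created_at_ms: int


@dataclass(frozen=True)
class GetTimelineRequest:
    user_id: str
    limit: int = 20
    cursor: Optional[str] = None  # opaque token returned with the previous page

    def __post_init__(self) -> None:
        if not 1 <= self.limit <= MAX_PAGE_SIZE:
            raise ValueError(f"limit must be 1..{MAX_PAGE_SIZE}")


@dataclass(frozen=True)
class GetTimelineResponse:
    tweets: List[Tweet]
    next_cursor: Optional[str]  # None when there are no more pages


class TimelineAPI(Protocol):
    # Hypothetical REST mapping: POST /v1/tweets
    def post_tweet(self, req: PostTweetRequest) -> PostTweetResponse: ...

    # Hypothetical REST mapping: GET /v1/timeline?user_id=...&limit=...&cursor=...
    def get_timeline(self, req: GetTimelineRequest) -> GetTimelineResponse: ...
```

An opaque cursor, rather than a page number, lets the server change how it paginates without breaking clients, which matters once the contract is hard to change.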

How will the web server and client communicate?

After designing the APIs, think about how the client and web server will communicate. Some popular choices are:

  • Ajax polling,
  • long polling,
  • WebSockets,
  • and Server-Sent Events.

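Each option differs in communication direction (polling is initiated by the client, Server-Sent Events push data from the server to the client, and WebSockets are bidirectional) and in performance trade-offs, so discuss and justify your choice with your interviewer.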
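To make one of these options concrete, here is a minimal asyncio sketch of long polling, assuming a single in-process server: the server holds each poll open until a message arrives or a timeout expires, and the client re-polls after every response. LongPollChannel and its methods are hypothetical names; a real deployment would sit behind an HTTP framework.

```python
import asyncio
from typing import List, Optional


class LongPollChannel:
    """Server side of long polling: each poll is held open until a message arrives or it times out."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[str]" = asyncio.Queue()

    async def publish(self, message: str) -> None:
        await self._queue.put(message)

    async def poll(self, timeout_s: float) -> Optional[str]:
        """Handle one client request: return the next message, or None if nothing arrived in time."""
        try:
            return await asyncio.wait_for(self._queue.get(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return None  # the client would immediately issue a new poll


async def demo() -> List[Optional[str]]:
    channel = LongPollChannel()  # created inside the running event loop
    results: List[Optional[str]] = []

    # Nothing has been published yet, so this poll waits and then times out.
    results.append(await channel.poll(timeout_s=0.05))

    async def publish_soon() -> None:
        await asyncio.sleep(0.01)
        await channel.publish("new tweet from @alice")

    publisher = asyncio.create_task(publish_soon())
    # This poll is held open and returns as soon as the message is published.
    results.append(await channel.poll(timeout_s=1.0))
    await publisher
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [None, 'new tweet from @alice']
```

Compared with plain Ajax polling, the held request cuts down on empty responses; with WebSockets or Server-Sent Events, the server would instead push messages over a connection that stays open.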

Creating a high-level system design diagram

After designing the API and establishing a communication protocol, the next step is to create a high-level design diagram. The diagram should act as a blueprint of our design and highlight the most critical pieces to fulfill the functional requirements.

To illustrate the data and control flow for a “Design Twitter” question, we can draw a high-level diagram that abstracts the design into an API server, several services we want to support, and the core databases.


At this stage, we should not dive deep into the details of each service yet. Instead, we should review whether our design satisfies all the functional requirements and show the interviewer what the data and control flow look like for each of them.

In the Twitter design example above, we might want to explain to our interviewer how the following flows work:

  1. How a Twitter user registers or logs in to their account
  2. How a Twitter user follows or unfollows another user
  3. How a Twitter user posts a tweet
  4. How a Twitter user gets their news feed
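One way to rehearse these flows is to model each service as an in-memory structure and trace a request through it. The Python sketch below is a single-process stand-in built on assumptions: the names (TwitterSketch, news_feed) are hypothetical, the feed uses a pull model that merges followees’ tweets at read time, and login, persistence, and scaling are deliberately left out.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Tweet:
    tweet_id: int
    author: str
    text: str


class TwitterSketch:
    """Single-process stand-ins for the user, social-graph, tweet, and feed services."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)            # increasing ids double as timestamps
        self.users: Set[str] = set()               # user service
        self.following: Dict[str, Set[str]] = {}   # social-graph service
        self.tweets: Dict[str, List[Tweet]] = {}   # tweet service, keyed by author

    # Flow 1: register (login omitted)
    def register(self, user: str) -> None:
        if user in self.users:
            raise ValueError(f"{user} already exists")
        self.users.add(user)
        self.following[user] = set()
        self.tweets[user] = []

    # Flow 2: follow / unfollow
    def follow(self, follower: str, followee: str) -> None:
        self._require(follower, followee)
        self.following[follower].add(followee)

    def unfollow(self, follower: str, followee: str) -> None:
        self._require(follower)
        self.following[follower].discard(followee)

    # Flow 3: post a tweet
    def post(self, author: str, text: str) -> Tweet:
        self._require(author)
        tweet = Tweet(next(self._ids), author, text)
        self.tweets[author].append(tweet)
        return tweet

    # Flow 4: news feed (pull model: gather followees' tweets at request time)
    def news_feed(self, user: str, limit: int = 10) -> List[Tweet]:
        self._require(user)
        candidates = [t for f in self.following[user] for t in self.tweets[f]]
        return sorted(candidates, key=lambda t: t.tweet_id, reverse=True)[:limit]

    def _require(self, *users: str) -> None:
        for u in users:
            if u not in self.users:
                raise KeyError(u)
```

The pull model keeps posting cheap but makes each feed read do the merging; whether that holds up for users who follow many accounts is a question for the deep-dive and scaling steps.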

If the interviewer explicitly asks us to design one of the functionalities, we should omit the rest in the diagram and only focus on the service of interest.

We should be mindful not to dive into scaling topics such as database sharding, replication, and caching yet.

We should leave those to the scaling section.

Step 3: Deep-dive

Time estimate: 10-15 minutes

Once you have a high-level diagram, it’s time to examine system components and relationships in more detail.

The interviewer may prompt you to focus on a particular area, but don’t rely on them to drive the conversation. Check in regularly with your interviewer to see if they have questions or concerns in a specific area.

How do your non-functional requirements impact your design?

Consider how non-functional requirements impact design choices.

For example, if our system requires transactions, consider using a database that provides ACID guarantees.
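As a small illustration of why atomicity matters, the sketch below uses Python’s built-in sqlite3 module to post a tweet and increment the author’s tweet count in a single transaction. The schema is hypothetical. Using the connection as a context manager commits on success and rolls back on an exception, so a rejected insert never leaves the counter out of sync.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (
            id TEXT PRIMARY KEY,
            tweet_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE tweets (
            id INTEGER PRIMARY KEY,
            author TEXT NOT NULL,
            text TEXT NOT NULL CHECK (length(text) BETWEEN 1 AND 280)
        );
        INSERT INTO users (id) VALUES ('alice');
        """
    )
    conn.commit()
    return conn


def post_tweet(conn: sqlite3.Connection, author: str, text: str) -> bool:
    """Insert the tweet and bump the counter atomically; return False if the write was rejected."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes it
            conn.execute("UPDATE users SET tweet_count = tweet_count + 1 WHERE id = ?", (author,))
            conn.execute("INSERT INTO tweets (author, text) VALUES (?, ?)", (author, text))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the UPDATE above was rolled back as well.
        return False


if __name__ == "__main__":
    conn = open_db()
    post_tweet(conn, "alice", "hello")
    post_tweet(conn, "alice", "")  # rejected: empty text violates the CHECK constraint
    print(conn.execute("SELECT tweet_count FROM users WHERE id = 'alice'").fetchone()[0])  # 1
```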

If an online system requires fresh data, think about how to speed up data ingestion, processing, and querying.

If the data fits into memory (up to hundreds of GBs), consider keeping it in memory. However, RAM is volatile, so if we can’t afford to lose data on a crash or restart, we must find a way to persist it.
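A common way to keep in-memory data durable is to append every write to a log on disk before applying it, then replay the log on restart. The sketch below is a minimal illustration of that idea; LoggedKVStore and its JSON-lines log format are assumptions for this example.

```python
import json
import os
from typing import Dict, Optional


class LoggedKVStore:
    """Serves reads from a dict in RAM; every write is appended to a log file first."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data: Dict[str, str] = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Rebuild the in-memory state from the log after a crash or restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["k"]] = record["v"]

    def put(self, key: str, value: str) -> None:
        self._log.write(json.dumps({"k": key, "v": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # ask the OS to push the record to disk before acknowledging
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def close(self) -> None:
        self._log.close()
```

Calling fsync on every write trades latency for durability. A production store would also snapshot and compact the log and tolerate a truncated final record after a crash.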

If the amount of data we need to store is large, we might want to partition the database to balance storage and query traffic.
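One common partitioning scheme is consistent hashing, which places keys and shards on the same hash ring so that adding a shard moves only a fraction of the keys. The sketch below is an illustration under assumptions: the shard names, the use of SHA-256, and 100 virtual nodes per shard are arbitrary choices, not requirements.

```python
import bisect
import hashlib
from typing import Dict, List


def stable_hash(key: str) -> int:
    # Python's built-in hash() for str is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards: List[str], vnodes_per_shard: int = 100) -> None:
        self._points: List[int] = []      # sorted hash positions on the ring
        self._owner: Dict[int, str] = {}  # hash position -> shard name
        for shard in shards:
            for i in range(vnodes_per_shard):
                # Virtual nodes scatter each shard around the ring to even out load.
                point = stable_hash(f"{shard}#{i}")
                self._points.append(point)
                self._owner[point] = shard
        self._points.sort()

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user:{n}" for n in range(10_000)]
    before = ConsistentHashRing(["db0", "db1", "db2"])
    after = ConsistentHashRing(["db0", "db1", "db2", "db3"])
    moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
    print(f"{len(moved) / len(keys):.0%} of keys moved")
```

With plain hash(key) % N, changing N remaps most keys. With the ring, only the keys that now land on the new shard’s positions move, roughly a quarter of them when going from three to four equally weighted shards.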

Remember to revisit the data access pattern, QPS number, and read/write ratio discussed in Step 1 and consider how they impact our choices for different databases, database schemas, and indexing options.

We might need to add some load balancer layers to distribute the read/write traffic.
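Round robin is one of the simplest load-balancing policies. The sketch below, with hypothetical backend names, rotates through backends and skips any that have been marked unhealthy. Production load balancers such as NGINX and HAProxy offer round robin alongside other policies like least connections.

```python
import itertools
from typing import Iterable, List, Set


class RoundRobinBalancer:
    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        self.unhealthy: Set[str] = set()
        self._rotation = itertools.cycle(self.backends)

    def mark_down(self, backend: str) -> None:
        self.unhealthy.add(backend)  # e.g., after a failed health check

    def mark_up(self, backend: str) -> None:
        self.unhealthy.discard(backend)

    def pick(self) -> str:
        # Advance the rotation at most once per backend, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            backend = next(self._rotation)
            if backend not in self.unhealthy:
                return backend
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([lb.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
    lb.mark_down("app-2")
    print([lb.pick() for _ in range(3)])  # ['app-3', 'app-1', 'app-3']
```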

Keep in mind that we are expected to present different design choices along with their pros and cons, and explain why we prefer one approach over the other.

Remember that the system design question usually has no unique “correct” answer.

Therefore, weighing the trade-offs between different choices to satisfy our system’s functional and non-functional requirements is considered one of the most critical skill sets in a system design interview.

Step 4: Identify bottlenecks and scale

Time estimate: 10-15 minutes

After completing a deep dive into the system components, it’s time to zoom out and consider if the system can operate under various conditions and has room to support further growth.

Some important topics to consider during this step include:

  • Is there a single point of failure? If so, what can we do to improve the robustness and enhance the system’s availability?
  • Is the data valuable enough to require replication? If we replicate our data, how important is it to keep all versions consistent?
  • Do we support a global service? If so, do we need to deploy multi-geo data centers to improve data locality?
  • Are there any edge cases, such as peak time usage or hot users, that create a particular usage pattern that could deteriorate performance or even break the system?
  • How do we scale the system to support 10 times more users? As we scale the system, we may want to gradually upgrade each component or migrate to another architecture.
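For the replication question, one rule widely used in Dynamo-style replicated stores is the quorum condition: with N replicas, writes acknowledged by W replicas, and reads that consult R replicas, every read overlaps the latest write when R + W > N. The Python sketch below checks that rule exhaustively for small N. The version-number scheme is a simplifying assumption, and real systems add complications such as failed replicas and concurrent writers.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Replica = Tuple[int, str]  # (version, value) stored on one replica


def write_to(replicas: List[Replica], targets: Iterable[int], version: int, value: str) -> None:
    for i in targets:
        replicas[i] = (version, value)


def read_from(replicas: List[Replica], targets: Iterable[int]) -> str:
    # Among the replicas contacted, trust the value with the highest version.
    return max(replicas[i] for i in targets)[1]


def always_reads_latest(n: int, w: int, r: int) -> bool:
    """Check every choice of W write replicas and R read replicas out of N."""
    for write_set in combinations(range(n), w):
        replicas: List[Replica] = [(1, "old")] * n
        write_to(replicas, write_set, version=2, value="new")
        for read_set in combinations(range(n), r):
            if read_from(replicas, read_set) != "new":
                return False  # found a read quorum that missed the latest write
    return True


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
        print(f"N={n} W={w} R={r}: R+W>N is {r + w > n}, always fresh: {always_reads_latest(n, w, r)}")
```

Raising W or R buys consistency at the cost of write or read latency and availability, which is exactly the trade-off the replication question asks you to weigh.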

Concepts such as horizontal sharding, CDNs (content delivery networks), caching, rate limiting, and SQL/NoSQL databases are discussed in the follow-up lessons.

Step 5: Review requirements, justify decisions, suggest alternatives, and answer questions

Time estimate: 5-10 minutes

This step will likely take you to the end of the interview. Throughout the discussion, it is good practice to refer back to requirements periodically.

If you have not done so already, now is the time to summarize. Walk through your major decisions, providing justification for each and discussing any trade-offs in terms of space, time, and complexity.