Full Transcript
https://www.youtube.com/watch?v=PTJOECahJTc
[00:14] Hi everyone, this is Arvin from Mind Magics and today I welcome you all to this amazing video on system design interview questions.
[00:19] So guys, if you're appearing for software development jobs, mainly the SD roles,
[00:27] system design is one such topic around which you'll be asked a lot of questions in the actual interviews.
[00:33] Okay. So it is very important to have a conceptual understanding of this topic if you want to do well and clear the interview.
[00:44] So this is one such thing that you cannot ignore. Okay.
[00:49] So that is why we decided to come up with this topic, and that is the agenda for today's video.
[00:55] And like in the previous videos,
[00:57] in this video we will be covering roughly around 30 questions, and these 30 questions are divided into three categories.
[01:03] The first category is the beginner-friendly one, meant for freshers.
[01:07] Then we have the second category, which is meant for intermediate people, those who have at least two to three years of experience.
[01:16] And finally we have the advanced category, in which we cover the questions meant for experienced candidates, those who have at least four to five years of experience in the software development domain.
[01:27] Okay. So that is the agenda for today's video, and before moving ahead, I want you guys to subscribe to the Mind Magics YouTube channel and also hit that bell icon so that you never miss an update from us.
[01:39] So without any further delay, let's get started with our first question.
[01:43] Okay. So guys, the first question over here is an obvious question. What is system design?
[01:51] So system design is the process of defining how different components of a system like databases, servers and APIs interact to achieve a specific goal efficiently and reliably.
[02:02] So it's not just about writing code, but it's about making high-level architectural decisions that ensure scalability, performance, and fault tolerance.
[02:11] Just for instance, think of it like designing a city.
[02:17] So what are the important components of designing a city?
[02:19] So in cities you have roads, traffic signals, and power lines, and all of these must work together even as the population grows.
[02:25] So in software terms, system design focuses on how data flows through the system, how requests are processed, and how failures are handled.
[02:32] So this was a quick definition of system design.
[02:35] And to elaborate this answer further, you can give an example.
[02:38] For example, designing an e-commerce platform involves planning user management, product catalogs, payment gateways, and delivery tracking as well.
[02:41] So delivery tracking is also another important component of an e-commerce platform.
[02:43] So each of these parts must integrate seamlessly while handling potentially millions of requests.
[02:44] So in short, system design is about creating blueprints that balance scalability, reliability, and simplicity.
[02:46] So apart from the definition, it is equally important to give a real-time example.
[03:17] So let's move to the next question.
[03:19] So what are the key components of a system design?
[03:23] So every well-designed system includes several essential components, and those components are: clients, servers, databases, load balancers, caches, message queues, and CDNs.
[03:35] CDNs, or content delivery networks.
[03:38] So clients represent users; they can be either web browsers or mobile apps.
[03:43] And what do the clients mainly do? They send the requests.
[03:48] Next you have the servers, and what are those servers doing? They process these requests, while databases store and retrieve the data.
[03:54] Then you have load balancers that distribute traffic evenly among multiple servers to prevent overloads, and caching layers like Redis or Memcached reduce the database strain by temporarily storing frequently accessed data.
[04:09] The CDNs, on the other hand, deliver static content such as images or videos faster by caching them geographically closer to the users.
[04:22] Next, you also have message queues like Kafka or RabbitMQ, which decouple services and enable asynchronous communication.
[04:26] For example, in an online shopping site, when a user places an order, a message queue ensures that payment, inventory, and notification services can handle this request independently.
[04:29] So, understanding how all of these pieces fit together helps you design systems that are fast, reliable, and easily scalable.
[04:47] The next question is what is the difference between vertical and horizontal scaling?
[04:52] So vertical scaling basically means increasing the resources.
[04:55] Now what are these resources?
[04:57] It can be CPU, RAM or storage.
[04:59] And you have to increase the resources of a single machine to handle more load.
[05:01] So for example, it's like upgrading a car's engine to make it faster.
[05:06] But here you have a limit of how much power you can add.
[05:09] Now if you talk about horizontal scaling, it means adding more machines or servers to distribute the workload, and this is similar to adding more cars to a fleet.
[05:25] So vertical scaling is simpler to implement since it doesn't require code changes, but it becomes expensive and limited at scale.
[05:34] On the other hand, horizontal scaling, although a bit more complex, offers better fault tolerance and elasticity.
[05:40] For example, in case of Netflix or AWS, they use the horizontal scaling so that if one server goes down, the others can handle the load seamlessly.
[05:53] So, modern architectures, especially those that are based on cloud systems, almost always prefer horizontal scaling because it allows applications to serve millions of users reliably.
[06:05] Okay, so let's move on to the next question.
[06:07] So the next question says what is a load balancer and why is it used?
[06:13] So a load balancer is a critical component that distributes incoming network traffic across multiple servers, and this is done in order to ensure that no single server becomes a bottleneck.
[06:26] So basically it improves both availability and performance by balancing the requests dynamically.
[06:31] Just to give you an example, imagine there's a restaurant where several waiters serve the customers.
[06:37] So if all the customers go to one waiter, what will happen? Eventually, service will slow down.
[06:45] So in a similar context, a load balancer ensures that requests are distributed evenly and dynamically.
[06:51] So load balancers can work at different levels.
[06:53] So in terms of the OSI model, load balancers can work at layer 4, which is the transport layer, and layer 7, which is the application layer.
[07:03] So here, tools like Nginx, HAProxy, or cloud-based solutions like AWS Elastic Load Balancer are commonly used.
[07:11] So apart from evenly distributing the traffic among different servers, they also perform health checks, routing traffic only to healthy servers and rerouting when one of the servers fails.
[07:24] So in large scale systems like Google search or YouTube, load balancers are the reason users rarely experience downtime even during peak usage.
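The round-robin distribution described above can be sketched in a few lines. This is not from the video; it is a minimal Python illustration with hypothetical server names, skipping servers that a health check has marked down:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin load balancer: hands out healthy servers in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # A health check would call this when a server stops responding.
        self.healthy.discard(server)

    def next_server(self):
        # Skip unhealthy servers, looping at most once over the ring.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app1", "app2", "app3"])
lb.mark_down("app2")
print([lb.next_server() for _ in range(4)])  # "app2" never appears
```

Real load balancers add weighting, least-connections policies, and active health probes on top of this basic rotation.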
[07:32] The next question is what is caching and why is it important?
[07:36] So caching is a technique of storing frequently accessed data in a temporary storage layer.
[07:41] So this is usually in memory. And why is this done?
[07:43] So this is done so that future requests can be served faster.
[07:48] So here, what we are trying to achieve is to save time.
[07:53] So the goal is to reduce the latency and minimize the repeated access to the database.
[08:00] So think of caching like saving your favorite playlist offline.
[08:05] So you don't have to stream it every time.
[08:07] You just have to play the saved version.
[08:10] So there are common caching systems such as Redis, Memcached, and Varnish.
[08:17] So caching can happen at multiple levels.
[08:19] So you have the browser cache, which is at the client side.
[08:24] You have the CDN cache, which is at the network edge, and the application cache, which is at the server side.
[08:26] So for example, an e-commerce website can cache product details or images that don't change often.
[08:35] So when thousands of users view the same product, the system serves cached data instead of querying the database each time.
[08:41] And like I said, this saves the time.
[08:46] However, caching requires careful invalidation strategies, since stale data can lead to inconsistencies.
[08:53] So the golden rule over here is cache what is expensive to compute or fetch but cheap to store.
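The cache-aside pattern this answer describes (check the cache first, fall back to the database on a miss, then populate the cache with a TTL so stale entries expire) can be sketched as follows. This is a minimal in-memory Python illustration; `fetch_from_db` is a hypothetical stand-in for a real database query:

```python
import time

cache = {}          # key -> (value, expiry_time)
TTL_SECONDS = 60    # invalidation by expiry: entries go stale after a minute

db_calls = 0

def fetch_from_db(product_id):
    # Stand-in for an expensive database query.
    global db_calls
    db_calls += 1
    return {"id": product_id, "name": f"product-{product_id}"}

def get_product(product_id):
    """Cache-aside read: serve from cache if fresh, otherwise hit the DB."""
    entry = cache.get(product_id)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = fetch_from_db(product_id)        # cache miss
    cache[product_id] = (value, time.time() + TTL_SECONDS)
    return value

get_product(42)   # miss: queries the "database"
get_product(42)   # hit: served from cache
print(db_calls)   # the database was queried only once
```

With Redis or Memcached the dictionary is replaced by a networked store, but the read path is the same.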
[09:00] The next question is what is a content delivery network, or CDN?
[09:02] So a CDN is a network of distributed servers that delivers static or dynamic web content such as images, videos, CSS or JavaScript to users from the servers closest to them geographically.
[09:18] So the main goal over here is to reduce the latency and improve the loading speed.
[09:22] So without a CDN, every request travels to the origin server, which can cause delays for users that are far away.
[09:29] So popular CDNs include Cloudflare, Akamai, and Amazon CloudFront.
[09:37] For example, when you watch a YouTube video in India, it is streamed from a nearby CDN server instead of one in the US.
[09:47] So this local caching drastically improves the performance and reduces the bandwidth costs.
[09:53] Apart from that, CDNs also improve reliability through redundancy.
[09:58] So if one edge server fails, another nearby server serves the content.
[10:03] In global-scale systems like Netflix, CDNs are essential to ensure smooth playback even during high traffic peaks.
[10:11] The next question is what is the difference between a monolithic and a microservices architecture?
[10:18] So a monolithic architecture combines all of the functionalities, such as UI, business logic, and database, into a single deployable unit.
[10:28] So it is simple to build initially, but it becomes hard to scale and maintain as the system grows.
[10:32] Now if you talk about the microservices architecture here we break down the system into independent smaller services that communicate via APIs.
[10:40] So each service can be developed, deployed, and scaled independently.
[10:45] For example, in Amazon, the payment, orders, and recommendation systems are separate microservices.
[10:53] Microservices allow teams to work in parallel and adopt different technologies for each service.
[11:01] However, they introduce challenges like service discovery, inter-service communication, and data consistency.
[11:08] So many modern systems start monolithic and transition to microservices as user demand and complexity grow.
[11:13] The next question is what is database sharding?
[11:19] So sharding is a technique of splitting a large database into smaller, faster, and more manageable parts, which are known as shards.
[11:29] So each shard stores a subset of the total data often based on a key like the user ID or the region.
[11:35] For example, in a social media platform, users in Asia can be stored in one shard and users in Europe in another shard.
[11:44] So this is done in order to reduce the load on a single database and improve the performance.
[11:51] Now what is the importance of sharding?
[11:53] So sharding is essential when the data volume or the query load exceeds what a single server can handle.
[12:00] However, it introduces the challenges like cross shard queries, database rebalancing and operational complexity as well.
[12:08] So systems like Facebook or Meta and Twitter rely heavily on sharding to serve billions of users efficiently.
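The key-based routing described above (hashing a key such as the user ID to pick a shard) can be sketched like this. A minimal Python illustration; the shard count and the in-memory "databases" are purely hypothetical:

```python
import hashlib

NUM_SHARDS = 4
shards = {i: {} for i in range(NUM_SHARDS)}   # shard id -> in-memory "database"

def shard_for(user_id: str) -> int:
    """Pick a shard deterministically by hashing the shard key (user ID)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def save_user(user_id: str, profile: dict):
    shards[shard_for(user_id)][user_id] = profile

def load_user(user_id: str) -> dict:
    # Same hash, same shard: no cross-shard lookup needed for key-based reads.
    return shards[shard_for(user_id)][user_id]

save_user("alice", {"region": "Asia"})
save_user("bob", {"region": "Europe"})
assert load_user("alice")["region"] == "Asia"
```

Note that the simple modulo scheme shown here reshuffles most keys when `NUM_SHARDS` changes, which is exactly the rebalancing pain mentioned above; production systems often use consistent hashing instead.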
[12:16] The next question is what is data replication?
[12:18] So data replication means keeping copies of the same data across multiple servers to improve availability, fault tolerance, and performance.
[12:29] So it ensures that even if one server fails, data remains accessible from replicas.
[12:34] So replica can be synchronous or asynchronous.
[12:38] So in synchronous replication, updates happen immediately across all copies, and in case of asynchronous replication, updates propagate later.
[12:46] For example, Amazon's DynamoDB uses multi-region replication to maintain low latency and high resilience.
[12:54] Replication also helps distribute read traffic.
[12:59] Read-heavy systems like news feeds or dashboards benefit greatly from this.
[13:01] So the trade-off lies in ensuring data consistency.
[13:07] Managing replication lag is a key challenge over here.
[13:10] So in summary, replication increases reliability but requires careful design to avoid conflicts and stale reads.
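The read/write split mentioned above (writes go to one primary, reads are spread across replicas) can be sketched as follows. A minimal Python illustration with hypothetical in-memory nodes; replication here is synchronous for simplicity:

```python
import itertools

class ReplicatedStore:
    """Primary handles writes; reads round-robin across replicas."""

    def __init__(self, num_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self._read_ring = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        self.primary[key] = value
        self.replicate()          # synchronous: copies updated immediately

    def replicate(self):
        # An asynchronous system would run this later, risking stale reads
        # during the replication lag.
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Distribute read traffic evenly over the replicas.
        return self.replicas[next(self._read_ring)].get(key)

store = ReplicatedStore()
store.write("headline", "breaking news")
assert store.read("headline") == "breaking news"   # served by a replica
```

Delaying `replicate()` is the one-line change that turns this into the asynchronous model and exposes the stale-read trade-off discussed above.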
[13:18] The next question is a scenario based question.
[13:22] So let's suppose the interviewer here asks you to design a URL shortener like Bitly.
[13:30] So guys, you must have used, or I don't know how many of you might have used, this platform called Bitly.
[13:35] So if you have a URL which is relatively long and you want to shorten it, over there you just have to copy-paste the URL, and just by the click of one button your URL will be shortened.
[13:46] So that is basically Bitly.
[13:48] So in the actual interviews, you'll be asked a lot of questions like how you can design, or to give a rough idea of the steps you would follow in order to design, such and such system.
[13:59] So over here we are talking about the URL shortener like bitly.
[14:06] So to design a URL shortener app or the website you can start with a front end for user input and a backend service that generates a short unique key for each long URL.
[14:19] So this key can be created using base-62 encoding,
[14:22] using characters like small alphabets a to z, capital alphabets A to Z, and the numbers 0 to 9.
[14:31] So the backend stores the mapping between the short and the long URLs in a database.
[14:37] So for quick lookups, you can use Redis for caching and a NoSQL database like DynamoDB or Cassandra for persistent storage.
[14:47] Apart from that, you can add Nginx or HAProxy for load balancing across servers.
[14:51] Apart from that, you'll also need analytics services to track the link usage and click counts.
[14:58] So to handle billions of URLs, what you can do is partition the data and use a hashing algorithm to distribute the records evenly.
[15:02] So a well-designed URL shortener should guarantee short unique links, fast redirection, and global scalability, just like Bitly or TinyURL does.
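The base-62 key generation described above can be sketched in a few lines. A minimal Python illustration that encodes a numeric database sequence ID; the alphabet ordering and the ID-based scheme are assumptions, not something the video specifies:

```python
import string

# 0-9, a-z, A-Z: 62 characters in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Turn a numeric database ID into a short URL key."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode_base62(key: str) -> int:
    """Turn a short key back into the numeric ID for the database lookup."""
    n = 0
    for ch in key:
        n = n * 62 + ALPHABET.index(ch)
    return n

key = encode_base62(125)          # e.g. the 125th stored URL
assert decode_base62(key) == 125
```

Seven base-62 characters already cover about 3.5 trillion distinct keys, which is why short links stay short even at billions of URLs.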
[15:17] So guys, with this we have covered the fresher or beginner level of questions for system design, and now the next 10 questions will be the intermediate level of questions for system design.
[15:29] The next question is what is a message queue and why is it used?
[15:32] So a message queue is a component that enables asynchronous communication between different services in a distributed system.
[15:36] So instead of one service waiting for another to finish processing, it sends a message to a queue and continues its work.
[15:47] So another service then reads and processes that message whenever it is ready.
[15:52] So this approach improves the performance, scalability and fault tolerance.
[15:58] For example, in an e-commerce platform, after one user places an order, the order service can send a message to the inventory service via a queue without waiting for the confirmation.
[16:11] Over here, tools like Kafka, RabbitMQ, or AWS SQS are popular for implementing queues.
[16:17] Message queues help decouple the services.
[16:19] So for example, if one service fails, messages remain in the queue until it recovers.
[16:26] So they also smooth out traffic spikes, since messages can be processed at a steady pace.
[16:34] So by designing asynchronous workflows using queues, what you achieve is systems that remain responsive and resilient even under heavy load.
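The decoupling described above can be sketched with Python's standard-library queue: an in-process stand-in for a real broker like Kafka or RabbitMQ (the service names are hypothetical). The producer enqueues orders and moves on; the consumer drains them at its own pace:

```python
import queue
import threading

order_queue = queue.Queue()
processed = []

def order_service():
    # Producer: enqueue orders and continue without waiting for processing.
    order_queue.put({"order_id": 1, "item": "book"})
    order_queue.put({"order_id": 2, "item": "pen"})

def inventory_service():
    # Consumer: reads and processes messages whenever it is ready.
    while True:
        msg = order_queue.get()
        if msg is None:          # sentinel: no more work
            break
        processed.append(msg["order_id"])

consumer = threading.Thread(target=inventory_service)
consumer.start()
order_service()
order_queue.put(None)            # signal shutdown
consumer.join()
print(processed)                 # orders processed in FIFO order
```

With a real broker, messages also survive consumer crashes because they persist in the queue until acknowledged, which is the failure-recovery property mentioned above.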
[16:43] The next question is: explain the CAP theorem in system design.
[16:48] The CAP theorem describes the trade-offs that distributed systems must make among three properties: consistency, availability, and partition tolerance.
[16:59] So if you talk about consistency, it basically means every user sees the same data at the same time.
[17:08] Next comes the availability.
[17:10] So availability means that the system continues to respond even if some nodes fail.
[17:16] And now if you talk about partition tolerance, so it simply means the system continues working even if the communication between the parts of the system is lost.
[17:25] So the theorem states that in the presence of a network partition, you can only guarantee two out of three properties at once.
[17:31] For example, MongoDB and Cassandra prioritize availability, while ZooKeeper focuses on consistency.
[17:40] So let's consider a real-time example over here.
[17:42] So in a banking application, consistency is crucial.
[17:48] For example, account balances must always be accurate.
[17:50] But in a social media app, slight delays are acceptable if it means that the system stays available during heavy traffic as well.
[18:00] So system designers must choose the trade-offs based on their business requirements.
[18:04] The next question is what is database indexing and how does it improve the performance?
[18:09] So database indexing is like creating a shortcut that helps the database quickly locate the data without scanning every record in the table.
[18:21] So it uses special data structures like B-trees or hash maps to speed up the searches.
[18:26] For example, if you frequently query users by email address, creating an index on the email column makes lookups significantly faster.
[18:39] However, indexes come at a cost.
[18:41] They use extra disk space and slow down write operations like inserts or updates, since the index must also be updated.
[18:47] So in real-world systems such as Instagram, indexes are usually based on fields like the user ID, post timestamp, or hashtags to make feed generation faster.
[18:57] So the key over here is balance.
[19:00] So index only where the performance gains outweigh the maintenance cost.
[19:04] So effective indexing can make the difference between a query taking milliseconds or minutes, which is critical in systems handling millions of records.
[19:14] The next question is how do you handle system failures in distributed environments?
[19:20] So in large distributed systems, failures are inevitable.
[19:23] So the goal isn't to prevent them entirely, but to design the system for resilience.
[19:27] So this involves using redundancy, replication, failover mechanisms, and continuous monitoring.
[19:38] For example, you can deploy multiple database replicas across regions.
[19:42] If one region goes down, another immediately takes over.
[19:43] So load balancers can detect failed servers through health checks and redirect the traffic automatically.
[19:51] So over here, tools like Prometheus, Grafana, or Datadog help monitor metrics and trigger alerts for unusual behavior.
[19:57] Additionally, retry mechanisms and circuit breakers prevent the cascading failures.
[20:02] If a downstream service is unavailable, requests are paused and rerouted.
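The retry mechanism mentioned above is usually implemented with exponential backoff. A minimal Python sketch; the flaky downstream service is simulated, and a production version would also add jitter and hand repeated failures to a circuit breaker:

```python
import time

def retry_with_backoff(call, max_attempts=4, base_delay=0.01):
    """Retry a failing call, doubling the wait between attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                      # give up; a circuit breaker would trip here
            time.sleep(base_delay * (2 ** attempt))

attempts = 0

def flaky_service():
    # Simulated downstream service that fails twice, then recovers.
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("service unavailable")
    return "ok"

assert retry_with_backoff(flaky_service) == "ok"
assert attempts == 3
```

The doubling delay is what prevents the cascading failures mentioned above: instead of hammering a struggling service, callers back off and give it room to recover.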
[20:08] The next question is another scenario-based question.
[20:10] So the interviewer might ask you to design a chat application like WhatsApp.
[20:17] So guys, here basically what the interviewer wants to know is whether you really have hands-on experience in working with or designing systems like WhatsApp or any other chat application.
[20:30] So here, to the best of your knowledge, you can describe or give an overview of how the WhatsApp system looks, what the important components are, and how they communicate, and so on and so forth. So let's see how to answer this question.
[20:47] So to design a chat system, what you can do is start with the main components, which are the user service, message service, chat storage, and notification service.
[20:58] So these are the important services in the chat application. So the system must support real-time communication, message persistence, and offline delivery.
[21:08] So what you can do is use WebSockets for live bidirectional communication between users and servers. So this ensures that the messages appear instantly.
[21:16] So when a message is sent, it should be stored in a NoSQL database like DynamoDB or Cassandra for scalability and high write throughput.
[21:26] For high reliability, you can use Kafka or RabbitMQ to queue the messages temporarily in case of server failure.
[21:35] Each message should have a status such as sent, delivered, or read, and these updates should be pushed in real time using WebSocket events.
[21:44] So for large user bases, partition the messages by user ID or chat room to distribute the load.
[21:53] So notifications can be sent via Firebase Cloud Messaging when users are offline.
[21:59] So this architecture ensures that billions of messages per day can be exchanged reliably, just like WhatsApp or any other chat application like Telegram.
[22:07] So guys, let's move to the next question. So the next question is another scenario-based question.
[22:13] So the interviewer might ask you to design a news feed system like Facebook or Meta.
[22:19] Okay. So let's try to answer this question as well.
[22:21] So designing a news feed system requires handling large volumes of content and delivering personalized results quickly.
[22:28] So the main components here are the user service, post service, feed generation service, and ranking service.
[22:38] So whenever a user posts something, that post is stored in a database and pushed to their followers' feeds using a fan-out-on-write approach.
[22:47] Alternatively, feeds can be generated on demand using fan-out-on-read, depending on the scale.
[22:56] So for fast retrieval, use Redis to cache the most recent posts, since users often care more about freshness than completeness.
[23:06] The ranking service uses algorithms based on engagement, relevance, and recency to prioritize the posts.
[23:14] So systems like Meta or Facebook and LinkedIn use graph databases to manage the connections efficiently.
[23:19] So handling millions of feed updates daily requires careful optimization. So here you combine precomputed results, caching layers, and background jobs as well.
[23:32] So the main challenge over here lies in balancing freshness, personalization, and scalability.
[23:39] and scalability. The next question is what is rate limiting and why is it
[23:41] what is rate limiting and why is it important? So rate limiting is a
[23:44] important? So rate limiting is a technique used to control how many
[23:46] technique used to control how many requests a user or a system can make
[23:48] requests a user or a system can make within a specific time frame. So rate
[23:53] within a specific time frame. So rate limit prevents abuse, denial of service
[23:55] limit prevents abuse, denial of service attacks or overloading of backend
[23:59] attacks or overloading of backend system. For example, Twitter limits on
[24:01] system. For example, Twitter limits on how many times you can hit its APIs per
[24:04] how many times you can hit its APIs per minute to ensure the stability.
[24:07] minute to ensure the stability. The most common algorithm for rate
[24:09] The most common algorithm for rate limiting are token bucket, leaky bucket
[24:11] limiting are token bucket, leaky bucket and fixed window counters.
[24:14] and fixed window counters. So rate limits can be enforced at the
[24:16] So rate limits can be enforced at the API gateway, load balancer or even at
[24:19] API gateway, load balancer or even at the application layer. So rate limits
[24:23] the application layer. So rate limits have realtime examples as well. So for
[24:25] have realtime examples as well. So for example, if a user
[24:28] example, if a user sends too many login attempts, rate
[24:30] sends too many login attempts, rate limiting prevents the brute force
[24:32] limiting prevents the brute force attacks. So it's also important for
[24:34] attacks. So it's also important for multi-tenant systems ensuring that no
[24:36] multi-tenant systems ensuring that no single client monopolizes the resources.
[24:39] single client monopolizes the resources. So by maintaining the fair resource
[24:41] So by maintaining the fair resource allocation and consistent performance
[24:43] allocation and consistent performance rate limiting keeps the systems stable
[24:45] rate limiting keeps the systems stable under pressure.
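The token bucket mentioned above can be sketched in a few lines. This is a minimal single-process illustration (the class and parameter names are my own, not from the video); a production limiter would keep the counters in a shared store such as Redis:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill at `rate` per second,
    up to `capacity`; each request spends one token."""
    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1.0)  # burst of 5, then 1 request/second
results = [bucket.allow() for _ in range(7)]
print(results)  # first 5 allowed, the rest throttled
```

The capacity gives you controlled bursts, while the refill rate enforces the long-term average.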
[24:47] The next question is: how would you design a file storage system like Google Drive?
[24:52] Designing a file storage system involves multiple layers: an upload service, a storage service, metadata management and a synchronization service.
[25:04] When a user uploads a file, it is split into chunks, say 4 MB each, and stored across distributed storage nodes, often using object storage systems like Amazon S3.
[25:15] Each file and chunk's metadata, like the name, location and version, is stored in a metadata database, which can be either PostgreSQL or MongoDB. This allows quick lookup and version management.
[25:30] For large files, parallel upload and chunk-level deduplication improve efficiency, and users can access the files via a CDN to reduce latency.
[25:41] To ensure synchronization across devices, clients maintain a local cache and use a change-detection mechanism to sync only modified chunks.
[25:51] Access control and permissions are handled through a user authentication service using a protocol like OAuth.
[25:57] The system must support versioning, sharing and collaboration seamlessly, just like Google Drive or Dropbox, which manage billions of files daily.
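The chunk-and-deduplicate step described above can be sketched in Python (the chunk size matches the 4 MB example; the function name and hashing scheme are illustrative, not from the video):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as in the example above

def chunk_and_hash(data: bytes):
    """Split a file into fixed-size chunks and key each by its content hash,
    so identical chunks are stored (and uploaded) only once."""
    chunks = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunks[hashlib.sha256(chunk).hexdigest()] = chunk
    return chunks

# Two files sharing an identical first chunk produce overlapping chunk sets,
# so the shared chunk is deduplicated in storage.
file_a = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
file_b = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE
store = {**chunk_and_hash(file_a), **chunk_and_hash(file_b)}
print(len(store))  # 3 unique chunks instead of 4
```

Content-addressed chunks also make sync cheap: a client only uploads chunks whose hashes the server has not seen.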
[26:07] billions of files daily. The next question is what is eventual consistency
[26:09] question is what is eventual consistency and when is it acceptable? So eventual
[26:12] and when is it acceptable? So eventual consistency is a consistency model used
[26:15] consistency is a consistency model used in a distributed databases where updates
[26:18] in a distributed databases where updates don't happen instantly across all nodes
[26:20] don't happen instantly across all nodes but eventually all the replicas converge
[26:23] but eventually all the replicas converge to the same state. So it is suitable
[26:25] to the same state. So it is suitable when availability and performance are
[26:27] when availability and performance are more critical than immediate accuracy.
[26:30] more critical than immediate accuracy. For example, in social media
[26:31] For example, in social media applications, it is very much acceptable
[26:33] applications, it is very much acceptable if a new post appears to one friend a
[26:36] if a new post appears to one friend a few seconds later and this doesn't break
[26:38] few seconds later and this doesn't break the functionality. So here the systems
[26:41] the functionality. So here the systems like Amazon Dynamob, Cassandra and
[26:43] like Amazon Dynamob, Cassandra and Couchbase use eventual consistency to
[26:46] Couchbase use eventual consistency to handle the global scale data
[26:48] handle the global scale data replication.
[26:50] replication. However, it is not suitable for systems
[26:52] However, it is not suitable for systems like banking or stock trading where
[26:55] like banking or stock trading where realtime accuracy is non-negotiable.
[26:58] realtime accuracy is non-negotiable. So designers often combine eventual
[27:01] So designers often combine eventual consistency with read repair and
[27:03] consistency with read repair and background synchronization to ensure the
[27:05] background synchronization to ensure the data integrity. So this trade-off helps
[27:08] data integrity. So this trade-off helps achieve massive scalability without
[27:10] achieve massive scalability without sacrificing the user experience
[27:12] sacrificing the user experience experience. I repeat, this trade-off
[27:15] experience. I repeat, this trade-off helps achieve massive scalability
[27:17] helps achieve massive scalability without sacrificing user experience
[27:19] without sacrificing user experience where minor delays are tolerable. So the
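A toy sketch of the "replicas converge" idea, assuming a simple last-write-wins rule (one of several conflict-resolution strategies; the class and timestamps are illustrative, not from the video):

```python
# Toy illustration: two replicas receive the same updates at different times
# and in different orders, but converge to the same state.
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def apply(self, key, ts, value):
        # Last-write-wins: keep the update with the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

a, b = Replica(), Replica()
a.apply("profile_pic", ts=1, value="old.png")
a.apply("profile_pic", ts=2, value="new.png")
# Replica b receives the same updates later, out of order.
b.apply("profile_pic", ts=2, value="new.png")
b.apply("profile_pic", ts=1, value="old.png")
print(a.data == b.data)  # True - both replicas converged
```

In the interim, a read against replica b could still return stale data; that window of staleness is exactly what eventual consistency permits.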
[27:23] where minor delays are tolerable. So the next question is another scenario based
[27:24] next question is another scenario based question. So the question here is design
[27:28] question. So the question here is design an online ticket booking system. So
[27:30] an online ticket booking system. So guys, an online ticket booking system,
[27:32] guys, an online ticket booking system, you know, it can be anything such as a
[27:33] you know, it can be anything such as a red bus which is uh meant for the uh bus
[27:37] red bus which is uh meant for the uh bus travels. It can even be IRCTC
[27:42] travels. It can even be IRCTC for train and you know flights uh and
[27:45] for train and you know flights uh and any other uh popular ticket booking
[27:47] any other uh popular ticket booking systems. Okay. So to answer this
[27:50] systems. Okay. So to answer this question, what you can do is an online
[27:52] question, what you can do is an online ticket booking system must ensure
[27:54] ticket booking system must ensure realtime seat availability, transaction
[27:56] realtime seat availability, transaction safety and no double booking. So over
[27:59] safety and no double booking. So over here what you can do is you can start
[28:00] here what you can do is you can start with a front end for search and booking
[28:03] with a front end for search and booking a back end with booking and payment
[28:05] a back end with booking and payment services and a database for seat
[28:08] services and a database for seat inventory. Over here you can use asset
[28:11] inventory. Over here you can use asset compliant database for handling the
[28:12] compliant database for handling the transactions to ensure the integrity. So
[28:15] transactions to ensure the integrity. So when a user selects a seat you can mark
[28:17] when a user selects a seat you can mark it as temporarily reserved using locks
[28:21] it as temporarily reserved using locks or optimistic concurrency control until
[28:23] or optimistic concurrency control until the payment is completed. So if the
[28:26] the payment is completed. So if the payment fails or times out, release the
[28:28] payment fails or times out, release the seat automatically. Over here, use the
[28:31] seat automatically. Over here, use the message cues for asynchronous operations
[28:33] message cues for asynchronous operations like sending confirmation emails. For
[28:36] like sending confirmation emails. For scalability, replicate the data across
[28:38] scalability, replicate the data across regions and using caching such as radius
[28:41] regions and using caching such as radius to display the available seats quickly.
[28:44] to display the available seats quickly. So since traffic spikes during popular
[28:46] So since traffic spikes during popular events, employ rate limiting and load
[28:48] events, employ rate limiting and load balancing to prevent the overload. So in
[28:51] balancing to prevent the overload. So in essence, this system needs both
[28:52] essence, this system needs both precision and speed. So one mistake
[28:55] precision and speed. So one mistake could lead to over booked flight or sold
[28:57] could lead to over booked flight or sold out concerts being oversold. So guys
[29:00] out concerts being oversold. So guys with this we have covered the
[29:01] with this we have covered the intermediate level of questions for
[29:02] intermediate level of questions for system design and now the next 10
[29:05] system design and now the next 10 questions like 21 to 30 we will be
[29:07] questions like 21 to 30 we will be covering the questions that are meant
[29:09] covering the questions that are meant for the advanced people. So the next
[29:11] for the advanced people. So the next question is how would you design a
[29:13] question is how would you design a scalable chat application like Slack? So
[29:17] scalable chat application like Slack? So guys in the previous questions we have
[29:19] guys in the previous questions we have already covered a similar question where
[29:21] already covered a similar question where you were asked to design a chat
[29:23] you were asked to design a chat application like WhatsApp or even you
[29:26] application like WhatsApp or even you know telegram. So this is also a similar
[29:29] know telegram. So this is also a similar question. Here you have scalable chat
[29:31] question. Here you have scalable chat application like slack.
[29:35] application like slack. So designing a scalable chat application
[29:37] So designing a scalable chat application involves handling millions of concurrent
[29:39] involves handling millions of concurrent users message delivery and realtime
[29:42] users message delivery and realtime synchronization. So guys like as we have
[29:45] synchronization. So guys like as we have seen in the previous questions
[29:48] seen in the previous questions like the one that covered WhatsApp. So
[29:50] like the one that covered WhatsApp. So here you have web- based connections
[29:53] here you have web- based connections like the web web soocket based
[29:55] like the web web soocket based connection is needed and you have
[29:57] connection is needed and you have message cues as well. You have database
[30:00] message cues as well. You have database and
[30:02] and you can use for caching you can use
[30:05] you can use for caching you can use reddish. Okay. And the important
[30:07] reddish. Okay. And the important criteria over here is you know
[30:09] criteria over here is you know reliability and you can use CDNs as
[30:12] reliability and you can use CDNs as well. Okay. So guys I'm leaving this
[30:15] well. Okay. So guys I'm leaving this question for you to practice. If you
[30:17] question for you to practice. If you have a detailed answer you can write
[30:19] have a detailed answer you can write them in the comment section and let's
[30:22] them in the comment section and let's see how many of you are able to answer
[30:24] see how many of you are able to answer this question successfully. Okay. So
[30:27] this question successfully. Okay. So let's move to the next question. So the
[30:28] let's move to the next question. So the next question says how would you design
[30:31] next question says how would you design a real-time collaboration tool like
[30:33] a real-time collaboration tool like Google Docs. So guys in the previous
[30:35] Google Docs. So guys in the previous questions as well we have seen how would
[30:37] questions as well we have seen how would you design a system like Google drive.
[30:39] you design a system like Google drive. Okay. So similar to that we have another
[30:41] Okay. So similar to that we have another realtime question or you know a
[30:44] realtime question or you know a practical question where you have to
[30:47] practical question where you have to design a tool like Google Docs. So let's
[30:50] design a tool like Google Docs. So let's see how do we answer this.
[30:54] see how do we answer this. So collaboration tool like Google Docs
[30:56] So collaboration tool like Google Docs needs realtime synchronization across
[30:59] needs realtime synchronization across users editing the same document. So the
[31:02] users editing the same document. So the core challenge over here is managing the
[31:04] core challenge over here is managing the concurrent edits without losing the
[31:06] concurrent edits without losing the data. So you can use operational
[31:08] data. So you can use operational transformation or OT or conflict-free
[31:11] transformation or OT or conflict-free replicated data types to ensure the
[31:13] replicated data types to ensure the consistent document states. So the
[31:16] consistent document states. So the architecture includes a collaboration
[31:17] architecture includes a collaboration server that maintains the master
[31:19] server that maintains the master document state and process updates from
[31:21] document state and process updates from the clients.
[31:23] the clients. So clients maintain local copies and
[31:25] So clients maintain local copies and send the changes via web sockets for low
[31:28] send the changes via web sockets for low latency.
[31:30] latency. So updates are broadcasted to all
[31:32] So updates are broadcasted to all connected clients to synchronize the
[31:34] connected clients to synchronize the changes in real time. The document is
[31:37] changes in real time. The document is periodically persisted to a distributed
[31:39] periodically persisted to a distributed storage system like Google Cloud Storage
[31:42] storage system like Google Cloud Storage or Dynamob. So for scalability you can
[31:45] or Dynamob. So for scalability you can sh documents by ID and maintain the user
[31:48] sh documents by ID and maintain the user sessions with radius. So load balances
[31:51] sessions with radius. So load balances distributor traffic among collaboration
[31:52] distributor traffic among collaboration servers and caching frequently access
[31:55] servers and caching frequently access document helps reduce the latency.
[31:58] document helps reduce the latency. Finally, a versioning mechanism allows a
[32:00] Finally, a versioning mechanism allows a roll back and a history tracking
[32:02] roll back and a history tracking ensuring the reliability even during
[32:04] ensuring the reliability even during conflicts.
[32:06] conflicts. The next question is how would you
[32:08] The next question is how would you design a YouTube or a large scale video
[32:10] design a YouTube or a large scale video streaming platform. So system like
[32:13] streaming platform. So system like YouTube involves video upload, storage,
[32:16] YouTube involves video upload, storage, transcoding, distribution and playback.
[32:19] transcoding, distribution and playback. So when a user uploads a video, it's
[32:21] So when a user uploads a video, it's first stored in a temporary object store
[32:23] first stored in a temporary object store like AWS S3. Then a transcoding service
[32:26] like AWS S3. Then a transcoding service converts it into multiple resolutions
[32:28] converts it into multiple resolutions and bit rates using formats like HLS or
[32:31] and bit rates using formats like HLS or DASH. The processed videos are then
[32:34] DASH. The processed videos are then distributed via CDN for low latency
[32:36] distributed via CDN for low latency playback. The metadata such as the
[32:38] playback. The metadata such as the titles, comments, likes are stored in
[32:41] titles, comments, likes are stored in SQL databases while video analytics such
[32:44] SQL databases while video analytics such as views, watch time go into a NoSQL or
[32:46] as views, watch time go into a NoSQL or analytics store like the BigQuery. So
[32:49] analytics store like the BigQuery. So load balancer distributes user requests
[32:52] load balancer distributes user requests to video servers geographically closest
[32:54] to video servers geographically closest to them. So which here caching is
[32:56] to them. So which here caching is crucial as well. So for that you can use
[32:59] crucial as well. So for that you can use edge cache to reduce the playback
[33:01] edge cache to reduce the playback buffering by storing popular videos
[33:03] buffering by storing popular videos close to the users and to handle the
[33:05] close to the users and to handle the billions of requests. You can employ
[33:07] billions of requests. You can employ microservices for each component such as
[33:09] microservices for each component such as upload, stream, recommend and use
[33:12] upload, stream, recommend and use message cues for asynchronous
[33:13] message cues for asynchronous processing.
[33:15] processing. Apart from that, machine learning models
[33:17] Apart from that, machine learning models recommend videos based on the user
[33:19] recommend videos based on the user behavior and watch history while
[33:21] behavior and watch history while monitoring ensures availability and
[33:24] monitoring ensures availability and fault tolerance. The next question is
[33:26] fault tolerance. The next question is how would you design a recommendation
[33:28] how would you design a recommendation system for Netflix or Amazon? So guys,
[33:31] system for Netflix or Amazon? So guys, if you have used these apps or you know
[33:34] if you have used these apps or you know these uh platforms such as Netflix or
[33:36] these uh platforms such as Netflix or Amazon based on the
[33:39] Amazon based on the your uh search history or the objects
[33:42] your uh search history or the objects that you have uh like for the movies
[33:44] that you have uh like for the movies that you have seen or you have searched
[33:46] that you have seen or you have searched on Netflix or the products that you
[33:48] on Netflix or the products that you search on Amazon based on your search
[33:50] search on Amazon based on your search history and based on your age and
[33:52] history and based on your age and several other factors you you get a
[33:54] several other factors you you get a recommendation system. Okay. So here the
[33:57] recommendation system. Okay. So here the question is how would you design a
[33:59] question is how would you design a recommendation system for Netflix or
[34:00] recommendation system for Netflix or Amazon? So recommendation system
[34:02] Amazon? So recommendation system analyzes the user behavior to predict
[34:05] analyzes the user behavior to predict what they'll are likely to enjoy the
[34:07] what they'll are likely to enjoy the next. So it uses a combination of
[34:10] next. So it uses a combination of collaborative filtering, content based
[34:12] collaborative filtering, content based filtering and contextual algorithms. So
[34:14] filtering and contextual algorithms. So the architecture consists of a data
[34:16] the architecture consists of a data pipeline that ingests user actions such
[34:19] pipeline that ingests user actions such as views, purchases or ratings via email
[34:22] as views, purchases or ratings via email streams like the CFKA. So this data is
[34:25] streams like the CFKA. So this data is then processed in real time by spark or
[34:27] then processed in real time by spark or flink and stored in data warehouse like
[34:29] flink and stored in data warehouse like the snowflake or red shift. So machine
[34:32] the snowflake or red shift. So machine learning models run periodically to
[34:34] learning models run periodically to generate recommendations which are
[34:35] generate recommendations which are stored in radius or elastic search for
[34:38] stored in radius or elastic search for quick retrieval. So whenever a user logs
[34:41] quick retrieval. So whenever a user logs in the application fetches the
[34:42] in the application fetches the personalized recommendation in
[34:44] personalized recommendation in milliseconds. The system also employs AB
[34:47] milliseconds. The system also employs AB testing to validate the model
[34:48] testing to validate the model performance. Over here the scalability
[34:50] performance. Over here the scalability is achieved by separating offline
[34:52] is achieved by separating offline training which is the batch processing
[34:54] training which is the batch processing and online serving the real-time
[34:56] and online serving the real-time predictions
[34:58] predictions for high traffic. Caching frequently
[35:00] for high traffic. Caching frequently accessed recommendation lists ensures
[35:03] accessed recommendation lists ensures low latency.
[35:05] low latency. So monitoring ensures models remain
[35:07] So monitoring ensures models remain accurate as user preferences evolve. The
[35:10] accurate as user preferences evolve. The next question is how would you design a
[35:12] next question is how would you design a distributed cache system for large scale
[35:15] distributed cache system for large scale application? So distributed cache speeds
[35:18] application? So distributed cache speeds up the data retrieval by reducing the
[35:20] up the data retrieval by reducing the database load. So the cache system such
[35:22] database load. So the cache system such as the radius cluster or the meme cache
[35:25] as the radius cluster or the meme cache stores frequently access data in memory
[35:27] stores frequently access data in memory across multiple nodes. So here you can
[35:30] across multiple nodes. So here you can partition the data using consistent
[35:32] partition the data using consistent hashing to ensure that the balanced
[35:34] hashing to ensure that the balanced distribution even when nodes join or
[35:36] distribution even when nodes join or leave. So replication over here ensures
[35:39] leave. So replication over here ensures the availability. So each cache shard
[35:42] the availability. So each cache shard can have replicas for failover. So cache
[35:45] can have replicas for failover. So cache invalidation policies such as least
[35:47] invalidation policies such as least recently used or LRU or TTL manage the
[35:51] recently used or LRU or TTL manage the freshness. So the system can support
[35:53] freshness. So the system can support right through write and writeback
[35:56] right through write and writeback strategies depending on the consistency
[35:58] strategies depending on the consistency requirements. So to avoid cash stamped
[36:01] requirements. So to avoid cash stamped employing
[36:04] employing and staggered TTLs for global
[36:07] and staggered TTLs for global applications regional cache clusters can
[36:09] applications regional cache clusters can reduce latency using georrelication.
[36:12] reduce latency using georrelication. Finally, monitoring cache hit ratio,
[36:15] Finally, monitoring cache hit ratio, latency, and eviction rates ensures the
[36:17] latency, and eviction rates ensures the optimal performance.
[36:20] optimal performance. So, well-designed distributed cache can
[36:22] So, well-designed distributed cache can handle millions of reads per second
[36:24] handle millions of reads per second while maintaining the low latency. The
[36:26] while maintaining the low latency. The next question is how would you design a
[36:28] next question is how would you design a distributed logging system like the elk
[36:30] distributed logging system like the elk stack which is the elastic search, log
[36:33] stack which is the elastic search, log stash and kibana. So, distributed
[36:35] stash and kibana. So, distributed logging system aggregates logs from
[36:38] logging system aggregates logs from multiple microservices and servers for
[36:40] multiple microservices and servers for centralized monitoring.
[36:42] centralized monitoring. Application sends logs via agents to the
[36:45] Application sends logs via agents to the log stash or CFKA which act as injection
[36:48] log stash or CFKA which act as injection pipelines. The data is then processed,
[36:50] pipelines. The data is then processed, filtered and stored in elastic search
[36:53] filtered and stored in elastic search where it is indexed for fast search.
[36:56] where it is indexed for fast search. Kibbana provides a UI for quering and
[36:58] Kibbana provides a UI for quering and visualizing logs. To scale, you can
[37:01] visualizing logs. To scale, you can shard and replicate elastic search
[37:03] shard and replicate elastic search indices across multiple nodes. A load
[37:07] indices across multiple nodes. A load balancer distributes the query traffic
[37:09] balancer distributes the query traffic evenly. So to handle spikes you can use
[37:12] evenly. So to handle spikes you can use buffering mechanisms between injection
[37:14] buffering mechanisms between injection and storage and retention policies and
[37:17] and storage and retention policies and cold storage can help you manage the
[37:19] cold storage can help you manage the storage costs as well. So security
[37:22] storage costs as well. So security measures like role- based access control
[37:24] measures like role- based access control and encryption protect sensitive logs.
[37:28] and encryption protect sensitive logs. With such a design engineers can debug
[37:31] With such a design engineers can debug distributed systems in real time and
[37:33] distributed systems in real time and identify the root causes quickly. The
[37:35] identify the root causes quickly. The next question is how would you design a
[37:38] next question is how would you design a payment processing system like the
[37:39] payment processing system like the PayPal or the stripe? So payment system
[37:42] PayPal or the stripe? So payment system must prioritize security reliability and
[37:45] must prioritize security reliability and indem potency. So when a user initiates
[37:48] indem potency. So when a user initiates a transaction, it goes through a payment
[37:51] a transaction, it goes through a payment gateway API which validates the inputs
[37:53] gateway API which validates the inputs and communicates with a payment
[37:55] and communicates with a payment processor. So the system ensures the PCI
[38:00] processor. So the system ensures the PCI DSS compliance, encrypts sensitive data
[38:02] DSS compliance, encrypts sensitive data and supports multicurrency operations.
[38:05] and supports multicurrency operations. Transactions are handled as state
[38:07] Transactions are handled as state machines with stages like initiated,
[38:10] machines with stages like initiated, authorized and captured. So each step is
[38:13] authorized and captured. So each step is in important to avoid a double charge.
[38:15] in important to avoid a double charge. So payment data is stored in relational
[38:17] So payment data is stored in relational database with strong consistency while
[38:20] database with strong consistency while Kafka cues handle a synchronized
[38:22] Kafka cues handle a synchronized workflows like notifications and
[38:24] workflows like notifications and settlements. To improve the reliability
[38:26] settlements. To improve the reliability over here you can use the circuit
[38:27] over here you can use the circuit breakers to handle third party gateway
[38:30] breakers to handle third party gateway failures. A fraud detection service runs
[38:33] failures. A fraud detection service runs realtime risk checks using the ML
[38:35] realtime risk checks using the ML models. And here the scalability is
[38:37] models. And here the scalability is achieved via microservices and
[38:38] achieved via microservices and distributed transaction coordination.
[38:41] distributed transaction coordination. For example, the saga pattern monitoring
[38:44] For example, the saga pattern monitoring and audit trials are critical for
[38:46] and audit trials are critical for compliance and transparency.
[38:48] compliance and transparency. The next question is how would you
[38:50] The next question is how would you design a search engine like the Google
[38:52] design a search engine like the Google search or the elastic search?
[38:55] search or the elastic search? So search engine involves web crawling,
[38:57] So search engine involves web crawling, indexing and query processing.
[39:00] indexing and query processing. The crawler traverses the web pages,
[39:02] The crawler traverses the web pages, stores content in raw form and sends it
[39:05] stores content in raw form and sends it to the indexer which tokenizes and ranks
[39:07] to the indexer which tokenizes and ranks the pages based on the keywords. The
[39:10] the pages based on the keywords. The index is stored in an inverted format
[39:12] index is stored in an inverted format mapping terms to document ids for a
[39:15] mapping terms to document ids for a quick lookup. So when user queries the
[39:18] quick lookup. So when user queries the query processor passes and retrieves the
[39:20] query processor passes and retrieves the most relevant documents using the
[39:22] most relevant documents using the ranking algorithms like TF or the BM25.
[39:26] ranking algorithms like TF or the BM25. So here the caching and premputed
[39:28] So here the caching and premputed indexes improve the response time.
[39:31] indexes improve the response time. For distributed scalability you can
[39:34] For distributed scalability you can shard the multiple indexes and replicate
[39:36] shard the multiple indexes and replicate the shards across multiple nodes. So
[39:38] the shards across multiple nodes. So here elastic search follows a similar
[39:41] here elastic search follows a similar principle
[39:43] principle using the clusters and the nodes for
[39:45] using the clusters and the nodes for fault tolerance. So ranking models can
[39:47] fault tolerance. So ranking models can use machine learning to personalize the
[39:49] use machine learning to personalize the results as well. Logs of queries are
[39:51] results as well. Logs of queries are analyzed for optimization and
[39:53] analyzed for optimization and autocomplete suggestions.
[39:56] autocomplete suggestions. So CDNs and caching help deliver results
[39:59] So CDNs and caching help deliver results faster for global users. The next
[40:01] faster for global users. The next question is how would you design an API
[40:04] question is how would you design an API rate limiting system?
[40:06] So API rate limiting controls how many requests a user or client can make within a time frame. The token bucket or leaky bucket algorithms are commonly used for enforcement.
[40:18] The system tracks request counts per user or IP in a fast-access store like Redis. When a request arrives, the tokens are checked: if one is available, the request proceeds; if not, it's throttled or delayed.
[40:30] You can enforce limits at different layers, such as the gateway, the load balancer, or even at the microservice level.
[40:37] For distributed systems, you need to ensure synchronization by using a shared Redis cluster or consistent hashing.
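The token-bucket check described above can be sketched in a few lines. This is an in-process illustration only; a production limiter would keep the counters in a shared store such as Redis, and the class and parameter names here are invented for the example:

```python
import time

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # consume one token for this request
            return True
        return False                  # throttled: no tokens left

bucket = TokenBucket(rate=5, capacity=3)
results = [bucket.allow() for _ in range(4)]
print(results)  # first 3 pass as a burst, the 4th is throttled
```

Dynamic limits per user plan then just mean choosing a different `rate` and `capacity` when the bucket is created.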
[40:47] To prevent abuse, the system supports dynamic rate limits based on user plans, which can be either free or premium. Logs and metrics help you monitor API usage and detect anomalies.
[40:58] Properly implemented rate limiting protects against DDoS attacks, ensures fair usage, and maintains system stability.
[41:09] The next question is: how would you design an e-commerce platform like Amazon at scale? So, an e-commerce platform
[41:16] scale? So an e-commerce platform involves multiple subsystem like you can
[41:19] involves multiple subsystem like you can have product catalog, inventory, cart,
[41:23] have product catalog, inventory, cart, orders, payments and recommendations. So
[41:25] orders, payments and recommendations. So each subsystem can be built as a
[41:27] each subsystem can be built as a microser communicating via rest APIs. So
[41:31] microser communicating via rest APIs. So product data can reside in NoSQL
[41:33] product data can reside in NoSQL databases while transactional data such
[41:35] databases while transactional data such as orders and payments goes into
[41:36] as orders and payments goes into relational databases. Search and
[41:39] relational databases. Search and filtering are powered by elastic search
[41:40] filtering are powered by elastic search and caching speeds up the product
[41:43] and caching speeds up the product retrieval. A message cue like the CFKA
[41:45] retrieval. A message cue like the CFKA ensures asynchronous workflows. For
[41:48] ensures asynchronous workflows. For example, when an order is placed, events
[41:50] example, when an order is placed, events trigger inventory updates and email
[41:52] trigger inventory updates and email notifications.
[41:53] Load balancers distribute traffic across the services, and security and authentication use OAuth or JWT. For scalability, each service scales independently, and autoscaling policies handle spikes during sales.
[42:08] To ensure resilience, deploy across multiple regions and implement circuit breakers, retries, and monitoring dashboards. With this architecture, the platform can serve millions of users reliably with high availability and low latency. So
[42:24] high availability and low latency. So guys with this we have come to the end
[42:25] guys with this we have come to the end of this session on system design
[42:27] of this session on system design interview questions. I hope you guys
[42:29] interview questions. I hope you guys have enjoyed this session. If at all you
[42:31] have enjoyed this session. If at all you have any doubts or queries related to
[42:33] have any doubts or queries related to the questions and answers that we have
[42:35] the questions and answers that we have discussed then you can write them in the
[42:36] discussed then you can write them in the comment section and we will try to
[42:38] comment section and we will try to resolve your doubts and queries as soon
[42:40] resolve your doubts and queries as soon as possible. So guys, thank you so much
[42:43] as possible. So guys, thank you so much for being with us and I wish you all the
[42:45] for being with us and I wish you all the very best for your upcoming system
[42:46] very best for your upcoming system design interview.