System Design Part 1: General Terms Used in System Design

SAKSHI CHHABRA
Jun 14, 2021 · 5 min read

This is going to be a blog series on system design topics for new grads. Every week, I am going to cover new topics that need to be studied for system design.

But before we dive into system design topics, we need to understand why we need system design in the first place.

Image source: https://www.educba.com/what-is-system-design/

You have built a website that is not scalable; it can only handle a few hundred clients well. Your software is working well so far and you are happy. Now suppose your website gets overnight fame and millions of users try to access it. Your system will buckle under millions of requests. This is why you need to design the system from the start with scalability in mind. When needed, you will then be able to scale your system while keeping performance up.

Let’s get some general terms out of the way before going deep into topics.

1. Scalability: A system is called scalable if adding more resources to the system results in a performance increase, such that performance is directly proportional to the resources added.

For example: resources are added to a system to provide redundancy, which is an important defense against failures. The service is scalable only if adding those resources actually results in increased performance of the system.

Scalability cannot be a second thought after the system is built. It requires applications to be built with scalability in mind. Many algorithms work well with small datasets, but the system tends to fall over when the dataset grows or requests increase, as the toy benchmark below illustrates.
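As an illustration (a throwaway Python sketch of my own, not tied to any particular system), consider looking up users: a linear scan of a list is fine for a few hundred users, but an indexed structure such as a hash set is what keeps lookups fast at a million:

```python
import random
import time

# One million user IDs, and a handful of lookups to answer.
user_ids = [random.randrange(10**9) for _ in range(1_000_000)]
targets = random.sample(user_ids, 100)

# Naive approach: scan the whole list per lookup -- O(n) each time.
start = time.perf_counter()
for t in targets:
    _ = t in user_ids
list_time = time.perf_counter() - start

# Indexed approach: a hash set answers membership in O(1) on average.
id_index = set(user_ids)
start = time.perf_counter()
for t in targets:
    _ = t in id_index
set_time = time.perf_counter() - start

print(f"list scans: {list_time:.3f}s, set lookups: {set_time:.6f}s")
```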

Heterogeneity makes scalability hard as well. Heterogeneity means that some nodes will process work faster, or store more data, than others. It comes into existence when we need to replace current resources with more powerful, cost-effective ones. If the system is not decentralized, algorithms would in this case either break down or fail to use the new equipment to its fullest.

Good scalability can be achieved only if we design and build algorithms with scalability in mind. We need to take into account redundancy, heterogeneity, and the axes along which the system will grow, and build a system that can handle these conditions.

2. Performance: Increased performance means being able to do more units of work, or to handle larger units of work, as datasets grow.

If your system has a performance problem, it is slow even for a single user. With a scalability problem, by contrast, the system is fast for a single user but slow under many users.

3. Throughput: Number of actions executed or results produced per unit time. The goal of the system should be to achieve maximum throughput.

4. Latency: Time required to produce a result or perform an action.

For example: if a factory produces 1 copy in 2 hours and 100 copies in 1 day, then throughput = 100 copies/day and latency = 2 hours/copy.
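The same arithmetic as a quick Python sketch (the numbers are just the factory example above); the point is that the two metrics are independent knobs:

```python
# Factory example: a single copy takes 2 hours end-to-end,
# yet the factory completes 100 copies per day.
latency_hours_per_copy = 2        # time for one result
throughput_copies_per_day = 100   # results per unit time

# The two are independent: many copies are in flight at once, so
# cutting latency in half would not by itself double throughput.
copies_in_flight = throughput_copies_per_day * latency_hours_per_copy / 24
print(f"Latency:    {latency_hours_per_copy} hours/copy")
print(f"Throughput: {throughput_copies_per_day} copies/day")
print(f"Copies in progress at any moment (Little's law): {copies_in_flight:.1f}")
```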

5. Consistency: Every read, anywhere in the world, returns the most recent write.

6. Availability: Every request receives a response. This does not guarantee that the response contains the most recent write.

7. Partition Tolerance: The system continues to operate despite network failures. A network failure partitions the system, leaving some servers unable to stay consistent or available relative to the rest.

8. CAP Theorem:

The CAP Theorem states that we get to choose only two out of Consistency, Availability, and Partition Tolerance. It looks like we have several options, but we really don't, because network failures happen: networks go down frequently and unexpectedly, and we don't get to choose when. Since we have to keep the system running despite network failures (partition tolerance), the real choice is between availability and consistency.

  • Availability + Partition tolerance: A read returns the data readily available on the partitioned node, which could be stale. The node still accepts writes, but they are processed once the partition is resolved. Availability is chosen when the business allows some flexibility around the data until the system synchronizes.
  • Consistency + Partition tolerance: The system waits for the most recent data from across the partition, which can end in a timeout error; depending on the situation, it could also return an error outright. Consistency is chosen when the business requires atomic reads and writes.

Examples where availability is preferred: the number of likes on a blog, the comment count on a blog.

Example where consistency is preferred: the content of a saved story or blog post.

The decision between availability and consistency is a tradeoff. In the case of a network partition, it is in the hands of the software developer to choose based on business requirements.

It is essential to understand the trade-offs and make the right decision for the network-failure case. Failing to get this right could doom your application before its first deployment.
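To make the trade-off concrete, here is a toy replica (all names and values are hypothetical, purely illustrative) that behaves AP-style or CP-style once a partition cuts it off:

```python
# Toy model of one replica's read path during a network partition.
class Replica:
    def __init__(self, mode):
        self.mode = mode             # "AP" or "CP"
        self.data = {"likes": 41}    # last value this replica saw
        self.partitioned = False     # True when cut off from its peers

    def read(self, key):
        if not self.partitioned:
            return self.data[key]
        if self.mode == "AP":
            # Availability: answer with local, possibly stale data.
            return self.data[key]
        # Consistency: refuse rather than risk serving a stale value.
        raise TimeoutError("partitioned: cannot guarantee latest value")

likes = Replica(mode="AP")    # like counts tolerate staleness
draft = Replica(mode="CP")    # a saved story must be current

likes.partitioned = draft.partitioned = True
print(likes.read("likes"))    # 41 -- served even though it may be stale
try:
    draft.read("likes")       # errors out instead of lying
except TimeoutError as err:
    print(err)
```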

9. Cache: A database can have terabytes of data stored in it, and querying such a database results in high latency. The solution is an in-memory cache; Memcached and Redis are excellent examples.

A cache is a key-value store that resides between the application and data storage. Whenever the application or server needs to read data, it first tries to retrieve it from the cache; only if it is not found there is the database queried. This works because the cache is very fast: it holds the dataset in RAM, which lets requests be answered as quickly as possible. To give a sense of how fast, a cache can serve hundreds of thousands of requests per second.
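A minimal cache-aside sketch in Python (the dict stands in for Memcached or Redis, and `query_database` is a made-up placeholder for a real datastore call):

```python
import time

cache = {}          # key -> (value, expiry time); stands in for Redis/Memcached
TTL_SECONDS = 60    # how long an entry stays fresh

def query_database(user_id):
    time.sleep(0.05)                     # simulate a slow, disk-backed query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    entry = cache.get(user_id)
    if entry is not None and entry[1] > time.time():
        return entry[0]                  # cache hit: answered from RAM
    value = query_database(user_id)      # cache miss: go to the database
    cache[user_id] = (value, time.time() + TTL_SECONDS)
    return value

get_user(42)   # slow path: falls through to the database
get_user(42)   # fast path: served from the cache
```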

Redis supports persistence, while Memcached is simpler and scales well.

10. Asynchronism: Asynchronism means freeing up processing to work on other tasks while results are computed in the background. This is done in two ways:

  • Instead of making a user wait on every request, we can pre-render dynamic content into static HTML files and store them on Amazon S3 or a content delivery network. This makes the website very fast and able to handle a huge number of requests per second. A script, run by a cron job every hour, would do this job. This is one way of doing asynchronism.
  • If a user requests a computation-intensive task, the frontend of the website sends the job to a job queue, signals back to the user, and lets the user browse the website in the meantime. As soon as the frontend is signaled that the job is done, it notifies the user. A minimal sketch of this pattern follows.
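Here is that job-queue pattern as a small Python sketch, assuming a single in-process worker thread (a real system would use a dedicated queue service; `handle_request` and the job payload are made up for illustration):

```python
import queue
import threading

job_queue = queue.Queue()

def worker():
    # Background worker: drains the queue and does the heavy lifting.
    while True:
        job_id, n = job_queue.get()
        result = sum(i * i for i in range(n))      # the "expensive" task
        print(f"job {job_id} done: {result}")      # e.g. notify the user
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(job_id, n):
    # Frontend path: enqueue and return immediately -- no blocking.
    job_queue.put((job_id, n))
    return f"job {job_id} accepted, keep browsing"

print(handle_request(1, 10_000_000))
job_queue.join()   # in this demo, wait for the worker to finish
```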

Thank you for reading. Please feel free to like, comment and share if you found it interesting.

