System Design part 2: Domain Name System (DNS) and Content Delivery Network (CDN)

SAKSHI CHHABRA
8 min read · Jan 1, 2022

Domain Name System

[Figure: Steps taken for a client to access medium.com]

The Domain Name System (DNS) is a hierarchical and decentralized system used to identify computers, services and resources reachable via the internet. DNS converts a domain name such as www.medium.com to its corresponding IP address, for example 192.158.1.38.

Let's go over the history first, before we get to why we need DNS.

History behind:

Initially, the Stanford Research Institute maintained a text file named hosts.txt that mapped host names to numerical IP addresses. The file lived on a centrally administered system, and any user who wanted to resolve a host name would download it. As the internet grew, so did the size of the hosts file and the traffic generated by updating and downloading it. The need for a new system that was decentralized and scalable became obvious.

Eventually, the Domain Name System came into existence in 1984. In DNS, host names reside in a database that is distributed among multiple servers, which decreases the load on any single server. DNS supports hierarchical names and allows registration of various data types in addition to host name-to-IP address mappings. Because DNS is distributed, it has no practical size limit, and performance doesn't degrade as more servers are added.

DNS is implemented as a hierarchical and decentralized database that contains various types of data, including domain and host names. Names in DNS form a hierarchical, tree-like structure called the domain namespace, and the individual labels in a domain name are separated by dots.

Structure of Domain Namespace:

The domain namespace is a tree-like structure in which each node or leaf has a label and zero or more records holding information about that domain name. A domain name is formed by concatenating a node's label with the labels of its parents to its right, separated by dots. For example, in www.medium.com, www is a child of medium, which is a child of com.

Any domain name used in the tree is a domain. However, most names are identified by their level or by the way the name is commonly used. For example, the DNS domain name registered to Medium (medium.com) is a second-level domain.
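To make the label structure concrete, here is a minimal sketch that walks a domain name from the top of the tree down. The function name domain_hierarchy is my own and not part of any DNS library.

```python
def domain_hierarchy(name: str) -> list[str]:
    """Return the chain of domains from the top-level domain down to `name`."""
    # Labels are separated by dots; a trailing dot (the root) is ignored.
    labels = name.rstrip(".").lower().split(".")
    # Each domain is a label joined with all of its parent labels to the right.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


for level, domain in enumerate(domain_hierarchy("www.medium.com"), start=1):
    print(f"level {level}: {domain}")
# level 1: com
# level 2: medium.com
# level 3: www.medium.com
```

Level 2 in the output is medium.com, which matches the second-level domain mentioned above.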

The primary reasons for dividing the DNS namespace into domains are:

a) The need to distribute the load of maintaining one large DNS server across many DNS servers, which improves performance and makes the system fault-tolerant.

b) The need to delegate management of DNS domain names to a number of organizations, or to departments within an organization.

c) The need to reflect a host's organizational affiliation by placing the host in an appropriate domain (for example, edu for educational institutions, com for commercial entities, org for non-profits).

More details on how DNS servers work are covered in my other blog post, “How Internet works”. Make sure to check it out to learn more about how DNS servers work and send requests to each other.

Record caching:

Since DNS is distributed and hierarchical, your router or ISP tells you which DNS server to contact when doing a lookup. A small number of authoritative servers sit at the top of the hierarchy. Lower-level servers cache mappings, which can become stale because changes take time to propagate. DNS results can also be cached by your browser or OS for a period of time determined by the Time To Live (TTL). The TTL is an upper bound on how long a DNS mapping may be kept in a cache. Once the TTL expires, the mapping is discarded and has to be looked up again.
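Here is a minimal sketch of a TTL-based cache, just to illustrate the expiry rule. The class name DnsCache and the injectable clock are assumptions made for this example; real resolvers are considerably more involved.

```python
import time


class DnsCache:
    """Toy cache that forgets a mapping once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # never cached: caller must do a real lookup
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # TTL elapsed: discard the stale mapping
            return None
        return ip


cache = DnsCache()
cache.put("www.medium.com", "192.158.1.38", ttl_seconds=300)
print(cache.get("www.medium.com"))  # served from cache while the TTL is still running
```

A None result means the caller has to go back to a DNS server, which is exactly what happens after the TTL runs out.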

Common record types whose results are cached by the local computer:

  • NS (name server) record: Specifies which DNS servers to contact for your domain
  • MX (mail exchange) record: Specifies the mail servers that accept messages for the domain
  • A (address) record: Points a name to an IP address
  • CNAME (canonical name) record: Points a name to another name (for example, www.medium.com to medium.com)
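The record types above can be pictured as entries in a lookup table. The sketch below uses made-up records for example.com (the names and the 192.0.2.10 address are placeholders, not real data) and a hypothetical resolve_a helper that follows CNAME records until it reaches an address record.

```python
# Hypothetical zone data, keyed by (name, record type).
RECORDS = {
    ("example.com", "NS"): ["ns1.example.com"],
    ("example.com", "MX"): ["mail.example.com"],
    ("example.com", "A"): ["192.0.2.10"],
    ("www.example.com", "CNAME"): ["example.com"],
}


def resolve_a(name, records=RECORDS, max_hops=5):
    """Return the A record values for `name`, following CNAME aliases."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        alias = records.get((name, "CNAME"))
        if alias is None:
            return []            # no address and no alias: nothing to return
        name = alias[0]          # follow the alias and try again
    raise RuntimeError("too many CNAME hops")


print(resolve_a("www.example.com"))  # ['192.0.2.10']
```

The hop limit is only there so that a misconfigured loop of aliases cannot run forever.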

Because DNS servers are distributed, requests arriving at a single point of entry are spread across multiple servers in the background. A load balancer (discussed in the next part) routes the traffic to an appropriate server using one of several methods:

  • Weighted round robin: With plain round robin, each incoming request is sent to the next server in a list of servers capable of handling it, cycling through the list. This doesn't always balance the load well, since some servers can handle many more requests than others. Weighted round robin addresses this by distributing requests in proportion to each server's capacity rather than equally. For more info, refer to https://g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb/
  • Latency based: The request is sent to the server that provides the lowest latency or has the least traffic.
  • Geolocation based: The request is sent to an appropriate server based on the geolocation of the user. For example, all queries from Europe could be passed to an ELB in the Frankfurt region. One advantage of geolocation routing is that the content can be localized.
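The three routing methods can be sketched in a few lines each. All server names, weights, latencies and regions below are invented for illustration, and the weighted round robin shown is the simplest possible variant (each server repeated by its weight); real load balancers may interleave requests differently.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator that yields servers in proportion to their weights."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def pick_lowest_latency(latencies_ms):
    """Return the server with the smallest measured latency."""
    return min(latencies_ms, key=latencies_ms.get)


def pick_by_region(user_region, region_to_server, default):
    """Return the server assigned to the user's region, or a default."""
    return region_to_server.get(user_region, default)


picker = weighted_round_robin({"big-server": 3, "small-server": 1})
print([next(picker) for _ in range(4)])
# ['big-server', 'big-server', 'big-server', 'small-server']

print(pick_lowest_latency({"us-east": 120.0, "frankfurt": 25.0}))        # frankfurt
print(pick_by_region("europe", {"europe": "frankfurt"}, default="us-east"))  # frankfurt
```

With weights of 3 and 1, the bigger server receives three out of every four requests.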

Advantages of Domain Name System:

  1. It is easy to map a host name to a new IP address if the host's IP address changes. Users don't have to keep up with IP address changes, which would be a burdensome task.
  2. DNS makes the internet easier to use because users don't have to remember IP addresses. Domain names are much easier to remember than IP addresses.
  3. Lookups are fast because of its decentralized nature.
  4. DNS can contribute to security, for example when DNS-level filtering blocks lookups for known malicious domains, although DNS on its own doesn't stop attackers from reaching servers.

Disadvantages of Domain Name System:

  1. DNS issues are hard to troubleshoot because the system is distributed across various geographical locations.
  2. In a DNS attack, the original IP address could be replaced with a fake one, and users would be redirected to a fraudulent server that could collect sensitive user information.
  3. If DNS breaks down, much of the World Wide Web becomes unreachable, since users can't get the IP address of the site they want and hence can't access it.
  4. Accessing a DNS server adds a slight delay, which can be mitigated by caching on the client side.

Services such as Amazon Route 53 and Cloudflare provide managed DNS.

Content Delivery Network

A Content Delivery Network (CDN) is a geographically distributed network of proxy servers. The goal is to deliver content with high availability and performance by placing proxy servers close to the users.

Why do we need CDN:

Suppose the servers are in the USA and a client in Japan is trying to access content. The content has to travel a longer distance, resulting in slower downloads than for a client in the USA. To reduce the distance and improve loading time, proxy servers that serve user requests are distributed globally.
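A rough back-of-the-envelope calculation shows why distance matters. The sketch below assumes signals travel through optical fiber at roughly 200,000 km/s, and uses an illustrative 9,000 km for the Japan-to-USA path and 100 km for a nearby CDN server. Both distances are assumptions, and real latency is higher because of routing, queuing and server processing time.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumed: roughly two thirds of the speed of light


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber, ignoring everything but distance."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


for label, distance_km in [("origin in the USA", 9_000), ("nearby CDN server", 100)]:
    print(f"{label}: at least {min_round_trip_ms(distance_km):.0f} ms per round trip")
```

Since loading a page usually takes several round trips, the gap between the two cases adds up quickly.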

Serving content from CDNs improves performance in the following ways:

  • Users receive content from data centers closer to them
  • Your servers don't have to serve requests that the CDN can fulfill

How do CDNs work:

CDNs help speed up websites by distributing static content such as HTML, CSS and JS across a number of servers around the globe. The proxy servers act as caches, storing static content for a specific duration called the Time To Live, which might be, for example, 24 hours.

The following steps occur when serving client requests using a CDN:

  • A user requests content. The request first goes to the closest proxy server. If that CDN server has the content, it returns it to the client.
  • If the closest CDN server doesn't have the content, it checks with nearby CDN servers. If any of them has the content, it is forwarded to the client, and the closest CDN server caches it.
  • If the nearby CDN servers don't have the content either, the request is forwarded to the backend server. The server responds with the content, which is forwarded to the client and cached by the CDN server.
  • The next time another user requests the same content, the CDN server that cached it can serve the request, and the request won't reach the backend servers.
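The steps above can be sketched as a small simulation. The Origin and EdgeServer classes, the server names and the file path are all hypothetical, and TTL expiry is left out to keep the flow easy to follow.

```python
class Origin:
    """Stand-in for the backend server; counts how often it is actually hit."""

    def __init__(self, content):
        self.content = content
        self.hits = 0

    def fetch(self, path):
        self.hits += 1
        return self.content[path]


class EdgeServer:
    """Stand-in for one CDN proxy server with a local cache and nearby peers."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}
        self.peers = []

    def handle(self, path):
        # Step 1: serve from the local cache if possible.
        if path in self.cache:
            return self.cache[path], f"{self.name} cache"
        # Step 2: ask nearby CDN servers, and cache what they return.
        for peer in self.peers:
            if path in peer.cache:
                self.cache[path] = peer.cache[path]
                return self.cache[path], f"peer {peer.name}"
        # Step 3: fall back to the backend server, and cache the response.
        body = self.origin.fetch(path)
        self.cache[path] = body
        return body, "origin"


origin = Origin({"/app.css": "body { margin: 0 }"})
tokyo, osaka = EdgeServer("tokyo", origin), EdgeServer("osaka", origin)
tokyo.peers, osaka.peers = [osaka], [tokyo]

print(tokyo.handle("/app.css")[1])  # origin
print(tokyo.handle("/app.css")[1])  # tokyo cache
print(osaka.handle("/app.css")[1])  # peer tokyo
print(origin.hits)                  # 1
```

Three requests were served, but the backend was contacted only once.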

Types of CDN:

(1) Pull CDN: A Pull CDN pulls content from your server the first time a user requests it. This makes the first client request slower. You rewrite URLs to point to the CDN. The Time To Live (TTL) determines how long the content stays cached.

(2) Push CDN: Instead of waiting to pull content when it's needed, you upload the latest content directly to the CDN beforehand. This way all of the static content is on the CDN servers all the time. With a Push CDN, you take full responsibility for uploading content to the servers and rewriting URLs to point to them. You can also set the time to live, i.e. when the content expires and when it's updated.
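The difference between the two types comes down to who puts the content on the CDN. Here is a minimal sketch, with PullCdn and PushCdn as made-up class names and the origin modelled as a plain function.

```python
import time


class PullCdn:
    """Fetches from the origin on a cache miss and keeps the result for a TTL."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # path -> (body, expires_at)

    def get(self, path):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is None or now >= entry[1]:
            body = self._fetch(path)                 # slow path: first or expired request
            self._cache[path] = (body, now + self._ttl)
            return body
        return entry[0]                              # fast path: still within the TTL


class PushCdn:
    """Serves only what the site owner has uploaded ahead of time."""

    def __init__(self):
        self._store = {}

    def upload(self, path, body):
        self._store[path] = body

    def get(self, path):
        return self._store.get(path)                 # None if it was never pushed


pull = PullCdn(fetch_from_origin=lambda path: f"contents of {path}", ttl_seconds=3600)
print(pull.get("/logo.png"))   # fetched from the origin, then cached

push = PushCdn()
push.upload("/logo.png", "contents of /logo.png")
print(push.get("/logo.png"))   # available before any user asks for it
```

In the pull case the CDN does the work on demand, while in the push case nothing is available until you upload it.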

Comparison between Pull CDN and Push CDN:

  • Traffic: Sites with heavy traffic work well with a Pull CDN, since traffic is spread out more evenly and only recently requested content stays on the CDN servers. Sites with less traffic work well with a Push CDN, as there won't be a need to push content frequently.
  • Configuration: A Pull CDN is easier to configure than a Push CDN. Once a Pull CDN is set up, it needs little maintenance: it serves client requests and refreshes content based on the TTL.
  • Content update: Sites with frequent updates work well with a Pull CDN, since pushing content to the CDN on every update would put extra load on a server that may already be struggling with heavy traffic.

Each type has its own merits and drawbacks, and the choice of CDN depends heavily on the company's needs.

Advantages of using CDN:

  • Lower latency: Because the CDN can serve the client request, the content travels a shorter distance, which reduces latency and download time.
  • Continuous availability: Even if the origin servers are down, cached content can still be served by the CDN servers.
  • Reliable delivery of content: CDNs spread delivery across many servers, so content is delivered consistently, resulting in better system performance.
  • Protection against sudden traffic spikes: If the website faces a sudden spike in traffic, the CDN absorbs much of the load, keeping resources available and the site scalable.

Disadvantages of using CDN:

  • Higher costs: Setting up CDN servers is expensive. There are additional running costs, which are calculated per gigabyte of data transferred.
  • Location: If there is no CDN server in a country with a large user base, the data still has to travel a long distance.
  • Security issues: Since most companies set up a CDN through third-party vendors, the company has to share its data files with another company, which poses security risks.
  • Unhealthy CDN server: If there is a technical issue with the CDN, the company that owns the site doesn't know how long it will take to fix.

Whether a CDN should be used depends on the company's needs. If a website has heavy traffic and the company has enough resources, investing in a CDN will usually pay off.

Thank you for checking out my blog! Make sure to follow me to catch the entire system design interview prep series.

Happy learning!

SAKSHI CHHABRA

Master's student in Computer Science at the University of Florida. I love to write and help others, so here I am.