How the Internet Works
Motivation:
If you plan to interview for a software developer role, you should be familiar with the fundamentals of how the internet works. This blog covers how browsers function at a high level, from DNS lookup to TCP/IP to socket programming, and how the World Wide Web works: what happens when we type in a URL, where the request goes next, how it reaches the server and how the website is displayed.
The internet is a worldwide network of networks in which data and other media are transmitted between interconnected devices. Each device connected to the internet is assigned an IP address (in the dial-up era for the duration of a session; today usually by the ISP or the local network). To communicate with other computers, data has to be converted into electrical signals and back into data. This is done by the protocol stack, which is built into the computer’s OS. The protocol stack used on the internet is the TCP/IP protocol stack.
IP (Internet Protocol): the rules for how information is sent from one computer to another over the internet. TCP (Transmission Control Protocol): ensures that data transfer from application to application is reliable.
The application layer hands the message to the layers below, where it is broken into manageable chunks (packets). In the TCP layer, each packet is assigned a port number, which identifies the application on the destination computer that should receive the message. In the IP layer, each packet is assigned the destination IP address. In the data link layer, the data is converted into electrical signals and sent over the network. The packets start at the application layer and move all the way down to the data link layer, travel over the physical medium (for example Ethernet), reach the destination’s data link layer and move all the way up to the application layer, where the message is delivered. In this way, a message is sent from source to destination.
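The trip down and back up the stack can be sketched in a few lines of Python. This is only a toy model: the function names, the dictionary "headers" and the address 203.0.113.10 are made up for illustration, and real TCP/IP headers are binary structures, not JSON.

```python
import json


def send_down_stack(message: str, dst_ip: str, dst_port: int) -> bytes:
    """Wrap a message in one (made-up) header per layer, top to bottom."""
    app_data = message                                          # application layer: the message itself
    tcp_segment = {"dst_port": dst_port, "payload": app_data}   # TCP layer adds the port
    ip_packet = {"dst_ip": dst_ip, "payload": tcp_segment}      # IP layer adds the address
    return json.dumps(ip_packet).encode("utf-8")                # link layer: raw bytes to transmit


def receive_up_stack(frame: bytes) -> tuple[str, int, str]:
    """Unwrap the layers in reverse order on the destination machine."""
    ip_packet = json.loads(frame.decode("utf-8"))   # link layer: bytes back to data
    tcp_segment = ip_packet["payload"]              # IP layer strips its header
    message = tcp_segment["payload"]                # TCP layer strips its header
    return ip_packet["dst_ip"], tcp_segment["dst_port"], message


frame = send_down_stack("Hello, server!", "203.0.113.10", 80)
print(receive_up_stack(frame))
```

Each layer only looks at its own header, which is why the application never has to care how the bytes physically travel.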
Internet Packet Routing:
Routers sit between networks and route packets between them efficiently. Each router knows about the sub-networks below it and the IP addresses they use, but it doesn’t know the IP addresses above it in the hierarchy. The large NSPs (Network Service Providers) are at the top of the hierarchy, followed by NAPs (Network Access Points), followed by several sub-networks, which are in turn followed by more sub-networks. At the bottom are many LANs with computers connected to them, and we need to send messages between two of these computers.
Does a router need to know all IP addresses?
The Internet Protocol specifies how routers forward packets based on the destination IP address. A router doesn’t need to know where every IP address is. It just needs to know which of its neighbors to forward the packet to. In this way, instead of keeping track of billions of IP addresses, routers keep track of at most around a million network prefixes. For example, for an IP address x.x.x.x, the leading x.x can be the network prefix (a college or a business), and routers send all packets with the same network prefix to that location.
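To make the network prefix idea concrete, here is a small sketch using Python’s standard `ipaddress` module. The prefix 10.1.0.0/16 and the host addresses are arbitrary example values, and `shares_prefix` is a helper name invented for this post.

```python
import ipaddress


def shares_prefix(address: str, prefix: str) -> bool:
    """Return True if the address falls inside the given network prefix."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)


# "/16" means the first 16 bits (the leading x.x) identify the network.
campus_prefix = "10.1.0.0/16"

for host in ["10.1.4.7", "10.1.200.9", "10.2.0.1"]:
    where = "same network" if shares_prefix(host, campus_prefix) else "different network"
    print(host, "->", where)
```

A router only needs one table entry for the whole prefix, no matter how many hosts sit behind it.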
How does a router route a packet?
When a packet arrives at a router, the router checks whether the packet’s destination IP address matches an entry in its routing table. If it does, the packet is sent to the corresponding network. If it doesn’t, the packet is sent along the default route, i.e. to the next router up the hierarchy. This process is repeated until the packet reaches an NSP. The NSP routers hold the largest routing tables, which contain enough information to send the packet back ‘down’ the hierarchy to its appropriate network.
When a packet moves from one router to another, it is called a hop.
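The lookup described above can be sketched as follows. The routing table contents and router names are hypothetical, and picking the most specific (longest) matching prefix is the usual rule in real routers rather than something spelled out in this post; 0.0.0.0/0 matches every address and plays the role of the default route.

```python
import ipaddress

# Hypothetical routing table: network prefix -> neighbor to forward to.
ROUTING_TABLE = {
    "10.1.0.0/16": "campus-router",
    "10.1.5.0/24": "lab-router",
    "0.0.0.0/0": "upstream-router",  # default route: matches everything
}


def next_hop(destination: str, table: dict[str, str] = ROUTING_TABLE) -> str:
    """Pick the neighbor whose prefix matches the destination most specifically."""
    dest = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, hop in table.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and network.prefixlen > best_len:
            best_len, best_hop = network.prefixlen, hop
    if best_hop is None:
        raise LookupError(f"no route to {destination}")
    return best_hop


print(next_hop("10.1.5.9"))   # matches both 10.1.x prefixes; the /24 is more specific
print(next_hop("10.1.9.9"))   # only the /16 and the default route match
print(next_hop("8.8.8.8"))    # nothing specific matches, so the default route is used
```

Each call represents the decision made at a single hop; the next router repeats the same lookup with its own table.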
TCP protocol v/s IP protocol:
The Internet Protocol on its own is unreliable, which is why TCP is paired with it. Packets may arrive out of order at the destination if a later packet finds a quicker path than an earlier one. To handle this, every packet carries information about its position within the entire message, and TCP uses this information to put the packets back in their original order.
IP doesn’t guarantee that all packets are received, so packets can be lost. With TCP, the receiver acknowledges the data it has received, and the sender resends any packet that is not acknowledged. TCP also uses a checksum to detect whether the packet contents were corrupted in transit.
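Reordering, loss detection and the checksum can be simulated in a few lines. This is a simplified sketch, not how TCP is implemented: real TCP numbers bytes rather than whole packets and uses a 16-bit checksum, while here `zlib.crc32` stands in for the checksum and the function names are invented.

```python
import zlib


def make_segments(message: bytes, size: int) -> list[dict]:
    """Split a message into numbered chunks, each with a checksum."""
    segments = []
    for seq, start in enumerate(range(0, len(message), size)):
        chunk = message[start:start + size]
        segments.append({"seq": seq, "data": chunk, "checksum": zlib.crc32(chunk)})
    return segments


def reassemble(received: list[dict], total: int) -> tuple[bytes, list[int]]:
    """Return the data rebuilt in order plus the sequence numbers still missing."""
    good = {}
    for seg in received:
        # Drop corrupted segments; they will be treated as lost.
        if zlib.crc32(seg["data"]) == seg["checksum"]:
            good[seg["seq"]] = seg["data"]
    missing = [seq for seq in range(total) if seq not in good]
    data = b"".join(good[seq] for seq in sorted(good))
    return data, missing


segments = make_segments(b"packets can arrive in any order", 8)

# Simulate the network: segments arrive reversed and segment 2 is lost.
arrived = [seg for seg in reversed(segments) if seg["seq"] != 2]
_, missing = reassemble(arrived, len(segments))
print("ask sender to resend:", missing)

# The sender retransmits the missing segment and the message is complete.
arrived.append(segments[2])
message, missing = reassemble(arrived, len(segments))
print(message, missing)
```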
Domain Name System:
We usually don’t know the IP address of the server we want a webpage from; all we know is the domain name in its URL (Uniform Resource Locator). This is where DNS comes in handy. DNS (Domain Name System) is a distributed database that maps domain names to their current IP addresses. No single DNS server contains all the domain names. If a DNS server doesn’t have an answer, the query is passed further along the hierarchy. For example, if the local DNS cache doesn’t know the IP address, the request is sent to the ISP’s DNS server. If the ISP’s DNS server doesn’t have the address either, it asks a root name server, which knows the servers responsible for top-level domains like .com, .edu and .org. The root name servers do not store the IP address of every domain name; they refer the query to the right top-level domain server, which in turn points to the authoritative name server for the domain, and that server returns the IP address. The IP address is sent back to the computer and is cached by the DNS servers it passed through, so later lookups are faster.
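A toy model of this lookup chain with caching is shown below. In the real DNS the walk typically goes from a root server to a top-level domain server to the domain’s authoritative server; here those are just dictionaries, and every server name and the address 192.0.2.10 are made-up example values.

```python
# Hypothetical data standing in for the DNS hierarchy.
ROOT = {"com": "tld-com", "edu": "tld-edu"}                    # root: who handles each TLD
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}       # TLD: who is authoritative
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


class CachingResolver:
    """Answers from its cache when it can, otherwise walks the hierarchy."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.hierarchy_walks = 0

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]                  # cache hit: no further queries
        self.hierarchy_walks += 1
        tld = name.rsplit(".", 1)[-1]                # "com"
        domain = ".".join(name.split(".")[-2:])      # "example.com"
        tld_server = ROOT[tld]                       # step 1: ask the root
        auth_server = TLD_SERVERS[tld_server][domain]    # step 2: ask the TLD server
        address = AUTHORITATIVE[auth_server][name]       # step 3: ask the authoritative server
        self.cache[name] = address                   # remember the answer
        return address


resolver = CachingResolver()
print(resolver.resolve("www.example.com"))
print(resolver.resolve("www.example.com"))   # second lookup is served from the cache
print("hierarchy walks:", resolver.hierarchy_walks)
```

Real resolvers also expire cached entries after a time-to-live, which this sketch leaves out.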
Which DNS server does the browser send the request to?
When a computer is set up on a network, it is given the addresses of a primary and a secondary DNS server (today this usually happens automatically through the operating system’s network configuration). Whenever the user enters a URL in the browser, the lookup first goes to the primary DNS server. After obtaining the corresponding IP address, the browser connects to the target web server and requests the webpage.
Next steps after typing a URL in the web browser:
Below are the steps that follow when you type a URL into the web browser:
- If the IP address corresponding to the domain name isn’t stored in the local DNS cache, a request for the IP address is sent to the DNS server. The DNS server returns the corresponding IP address.
- The browser connects to the web server and sends an HTTP request (using the protocol stack) for the webpage. The web server receives the request and looks for the requested page. If the server cannot find the page, it sends a 404 error message; otherwise it sends the page.
- The browser receives the webpage in the form of packets, which are reassembled into a complete webpage, and the connection may then be closed. When the browser needs additional resources from the server (images, stylesheets, scripts), it sends further HTTP requests, either over a new connection or, with HTTP/1.1 keep-alive, over the same one.
The application layer makes use of the layers beneath it to handle the complexities of moving packets across the internet.
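The request and response step can be tried locally without touching the real internet. The sketch below starts a tiny web server on 127.0.0.1 using Python’s standard library and then plays the browser’s role over a plain TCP socket. The handler, the `/index.html` path and the `fetch` helper are invented for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Serves one page and answers 404 for everything else."""

    def do_GET(self):
        if self.path == "/index.html":
            body, status = b"<h1>Hello</h1>", 200
        else:
            body, status = b"Not Found", 404
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(host: str, port: int, path: str) -> str:
    """Send a raw HTTP GET over TCP and return the response's status line."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:                      # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    response = b"".join(chunks).decode("utf-8", errors="replace")
    return response.split("\r\n", 1)[0]


server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: let the OS pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

ok_status = fetch("127.0.0.1", port, "/index.html")
missing_status = fetch("127.0.0.1", port, "/missing.html")
print(ok_status)
print(missing_status)

server.shutdown()
server.server_close()
```

The DNS step is skipped here because the address 127.0.0.1 is used directly; a real browser would resolve the domain name first.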
HTTP protocol v/s Web Socket protocol:
HTTP follows a request-response model: the client sends a request and the server sends a response, and the server cannot send data on its own initiative. HTTP is a stateless protocol that runs on top of TCP, which is a connection-oriented protocol. When the client sends an HTTP request to the server, a TCP connection is opened between client and server. In the simplest case (HTTP/1.0) the TCP connection is terminated as soon as the client gets the response; HTTP/1.1 and later can keep it open and reuse it for further requests, but each request is still handled independently. Simple RESTful applications, which are stateless, use HTTP.
WebSocket is a bidirectional protocol used in the same client-server scenario, but its URLs start with ws:// or wss://. The connection begins as an HTTP request; after the handshake, if client and server agree to keep the connection alive, that connection becomes a WebSocket. It is a stateful protocol, which means the connection between client and server stays alive until either party terminates it. Once one party terminates it, the connection is closed at both ends. Since WebSocket is bidirectional, messages can flow both ways.
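One concrete piece of the handshake is easy to reproduce. In the WebSocket protocol (RFC 6455), the client sends a random `Sec-WebSocket-Key` header, and the server proves it understood the upgrade request by replying with a `Sec-WebSocket-Accept` value derived from that key and a fixed GUID. The sketch below computes that value; the function name `accept_key` is invented, while the GUID and the sample key come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455; the expected reply is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this exchange the same TCP connection stays open and both sides can send messages at any time.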
When should WebSocket be used:
- Gaming applications: The server sends data continuously, and its effects show up on screen without refreshing. Since the data needs to be sent continuously without creating a new connection each time, using the WebSocket protocol speeds up the application.
- Real-time applications: WebSocket is a good fit if data is continuously being sent from the backend server to the client. Since the data travels over the same connection, WebSocket is faster in this case.
- Chat: A WebSocket connection is established once and then used for exchanging messages among the subscribers.
WebSocket should only be used when we want real-time data updates or continuous transmission of data over the network. If we want to fetch data just once and the application only needs to process the received data, HTTP is the suitable choice.
Thanks for reading. Feel free to leave any comments or suggestions, and ‘clap’ if you liked the blog.