How the Internet Works
Understand how the internet works with real-world examples
In this article, we will answer the fundamental question of computer networking with some real-world examples. If you are someone who wants to understand the basic concepts of computer networking and how the internet works, then this article will help you.
I will be answering a very basic question -
If you open your browser and go to google.com or any other website, what happens in the background?
In a very basic sense:
The browser (client) sends a request for this website over the internet.
The server (the service or machine that hosts this website) responds over the internet with the data.
This is called the request-response paradigm, and this is how computers communicate with each other over a network. The above communication follows the client-server model, where the browser (client) requests some resources and the server responds. The communication happens in the form of chunks of data, also known as data packets (more on this below).
This data is transferred between the client and server using a series of steps and methods which follow certain protocols (rules).
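To make the request-response idea concrete, here is a minimal sketch that runs both sides on your own machine using only Python's standard library. The handler name `HelloHandler` and the helper `fetch_once` are made up for this illustration, and the server listens on the loopback address rather than on the real internet.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a small text body."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_once(path="/hello"):
    """Start a local server, send one request as a client, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: one request goes out, one response comes back.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/hello"))
```

The client never talks to the handler directly. It only sends a request and waits for whatever the server sends back, which is the whole paradigm in miniature.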
Let’s go down one level deeper and try to understand the core concepts and technologies involved here.
👉 IP Address
The Internet is made up of billions of computers interconnected with each other via networks. Every resource has an identifier by which it can be uniquely identified over the internet. By resource, we mean anything which can be accessed over the internet, like the machines that host websites, web applications, etc. This unique identifier is called an IP Address. The most familiar kind (IPv4) is written as four numbers between 0 and 255 separated by dots, for example, 127.0.0.1.
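If you want to poke at IP addresses yourself, Python's standard `ipaddress` module can parse them. The sketch below is only an illustration; the helper name `describe` is my own, and it simply reports a few properties of whatever address you pass in.

```python
import ipaddress


def describe(address: str) -> dict:
    """Parse an IP address string and report a few basic properties."""
    ip = ipaddress.ip_address(address)  # raises ValueError for invalid input
    return {
        "version": ip.version,          # 4 for dotted addresses like 127.0.0.1
        "as_integer": int(ip),          # the same address as a single number
        "is_loopback": ip.is_loopback,  # True for addresses that mean "this machine"
        "is_private": ip.is_private,    # True for addresses not routed on the public internet
    }


if __name__ == "__main__":
    for example in ("127.0.0.1", "192.168.1.10", "8.8.8.8"):
        print(example, describe(example))
```

Under the hood, an IPv4 address is just a 32-bit number. The dotted form is only a friendlier way to write it.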
👉 DNS
In simple terms, DNS (Domain Name System) is the system that maps human-readable names to the not-so-human-readable IP addresses of resources (websites, etc.), so that we can access these resources over the internet by name.
Since all internet resources are uniquely identified by IP Address, it is very difficult for a human to remember such IP addresses for the things they need. For example:
google.com IP Address: 22.214.171.124
facebook.com IP Address: 126.96.36.199
Too many numbers to remember!!!
That’s why we assign domain names to our web services, and DNS takes care of mapping them to the actual IP address of the host.
For example: When I type google.com in the browser, the browser first asks DNS for the IP address of google.com and then sends the request to that IP address over the internet.
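The name-to-address lookup can be pictured as a table with a cache in front of it. The sketch below is a toy, not how real DNS servers are built: the class `ToyResolver`, the name `shop.example.com`, and the address `192.0.2.10` (from a range reserved for documentation) are all assumptions made for the example.

```python
class ToyResolver:
    """A toy name-to-IP lookup with a cache, loosely modelled on what DNS does."""

    def __init__(self, records):
        # records: mapping of domain name -> IP address (the "phone book")
        self._records = {name.lower(): ip for name, ip in records.items()}
        self._cache = {}
        self.lookups = 0  # how many times we had to consult the phone book

    def resolve(self, name: str) -> str:
        name = name.lower().rstrip(".")  # domain names are case-insensitive
        if name in self._cache:
            return self._cache[name]     # answered from cache, no lookup needed
        self.lookups += 1
        if name not in self._records:
            raise KeyError(f"no record for {name}")
        ip = self._records[name]
        self._cache[name] = ip
        return ip


if __name__ == "__main__":
    resolver = ToyResolver({"shop.example.com": "192.0.2.10"})
    print(resolver.resolve("shop.example.com"))
    print(resolver.resolve("Shop.Example.com"))  # same name, served from cache
    print("lookups:", resolver.lookups)
```

In real programs you would not write this yourself. Python's `socket.gethostbyname("google.com")` asks the operating system's resolver to do the actual DNS lookup.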
👉 Data packets
The communication between the client and server happens in the form of data packets. These packets contain the actual data to be sent along with the sender and receiver details like source and destination IP Addresses and Port numbers.
IP addresses identify which host sent the request (for example, your router) and which destination host the request is being sent to (Google’s server), whereas ports identify the application or software service within a host, such as the browser that requested the resource.
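As a rough mental model, a packet is the payload plus four pieces of addressing information. The sketch below is a simplification (real packets carry many more header fields), and the `Packet` class, its field names, and the example addresses are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A simplified packet: who sent it, who should get it, and the data."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes

    def reply(self, payload: bytes) -> "Packet":
        """Build the response packet: source and destination swap places."""
        return Packet(self.dst_ip, self.dst_port, self.src_ip, self.src_port, payload)


if __name__ == "__main__":
    request = Packet("192.168.1.10", 51000, "192.0.2.10", 443, b"GET /")
    response = request.reply(b"<html>...</html>")
    print(request)
    print(response)
```

Notice that the response is addressed using nothing but the details found in the request. That is why every packet has to carry the sender's IP address and port.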
👉 A typical home setup for accessing the internet
In a typical home setup, we have one router and all our devices like mobile, laptops, etc. are connected to this router for accessing the internet. We will use this real-world example to explain how it all works.
We all connect to the internet through some ISP (Internet Service Provider), for example, Vodafone, Airtel, or some local ISP. These ISPs provide the services by which we can access the internet.
If I connect my laptop to my home router and go to google.com from my laptop’s browser, the request does not directly reach Google’s server. The request first goes from the laptop to the home router. Then it goes from the home router to the ISP and then over the internet.
Local devices(phone/laptop, etc) -> Router -> ISP/s -> Internet
The ISP also provides a public IP address to our home router, by which we are publicly identified over the internet. To the outside world, all devices connected to this router appear to have the same IP address: the router's public IP address.
But wait, if the ISP assigns only one IP address to my home router, then how does the router uniquely identify the multiple devices connected to it?
The answer to this question is DHCP.
Whenever a device connects to the router to access the internet, the router assigns it a new, unique IP address. These assignments are done using DHCP (Dynamic Host Configuration Protocol). These IP addresses are called private IP addresses, as the outside internet does not know about them. The internet only sees the public IP address that the ISP assigned to the router, and the ISP in turn knows which router holds that address. So, to trace back a request, the public internet can tell which ISP the request came from, and the ISP can tell which specific router made the request.
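The address hand-out can be sketched as a pool of free private addresses plus a table of who currently holds which one. This is a toy model and not the actual DHCP message exchange; the class `ToyDhcpServer`, the `192.168.1.0/24` network, and the MAC address strings used as device identifiers are all assumptions for the example.

```python
import ipaddress


class ToyDhcpServer:
    """Hands out private IP addresses from a pool, one per device."""

    def __init__(self, network="192.168.1.0/24", router_ip="192.168.1.1"):
        router = ipaddress.ip_address(router_ip)
        # Every usable address in the network except the router's own.
        self._free = [ip for ip in ipaddress.ip_network(network).hosts() if ip != router]
        self._leases = {}  # device identifier (MAC address) -> assigned IP

    def request_address(self, mac: str) -> str:
        if mac in self._leases:            # a known device keeps its address
            return str(self._leases[mac])
        if not self._free:
            raise RuntimeError("address pool exhausted")
        ip = self._free.pop(0)             # take the next free address
        self._leases[mac] = ip
        return str(ip)

    def release(self, mac: str) -> None:
        ip = self._leases.pop(mac, None)
        if ip is not None:
            self._free.append(ip)          # the address can be reused later


if __name__ == "__main__":
    dhcp = ToyDhcpServer()
    print("laptop:", dhcp.request_address("aa:aa:aa:aa:aa:01"))
    print("phone: ", dhcp.request_address("aa:aa:aa:aa:aa:02"))
```

Each device ends up with its own private address, even though the outside world only ever sees the router's single public one.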
Now, when the response comes back from google.com, since the destination IP address was of the router, how does my home router know whether the response should be sent back to my laptop or mobile or computer?
And once it finds out the device (let’s say the laptop), how does the response reach the right service within that device (for example the browser, the terminal, some gaming application, etc.)?
The answer is NAT and port numbers.
👉 NAT and Ports
NAT (Network Address Translation) is the process of translating between the private IP addresses of the devices connected to the router (the ones assigned by the router using DHCP) and the public IP address of the router, whenever a packet goes from the private network to the public network or vice versa.
Port numbers are used to identify the service or the application within the host (for example browser in your laptop) which made the request.
When you hit google.com from your laptop’s browser, the outgoing packet has your laptop's private IP address (the one assigned by the router using DHCP) as its source IP address and a source port number that identifies the service (the browser).
This request packet reaches your home router. Here, NAT replaces the source IP address with the public IP address of the router. It also masks* (rewrites) the source port number and then adds the corresponding entry to the NAT table. When the response comes back to the home router from the internet, NAT checks this table again to determine the exact host/device and the port/service to which the reply should be sent.
*NAT masks the port number to avoid the problem that arises when two different hosts/devices connected to the same router make requests from the same source port. Without masking, the NAT table would contain two entries with the same public IP address and port number but different private IP addresses, so when a response came back from the internet, the router would be unsure which host the reply should be sent to.
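Here is a toy version of the NAT table to show why rewriting the port matters. It is a sketch under simplifying assumptions (real NAT also tracks the destination, the protocol, and timeouts); the class `ToyNat`, the public address `203.0.113.5`, and the starting port `40000` are invented for the example.

```python
class ToyNat:
    """Maps (private IP, private port) pairs to unique ports on one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet and remember the mapping."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = self._next_port   # a fresh port per private (IP, port) pair
            self._next_port += 1
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def translate_incoming(self, public_port: int):
        """Find which private device and port a response belongs to."""
        return self._inbound[public_port]


if __name__ == "__main__":
    nat = ToyNat("203.0.113.5")
    # Two devices that happen to use the same source port.
    print(nat.translate_outgoing("192.168.1.2", 51000))
    print(nat.translate_outgoing("192.168.1.3", 51000))
    print(nat.translate_incoming(40001))
```

Because each outgoing connection gets its own public port, the two devices stay distinguishable even though they picked the same source port.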
Clients requesting a resource over the internet do not usually send from a fixed, well-known port. Instead, the operating system assigns a short-lived (ephemeral) source port to each outgoing connection, and the client expects the response back on that port.
This is useful when the same application has multiple connections open at once. Browser tabs are a good example: connections opened by different tabs use different ephemeral ports, so the operating system can hand each response to the connection that asked for it, and the NAT table on the router can tell those connections apart.
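You can watch the operating system hand out ephemeral ports with a few lines of Python. In this sketch, binding a socket to port `0` asks the OS to choose a free port for us; the helper name `ephemeral_ports` is my own, and the actual numbers you see will differ from run to run and from system to system.

```python
import socket


def ephemeral_ports(count: int = 2) -> list:
    """Open several sockets at once and report the port the OS gave each one."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.bind(("127.0.0.1", 0))  # port 0 means "pick any free port for me"
            sockets.append(sock)
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        for sock in sockets:
            sock.close()


if __name__ == "__main__":
    print(ephemeral_ports(3))
```

While the sockets are open at the same time, each one gets a different port number, which is what lets responses find their way back to the right connection.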
👉 Communication from ISP and above
The Internet is nothing but a large number of computers, or hosts, connected to each other. So once the ISP forwards the request over the internet, the request might hop through a series of hosts (routers) before it finally reaches the server which is responsible for serving the request (in our case, the server of google.com).
Routers have routing tables and forwarding tables which help in routing data packets. The routing table stores the routes the router knows about for reaching destination hosts or networks, and the forwarding table, which is built from it, is what the router consults to actually forward each data packet to the next hop.
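To give a feel for how a table lookup picks the next hop, here is a small sketch. It uses longest-prefix matching (the most specific matching route wins), which is the usual rule in IP routing but is my addition here, not something covered above. The `ROUTES` table and the hop names in it are hypothetical.

```python
import ipaddress

# Hypothetical forwarding entries: (destination network, next hop).
ROUTES = [
    ("0.0.0.0/0", "isp-gateway"),       # default route: matches everything
    ("10.0.0.0/8", "core-router"),
    ("10.1.0.0/16", "branch-router"),
]


def next_hop(destination: str, routes=ROUTES) -> str:
    """Return the next hop for the most specific route that matches."""
    dest = ipaddress.ip_address(destination)
    best_network, best_hop = None, None
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        more_specific = best_network is None or network.prefixlen > best_network.prefixlen
        if dest in network and more_specific:
            best_network, best_hop = network, hop
    if best_hop is None:
        raise LookupError(f"no route to {destination}")
    return best_hop


if __name__ == "__main__":
    for address in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
        print(address, "->", next_hop(address))
```

Each router along the way only decides the next hop. No single router needs to know the full path from Mumbai to the destination server.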
For example: If I made a request to google.com from Mumbai, India, the request might be served by some server in the USA. The request might travel from India to UAE, UAE to Austria, and Austria to the USA till it reaches the destination host which can serve the request.
Once the request reaches the server, the server responds and sends back the required data using the same technique as discussed above. The response reaches the source router (the one the request came from) via the same public internet, and then NAT and ports take over from there to deliver the response to the right host and service.
Usually, ISPs are classified into three tiers.
Tier 3 or Tier 2 ISPs are the local or national ISPs which serve some specific local areas or metropolitan regions within a country. Examples of some Tier 2 ISPs are Vodafone, Airtel, etc.
Tier 1 ISPs are the international ISPs which are responsible for connecting other lower-tier ISPs to the global internet. An example of a Tier 1 ISP is Tata Communications.
These ISPs are often also among the owners of the submarine cables under the oceans, through which the actual internet data is transferred between nations and continents. This is how the whole internet works globally.
The whole world is connected not wirelessly, but via the actual submarine cables under the seas and oceans. Check this out — submarinecablemap.com. Isn’t it interesting? 🤩
👉 Other commonly used protocols over the web
There are a lot of protocols used in computer networking, but we will cover the ones most commonly used over the web: HTTP/HTTPS, TCP and UDP. Below is a basic and short explanation of these protocols. This explanation should be sufficient for starters, and you can read about them in detail later.
HTTP (Hypertext Transfer Protocol), along with its encrypted variant HTTPS, is the protocol used by web browsers. It defines the format of the data transferred over the web and how communication should take place between client and server: how the client requests data, what the format of that data is, and how the server should respond. HTTP is a stateless protocol, which means that the state (data) involved in a particular request and response is available only during that request and response; the server does not remember earlier requests unless the application adds something like cookies on top. HTTP is also often described as connectionless, in the sense that the connection does not have to be kept open forever once the client has received the response. In practice, HTTP usually runs on top of a TCP connection, and HTTP/1.1 and later commonly keep that connection open and reuse it for several requests.
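HTTP/1.1 messages are plain text, so the format is easy to see in code. The sketch below builds a minimal GET request and parses a response by hand; the helper names `build_get_request` and `parse_response` and the sample response are assumptions for illustration, and in real code you would use a library such as `http.client` rather than parsing by hand.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(build_get_request("example.com", "/index.html").decode("ascii"))
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
    print(parse_response(sample))
```

Everything the server needs is inside the request itself, which is what stateless means in practice.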
TCP (Transmission Control Protocol) is a connection-oriented protocol, which means that a connection needs to be established between both parties (for example, client and server) before any data is actually transmitted. It is a reliable protocol: once the connection is created, lost packets are detected and retransmitted, and data is delivered in order. To establish the connection, TCP uses a 3-way handshake (SYN, SYN-ACK, and ACK signals).
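Here is a minimal TCP sketch on the loopback address, with a tiny echo server and a client in the same script. The function name `tcp_echo_once` is made up for the example. You will not see SYN, SYN-ACK, and ACK in the code, because the operating system performs the handshake for you when the client connects and the server accepts.

```python
import socket
import threading


def tcp_echo_once(message: bytes) -> bytes:
    """Send one message over a TCP connection and return the echoed reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()  # returns once a client has connected
        with conn:
            conn.sendall(conn.recv(1024))  # echo back whatever was received

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    try:
        # create_connection triggers the 3-way handshake inside the OS.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            return client.recv(1024)
    finally:
        worker.join(timeout=5)
        server.close()


if __name__ == "__main__":
    print(tcp_echo_once(b"hello over TCP"))
```

The connection has to exist before a single byte of the message is sent, which is the connection-oriented part.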
UDP (User Datagram Protocol) is a connectionless protocol, which means that no connection needs to be set up for the client and server to communicate. This also means that delivery of the data sent between the client and server is not guaranteed, as there are no acknowledgements here. The upside of this protocol is that it is faster (lower latency) than TCP. It is used in applications where we need low latency and it is fine if some data packets are dropped. The most common applications are real-time services like gaming, video calls, etc.
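For comparison, here is the UDP equivalent on the loopback address. There is no connect or accept step: the sender just fires a datagram at an address. The function name `udp_send_once` is my own, and the timeout is there because UDP gives no guarantee that the datagram arrives (on the loopback interface it practically always does).

```python
import socket


def udp_send_once(message: bytes):
    """Send one UDP datagram to a local receiver; return it, or None if lost."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(5)           # do not wait forever for a lost datagram
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the datagram is simply sent to the receiver's address.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data
    except socket.timeout:
        return None                  # nothing arrived, and UDP will not retry for us
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_send_once(b"hello over UDP"))
```

If the datagram were lost, nothing in UDP would notice or resend it. Any retry logic has to live in the application.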
Apart from these, there are various other technologies which are widely used across the internet, such as CDNs, which we will cover in another article.