A few years ago there was a big hype about “being online”. Everyone had to be online, and not a day went by without the fear of missing the coolest and newest technology. Yet underneath it all was just the TCP/IP layer, which had already been well known for over 20 years back then.
So what is this TCP/IP thing? What good does it do for you, and was it the latest thing or will it get even better? This article touches on some aspects of the TCP/IP layer and the higher level protocols based on it, and gives a brief outlook on what is next.
1. TCP what?
TCP/IP is an abbreviation for Transmission Control Protocol/Internet Protocol. It only has one task: it makes sure that all of your data is sent across the internet and delivered to the right computer.
TCP/IP is a bundle of communication rules. Without it, data could not be transported from one computer to another reliably and independently of the hardware involved. As such, it is the basis of the internet as we know it today.
However, it is important to understand that TCP/IP are just the parents in a family of networking protocols. There are other families of networking protocols and there are (if we keep the image of the family in mind) children and relatives belonging to the TCP/IP family.
2. Layers
So, are you one of the fellows who first went online with Dial Up Networking on a Microsoft system? Have you ever clicked with your right mouse button on DUN to see further info and see how many packages you have transmitted over your dialup connection? Those packages were prepared and sent out by mom and dad of the TCP/IP family. You probably never cared about this, unless you had problems with your dialup connection. But we are going to have a closer look at the microcosm of this newly discovered family.
Unfortunately, the TCP/IP family is somewhat limited and they sort of don’t get the bigger picture. This is a good thing though. They don’t pack up the 5MB MP3 song you are trading over Kazaa et al. at once. They slice your data into several hierarchically structured little packages. It doesn’t matter whether you send an email, use a filesharing tool, browse the web, use an instant messenger or even look at the pics you’re not supposed to look at: every chunk of data is sliced into pieces and transmitted over this standardized protocol.
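To make the slicing a bit more tangible, here is a minimal sketch in Python. It is not how a real network stack is written; the function names and the chunk size of 1460 bytes are assumptions picked for illustration only.

```python
def split_into_packages(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Slice data into numbered chunks of at most `size` bytes."""
    return [(offset // size, data[offset:offset + size])
            for offset in range(0, len(data), size)]


def reassemble(packages) -> bytes:
    """Put the chunks back together, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(packages))


# 5120 bytes standing in for a (very short) MP3 song.
song = bytes(range(256)) * 20
packages = split_into_packages(song, 1460)

print(len(packages), "packages")
print(reassemble(reversed(packages)) == song)
```

Each chunk carries its own number, which is what allows the receiving side to rebuild the song even if the chunks show up out of order.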
Without this standard, every developer would need to create his own (proprietary) method to transmit data. However, developers (just like mathematicians) don’t like to do a lot of extra work and they like to reuse functionality. They usually don’t care how the data gets from your computer and the modem over a phone line or network cable through a whole line or network of computers to its final destination.
However, we do, and that’s the reason for this article.
Think of the TCP/IP family as a set of different layers. The top layer is the actual application protocol which communicates with the application of your choice (your filesharing tool, mail software, browser, instant messenger, etc).
The top layer communicates with the TCP layer, which prepares and forwards the data to the IP layer. The IP layer in turn hands the data to the network card and its drivers. This lowest level is often referred to as the data-link layer. But when your local tech guru talks to you about your data link layer he usually just means your Ethernet card, modem, or WLAN card. The card is responsible for feeding the data into the network cable and retrieving incoming packages from it. Once it has retrieved a data package, it forwards it to the next higher layer (IP).
Depending on the amount of data you can send and receive (also known as your upstream and downstream rate) we might be talking about hundreds of thousands of packages for a normal session.
Imagine the chaos and confusion if the data could not be put back together properly!
Network drivers
That’s exactly what would happen without the network driver. The network driver adds data to each data package in a so-called header. The header contains information such as the length of the data package.
The IP layer
The IP layer is responsible for making sure that all data is routed properly. It’s a point to point protocol and doesn’t have to worry about the network cards or other hardware related issues involved. Basically all it needs to know is the IP address of the final destination.
IP addresses and DNS
So what is an IP address? An IP address is something like 66.98.189.217. It is the numeric address of a host in a network. Instead of going to ‘www.ezoshosting.com’ you could as well go to 66.98.189.217 and you’d see the same site. Give it a try: type it in your browser’s address bar and, if the ezoshosting IP address has not changed since I typed out this article on my keyboard, you will see the same thing as when accessing ‘www.ezoshosting.com’.
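Under the hood, the four numbers separated by dots are just a friendlier way of writing one 32 bit number. A small sketch with Python’s standard ipaddress module shows the conversion in both directions (the variable names are mine):

```python
import ipaddress

addr = ipaddress.IPv4Address("66.98.189.217")

# Each of the four numbers fills one byte of a 32 bit integer.
as_int = int(addr)
by_hand = (66 << 24) | (98 << 16) | (189 << 8) | 217

print(as_int == by_hand)
print(ipaddress.IPv4Address(as_int))  # back to the dotted form
```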
That’s neat, right? Let’s have a quick look at how this trick works: If you enter ‘www.ezoshosting.com’ in your browser, TCP/IP tries to resolve that address. At first, it will look at your local hosts file. The local hosts file is a plain text file that allows you to override public domain names with an IP address you assign yourself. This is only effective if someone on your computer types in the name of the domain. While this is relatively simple and effective, it is also possibly quite evil and dangerous. There is an increasing trend among worm and virus authors to override the addresses of well known companies with malicious IP addresses, so it’s good practice to regularly check your hosts file (in Windows XP you can locate it in \Windows\System32\Drivers\Etc) and make sure no one has added a local entry for websites you regularly log in to and enter highly sensitive data on.
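If you’d rather let a script do the checking, the following Python sketch reads hosts file content and lists every entry that is not plain old localhost. The sample content and the name www.example-bank.test are made up for illustration, and what counts as “suspicious” is of course something you have to decide for your own machine.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name in hosts-file text to the IP address assigned to it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # The first entry for a name wins, later duplicates are ignored.
            mapping.setdefault(name.lower(), ip)
    return mapping


# Made-up sample; on a real machine you would read the hosts file instead.
SAMPLE = """
# sample hosts file
127.0.0.1   localhost
10.0.0.5    www.example-bank.test   # who put this here?
"""

hosts = parse_hosts(SAMPLE)
suspicious = {name: ip for name, ip in hosts.items() if name != "localhost"}
print(suspicious)
```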
If TCP does not find an entry in your hosts file, it will query your name server (which is usually the name server of your ISP). Your ISP’s name server usually caches the IP address assigned to a name for 24 to 72 hours. So, if you go to ‘www.ezoshosting.com’ and ‘www.ezoshosting.com’ moves to a new IP address one hour later, you will still be sent to the old IP address for as long as your ISP has the old address cached. If your ISP’s name servers don’t know the address of a domain name, they will query the next higher name servers (and this goes on and on until they reach the level of the so-called root servers). If in doubt, the root servers can happily point to a set of name servers that really is supposed to know which IP address you have to go to in order to retrieve data from a domain. The root servers gather this data from the registrars.
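The lookup order and the caching effect can be played through in a few lines of Python. This is only a toy model: the class CachingResolver, the fake clock and the address 192.0.2.10 are assumptions for the sake of the example, and a real name server is a good deal more involved.

```python
import time


class CachingResolver:
    """Toy resolver: hosts file first, then the cache, then the next server up."""

    def __init__(self, hosts, upstream, ttl_seconds, clock=time.monotonic):
        self.hosts = hosts          # local overrides, like the hosts file
        self.upstream = upstream    # function that asks the next name server
        self.ttl = ttl_seconds      # how long an answer stays cached
        self.clock = clock
        self.cache = {}             # name -> (ip, expiry time)

    def resolve(self, name):
        if name in self.hosts:
            return self.hosts[name]
        now = self.clock()
        entry = self.cache.get(name)
        if entry and entry[1] > now:
            return entry[0]         # still cached, possibly outdated
        ip = self.upstream(name)
        self.cache[name] = (ip, now + self.ttl)
        return ip


authoritative = {"www.ezoshosting.com": "66.98.189.217"}
now = [0.0]  # fake clock so we don't have to wait 24 hours
resolver = CachingResolver({}, authoritative.__getitem__,
                           ttl_seconds=24 * 3600, clock=lambda: now[0])

first = resolver.resolve("www.ezoshosting.com")
authoritative["www.ezoshosting.com"] = "192.0.2.10"  # the site moves
now[0] = 1 * 3600
stale = resolver.resolve("www.ezoshosting.com")      # one hour later
now[0] = 25 * 3600
fresh = resolver.resolve("www.ezoshosting.com")      # cache has expired
print(first, stale, fresh)
```

One hour after the move the resolver still hands out the old address; only once the cached entry has expired does it ask again and pick up the new one.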
Package headers
Let’s have a close look at the header again. We already know that it has the length added by the network driver. But what other goodies does the family add to the package as it sends it out across the net? The IP layer adds the IP address of the final destination host and the IP address of the sending host. As with every species, there are slackers among the group of data packages. For one reason or another they just never arrive at the destination. That’s the reason why the destination needs the address of the sender so that it can answer and say “hey you, yes you, send me the package XYZ again” if a package XYZ was lost along the way.
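For the curious, here is a rough sketch of what such a header looks like as raw bytes, using the field layout of a basic 20 byte IPv4 header. The sender address 192.0.2.1 and the values for time to live and protocol are example values I chose, and the header checksum is simply left at zero here.

```python
import socket
import struct

# version/length, service type, total length, id, flags/fragment,
# time to live, protocol, header checksum, source, destination
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(src: str, dst: str, payload_len: int) -> bytes:
    version_and_length = (4 << 4) | 5   # IPv4, header of 5 x 4 bytes
    total_length = IPV4_HEADER.size + payload_len
    return IPV4_HEADER.pack(
        version_and_length, 0, total_length, 0, 0,
        64,                             # time to live (example value)
        6,                              # protocol number 6 means TCP
        0,                              # checksum not computed in this sketch
        socket.inet_aton(src), socket.inet_aton(dst))


def read_addresses(header: bytes):
    """Return (sender, destination, total length) from a header."""
    fields = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9]), fields[2]


header = build_header("192.0.2.1", "66.98.189.217", payload_len=100)
print(len(header), read_addresses(header))
```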
Now let’s get back to the family business. After the IP layer received a block of data, it passes it on to the next higher level: the TCP layer.
TCP layer
While the IP layer does the basic work, the job of the TCP layer could be compared to the job of a supervisor. The data blocks I referred to earlier in the article are also known as segments. The TCP layer’s job is to make sure that segments arrive in the right order and without modification at the proper destination. So what does it need to do to achieve that? Right, it adds data to the header. The TCP layer adds a checksum and a number which tells the receiver in which order the data packages need to be put back together later on. Imagine what would happen if packages weren’t sorted. Did you ever listen to a song backwards? You would probably experience something similar if packages were not numbered.
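Both ideas, the checksum and the numbering, fit into a short Python sketch. The checksum function follows the 16 bit ones’ complement sum used by the internet protocols; everything else is simplified on purpose. Real TCP numbers the bytes rather than the segments, and its checksum also covers header fields, so take the helper names and the segment format as assumptions of this example.

```python
def internet_checksum(data: bytes) -> int:
    """16 bit ones' complement checksum over the given bytes."""
    if len(data) % 2:
        data += b"\x00"                 # pad to a whole number of 16 bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_segments(data: bytes, size: int):
    """Return (number, checksum, payload) tuples for the data."""
    return [(seq, internet_checksum(data[off:off + size]), data[off:off + size])
            for seq, off in enumerate(range(0, len(data), size))]


def reassemble_segments(segments) -> bytes:
    """Drop damaged segments, then join the rest in numbered order."""
    good = [s for s in segments if internet_checksum(s[2]) == s[1]]
    return b"".join(payload for _, _, payload in sorted(good))


message = b"Did you ever listen to a song backwards?"
segments = make_segments(message, 8)
arrived = segments[::-1]                # everything arrives in reverse order
print(reassemble_segments(arrived) == message)
```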
UDP layer
However, there are situations where the TCP headers add too much overhead to a connection, for example, if you are listening to a live music or video stream. In those situations it doesn’t really matter whether you receive all packages all the time and in the right order. In those situations, you’re better off using a more lightweight protocol such as the User Datagram Protocol (UDP).
UDP controls the data transfer at a very minimal level only. It does not guarantee that all segments arrive at the destination or that all of them arrive in the proper order.
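Here is a minimal sketch of how little ceremony UDP needs, using Python’s socket module on the local machine only. There is no connection setup and no acknowledgement; the sender just fires off its datagram. The message text and the two second timeout are arbitrary choices of mine.

```python
import socket

# The receiver picks any free port on the local machine.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)        # don't wait forever if the datagram is lost
port = receiver.getsockname()[1]

# The sender does not connect first, it simply sends.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame 1", ("127.0.0.1", port))

data, peer = receiver.recvfrom(1024)
print(data, "from", peer)

sender.close()
receiver.close()
```

On a real network the call to recvfrom could just as well time out, and nobody would resend anything unless the application took care of that itself.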
TCP and medieval cities
Once TCP has verified that a data segment is correct, it removes the header and passes the data on to the actual application.
The actual applications are sitting behind a port and waiting for the input delivered by TCP. Each application has at least one unique port. The route from the port to the actual application is called the application protocol. The concepts of ports and firewalls (which we can’t cover here) might be easy to visualize if you think of a medieval city. Back in the Middle Ages, they had walls around the cities and gates to let people in and out. In some cities they had special gates for merchants, soldiers, etc. The walls are the firewall of our computer, the gates are the ports, and the different purposes of each gate can be compared to the different jobs an application needs to do.
A computer, just like the medieval city, can have different ports open at the same time, which is why you can chat with friends while you are using filesharing software.
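To see two gates open at once, the following Python sketch opens two listening ports on the local machine and walks through one of them. The names chat and files are only labels for this example, and the port numbers are whatever free ports the operating system hands out.

```python
import socket


def open_listener() -> socket.socket:
    """Open a TCP 'gate' on a free port of the local machine."""
    gate = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    gate.bind(("127.0.0.1", 0))
    gate.listen(1)
    gate.settimeout(2.0)
    return gate


chat = open_listener()      # say, the instant messenger
files = open_listener()     # say, the filesharing tool
chat_port = chat.getsockname()[1]
files_port = files.getsockname()[1]
print("gates open at ports", chat_port, "and", files_port)

# A visitor walks through the chat gate only.
client = socket.create_connection(("127.0.0.1", chat_port), timeout=2.0)
connection, _ = chat.accept()
connection.settimeout(2.0)
client.sendall(b"hello")
received = connection.recv(1024)
print(received)

for sock in (client, connection, chat, files):
    sock.close()
```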
There are several popular protocols: the Hypertext Transfer Protocol (HTTP), SMTP (Simple Mail Transfer Protocol), and FTP (File Transfer Protocol) are probably the most popular ones. The HTTP protocol standardizes how data needs to be formed so that your browser (no matter which one you use) is able to recognize the data it receives.
If you’re curious to see what the real text data your browser receives looks like and you know how to open a telnet session, we should have a closer look:
As I mentioned above, the different applications are sitting behind different unique ports. For a webserver behind an HTTP connection this is usually port 80, for a mailserver behind an SMTP connection this is usually port 25, for an FTP server behind an FTP connection this is usually port 21.
A sample HTTP connection
To connect to the webserver residing at ‘www.ezoshosting.com’ through telnet, simply enter this command:
telnet www.ezoshosting.com 80
The screen output will be something like this:
Trying 66.98.189.217...
Connected to www.ezoshosting.com.
Escape character is '^]'.
Now type GET / HTTP/1.0 and press the enter key twice, and the source code of an actual webpage will be shown in your telnet session.
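Whatever you type into such a session travels to the webserver as plain text, and a browser does nothing else: it sends a small block of text called a request and reads the text that comes back. The Python sketch below builds such a request and picks apart the first line of an answer. The helper names are mine, the fetch function is included for completeness but not called here, and the sample answer in the last lines is made up.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Text of a minimal HTTP/1.0 request; the blank line ends it."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Split the first line of an answer into version, code and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the request over TCP and read until the server hangs up."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_request("www.ezoshosting.com").decode("ascii"))

# A made-up answer, just to show the parsing.
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_status_line(sample))
```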
A sample SMTP connection
To connect to the mailserver residing at mail.ezoshosting.com through telnet, simply enter this command:
telnet mail.ezoshosting.com 25
The screen output will be something like this:
Trying 66.98.189.217...
Connected to mail.ezoshosting.com.
Escape character is '^]'.
220-houston.ezoshosting.com ESMTP Exim 4.24 #1 Fri, 30 Jan 2004 09:09:24 -0600
220-We do not authorize the use of this system to transport unsolicited,
220 and/or bulk e-mail.
It will just sit there and do nothing. If you feel like it, you can communicate some more with the server:
EHLO tester.atsomedomain.com
250-houston.ezoshosting.com Hello tester.atsomedomain.com[ipaddresshere]
250-SIZE 52428800
250-PIPELINING
250-AUTH PLAIN LOGIN
250-STARTTLS
250 HELP
As you see, it recognized you and greeted you back.
If you want to talk more with the chatty server, just type:
HELP
The output will be:
214-Commands supported:
214 AUTH STARTTLS HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP
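If you look closely at the server’s answers, every line starts with a three digit code. The code is followed by a hyphen as long as more lines are coming, and by a space on the last line. A mail program reads the answers exactly that way, which can be sketched in Python like this (parse_reply is a name I made up, and the sample lines are the greeting reply from above):

```python
def parse_reply(lines):
    """Read one reply: collect lines until the one without a hyphen."""
    code = None
    texts = []
    for line in lines:
        code = int(line[:3])
        texts.append(line[4:])
        if line[3:4] != "-":    # a space instead of a hyphen: last line
            break
    return code, texts


reply = [
    "250-houston.ezoshosting.com Hello tester.atsomedomain.com[ipaddresshere]",
    "250-SIZE 52428800",
    "250-PIPELINING",
    "250-AUTH PLAIN LOGIN",
    "250-STARTTLS",
    "250 HELP",
]

code, texts = parse_reply(reply)
extensions = [text.split()[0] for text in texts[1:]]
print(code, extensions)
```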
A sample FTP connection
To connect to the ftp server residing at ftp.ezoshosting.com through telnet, simply enter this command:
telnet ftp.ezoshosting.com 21
The screen output will be something like this:
Trying 66.98.189.217...
Connected to ftp.ezoshosting.com.
Escape character is '^]'.
220 ProFTPD 1.2.9 Server (ftp.ezoshosting.com) [66.98.189.217]
To see a list of available commands / words:
HELP
214-The following commands are recognized (* =>'s unimplemented).
USER PASS ACCT* CWD XCWD CDUP XCUP SMNT*
QUIT REIN* PORT PASV EPRT EPSV TYPE STRU
MODE RETR STOR STOU APPE ALLO* REST RNFR
RNTO ABOR DELE MDTM RMD XRMD MKD XMKD
PWD XPWD SIZE LIST NLST SITE SYST STAT
HELP NOOP FEAT OPTS ADAT* AUTH* CCC* CONF*
ENC* MIC* PBSZ* PROT*
214 Direct comments to [email protected].
The next generation
Unfortunately, the current and still relatively easy way of handling IP addresses is doomed to die, as the number of IP addresses is limited to only about 4 billion addresses. This may sound like a lot, but it really isn’t. Have a look at how many pages are spidered by Google. As I write this article, Google has spidered 3,307,998,701 pages and that’s still only a small percentage of all estimated webpages. Experts believe that in 2005 we will run out of IPv4 addresses. This is not a new development though and people are prepared.
Work on IPv6 started in the early 90s. IPv6 will greatly increase the number of available IP addresses, and many people believe it will lead to each electronic device having its own IP address without us ever running out of IP addresses.
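Where do the “about 4 billion” come from? An IPv4 address is 32 bits long, an IPv6 address 128 bits, and the rest is simple arithmetic, here done in Python:

```python
import ipaddress

ipv4_total = 2 ** 32     # every possible 32 bit address
ipv6_total = 2 ** 128    # every possible 128 bit address

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:,} addresses")

# The standard library agrees on the size of the whole IPv4 range.
print(ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total)

# IPv6 has room for 2 ** 96 complete copies of today's address space.
print(ipv6_total // ipv4_total == 2 ** 96)
```

Not all of these addresses can actually be handed out to hosts, since some ranges are reserved, so the usable numbers are somewhat smaller.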