How the Internet Works

Work in progress! You've been warned...

This is meant to be a somewhat comprehensive introduction to how the internet works, specifically how websites work (there are lots of other uses of the internet that aren't covered). Some parts are quite technical, but I've tried to explain things from the ground up; it's intended for people who want to start tinkering with web development but don't have any experience in that area yet. For any more experienced readers, forgive me if some parts are basic.

URLs, clients, and servers, oh my!

In some ways, a website is a lot like a waiter: The waiter is a separate entity that knows stuff you don't. In order to get what you want from the waiter, you need to ask him. In order to successfully ask him, you should understand each other, and for that you need to speak the same language.

Now imagine the waiter is in a locked room (some restaurant owners can be cruel), and you want to know the day's menu. You need his phone number in order to talk to him. You know his name, so you can look him up in the phone book and get his phone number. You can then call him up and ask what the day's menu is (fortunately, you both speak the same language). The daily special turns out to be banana spaghetti. You decide to eat at home.

OK, so what does a trapped waiter have to do with the internet? Well, think of a website as a waiter. A waiter is a person (that is, an abstract concept) hosted in a human body; the waiter's body is what does all the actual work for the waiter (breathing, talking, thinking, etc.). Similarly, a website is an abstract concept hosted on a computer called a server, which is what actually does all the work. Just like you and the waiter, there are fundamentally two entities when using a website: You, and the other guy. The other guy is the server, and you (actually, your computer) are called the client.

Now imagine Bob (he's actually been here all along, quiet and patient). Bob likes the internet. There's pictures of cats.

Bob wants to know if there were any unicorn sightings last night, so he types the URL "http://wheredidalltheunicornsgo.com/sightings" into Internet Explorer. But before we go further, it's Terminology Time!™ (Yay!)

First, what is a URL? As an abstract concept, it's a pointer to a certain location on the internet. More concretely, it's composed of several distinct parts. Let's break down the URL Bob typed in:

"http://": This is the protocol. We'll come back to the protocol later.
"wheredidalltheunicornsgo.com": This is the domain name. Just like every business has a street address, every website has a domain name. Unlike street addresses, though, the domain name is guaranteed to be unique.
"/sightings": This is the path of the URL. It indicates what resource of the website with the given domain name is being referenced, just as a specific room name might be indicated in addition to a street address.

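If you'd like to see that breakdown done by a program, here's a minimal sketch in Python (my choice of language for the examples on this page; nothing else here depends on it). It uses the standard library's urlsplit function on the URL Bob typed in. Note that the library calls the protocol the "scheme" and reports it without the "://" separator.

```python
from urllib.parse import urlsplit

# The URL Bob typed in, taken from the text above.
url = "http://wheredidalltheunicornsgo.com/sightings"

parts = urlsplit(url)

print(parts.scheme)    # the protocol, without the "://" separator
print(parts.hostname)  # the domain name
print(parts.path)      # the path
```
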
So, the program that Bob typed the URL into, called the browser, first looks up the domain name (using a system called DNS) to find an IP address. This is akin to looking up a waiter's phone number in the phone book when all you know is his name. An IP address is just a special number that identifies a particular server and that you can use to connect (talk) to that server, much like a phone number. Since it would be annoying to go around memorizing numbers all day, we use the domain names instead, and the browser looks them up for us. There's also another benefit to using domain names: it lets us have several websites on the same server with the same IP address, differentiated by the domain name. Think of two companies sharing office space in the same building: any mail sent there goes to the right company because they have different names, even though they're at the same address.
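
The phone book lookup can be tried from Python too. This is only a sketch: look_up is a name I made up, and it simply asks the operating system's resolver (which is what does the DNS work) for the IPv4 addresses behind a name.

```python
import socket


def look_up(domain_name):
    """Return the IPv4 addresses the system resolver reports for a name."""
    results = socket.getaddrinfo(
        domain_name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip_address = sockaddr[0]  # sockaddr is an (address, port) pair for IPv4
        if ip_address not in addresses:
            addresses.append(ip_address)
    return addresses
```

Calling look_up("example.com") needs a working network connection, and the addresses you get back depend on where and when you ask. A name that is already an IP address, like "127.0.0.1", comes back unchanged.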

Talk to me, baby

So Bob told the browser (Internet Explorer, in his case) what URL he wanted. The browser looked up the domain name to find out the IP address of the server that hosts the website corresponding to the domain name. (Still with me?) Now what? Well, now that we've got the server's address, we want to ask it for the resource specified by the URL's path ("/sightings"). But both the client (the browser program) and the server need to speak the same language; otherwise they won't understand each other and communication will be impossible. That's what the protocol specifies: what language to use.

In Bob's case, he's specified HTTP as the protocol. This is the most common protocol for the web, which is why it's used as the default if you type in a URL without a protocol. All web browsers know how to "speak" HTTP, and all web servers know how to "speak" it too. HTTP is built around verbs and resources; resources are referred to by URLs, and verbs are specified as part of the language itself. Web browsers commonly use only two verbs (GET and POST, with GET being far more common), but there are several others (e.g. PUT and DELETE). A single verb-URL action is called a request, and every request solicits a response. Requests and responses both have (fully specified) headers (metadata about the request or response), and optionally bodies (which can contain arbitrary data, like images, text, PDFs, etc.).
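
To make "headers and optionally bodies" a bit more concrete, here's a rough sketch of what an HTTP/1.1 message looks like as text. The function names build_request and parse_response are made up for this example, and the parsing is deliberately naive (a real browser handles many more cases). The "Host" header is how the domain name travels with the request, which is what lets several websites share one IP address.

```python
def build_request(method, host, path):
    """Build the bytes of a body-less HTTP/1.1 request."""
    lines = [
        f"{method} {path} HTTP/1.1",  # the verb, the resource, the protocol version
        f"Host: {host}",              # which website on this server we want
        "Connection: close",          # ask the server to hang up afterwards
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split raw response bytes into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```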

So the browser can use HTTP to get the unicorn sightings from the server, but first the client needs to connect to the server. Think of a connection as an open phone line; the client "dials up" the server (using its IP address) and the server "answers" (accepts the incoming connection). Remember, you need an open phone line before you can even attempt talking to the waiter, who may or may not understand the language you're using. The browser uses another, very low-level, protocol called TCP (which virtually all networked computers understand) to establish the underlying connection.
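
"Dialing up" a server looks roughly like this in Python. It's a sketch, not what any particular browser does: open_connection is a made-up name, and port 80 is the conventional port for HTTP (the port is a detail I've glossed over above; think of it as an extension number on the phone line).

```python
import socket


def open_connection(ip_address, port=80, timeout=5.0):
    """Dial the server: return a connected TCP socket, or raise OSError."""
    return socket.create_connection((ip_address, port), timeout=timeout)
```

If nobody "answers" (no server is listening, or the network is down), the call raises an error instead of returning a connection.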

OK, the browser is connected to the server, and it knows what language to speak to request the latest unicorn sightings (HTTP). So that's what it does next! It issues an HTTP GET request to the server for the given URL over the open connection. The request has an empty body, since we're requesting data, not sending any (the GET requests that browsers send don't carry a body). The request's headers reflect this: none of them announces a body. The browser then waits for the server's HTTP response. Once the response comes back over the connection (which can take a while if the response is very large or the network is slow), the browser looks at the headers of the response (specifically, the "Content-Type" header) to find out what type of data was sent back in the response body. Assuming it understands what kind of data was sent back, it knows how to display it to the user.
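
Here's the whole exchange as a sketch you can run without touching the real internet. Everything in it is a stand-in: SightingsHandler is a pretend unicorn server running on your own machine, the sentence it returns is a placeholder, and fetch and demo are names I made up. The part that plays the browser is fetch, which uses Python's http.client module.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class SightingsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the unicorn server; answers GET /sightings."""

    def do_GET(self):
        if self.path == "/sightings":
            body = b"<p>No unicorns were sighted last night.</p>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
        else:
            body = b"Not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """Issue one GET request and return (status, content type, body)."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)  # no body is sent with the request
        response = connection.getresponse()
        return response.status, response.getheader("Content-Type"), response.read()
    finally:
        connection.close()


def demo():
    """Run the stand-in server on a free local port and fetch from it."""
    server = HTTPServer(("127.0.0.1", 0), SightingsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/sightings")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status code, the Content-Type header, and the body that the pretend server sent back.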

Tools exist that let you see all the HTTP requests and corresponding responses that are going on behind the scenes. A very good free one is Fiddler.

What browsers know

Browsers speak HTTP. But that's just the protocol, a common language for whizzing data back and forth. What makes a browser show an image here, with a sentence in bold there, and a link to Wikipedia there? It turns out to be a combination of things!

At the base of it all is a simple human-readable markup language called HTML. It's just plain text which has been annotated with special "tags" to give it structure. Browsers understand HTML and are able to use it to render (display) a web page. There's a tag that specifies the webpage's title. There are tags to create tables of data, and another to denote a paragraph. There are lots of tags, but don't worry -- HTML is very simple. Here's an example:

<html>
    <head>
        <title>Cats own the world</title>
    </head>
    <body>
        <p>
            This is a paragraph!
            With a <a href="http://example.com/">link in it!</a>
        </p>
    </body>
</html>

You could copy-paste this into a file with the extension ".htm" and open it in a browser. That's a webpage! Each tag is opened (e.g. <body>), then closed after the inner content of the tag is specified (</body> -- note the slash). For example, <p> opens a paragraph tag, "This is a..." is the content of that tag, then the tag is closed with </p>. The <a> tag creates a link; the content of the tag is the link text ("link in it!"), and the href attribute of the tag specifies a URL that the link points to. When the user clicks on the link, the browser makes a GET request for the URL of the link, and replaces the current page with whatever's in the response.
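
Because HTML is just structured text, programs other than browsers can read it too. As a small sketch, here's how you might pull the link out of the example page with Python's built-in HTML parser. LinkCollector is a name I made up; the page is the same one shown above.

```python
from html.parser import HTMLParser

PAGE = """<html>
    <head>
        <title>Cats own the world</title>
    </head>
    <body>
        <p>
            This is a paragraph!
            With a <a href="http://example.com/">link in it!</a>
        </p>
    </body>
</html>"""


class LinkCollector(HTMLParser):
    """Collect (URL, link text) pairs from the <a> tags of a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # (url, link text) pairs, in document order
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
collector.close()
print(collector.links)
```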

Right, so browsers make requests using HTTP, and responses come back, and if the response contains HTML, the browser can render it as a webpage. What else can the browser understand? Images, for one (in a variety of formats). So that's how you see a picture of a cat when you navigate to the URL of a cat picture -- the data is sent back with the HTTP Content-Type header set appropriately, and the browser says, "Aha! I know what to do with images!" and displays it on-screen. What about images embedded in the middle of a page? Well, there's an HTML tag for that:

<img src="http://example.com/cheeseburger.jpeg" />

The tag causes the browser to make an HTTP request for the image at the URL specified in the "src" attribute as soon as the tag is encountered (when the HTML is being parsed). When the image data comes back, the image is inserted into the page at the location of the tag. This tag is special in that it never has any inner content (and thus, cannot be "opened" then "closed" like normal tags). It is said to be "self-closed" by the slash that precedes the >.

Browsers also understand a few other types of data, as will be seen shortly.
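
The "Aha! I know what to do with images!" moment can be pictured as a simple decision based on the Content-Type header. This is a toy model, not how any real browser is written: choose_renderer and the descriptions it returns are made up for illustration, and real browsers know many more types.

```python
def choose_renderer(content_type_header):
    """Decide, very roughly, what to do with a response body."""
    # A header such as "text/html; charset=utf-8" carries extra parameters
    # after the semicolon; only the media type before it matters here.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "render as a web page"
    if media_type.startswith("image/"):
        return "display as an image"
    if media_type.startswith("text/"):
        return "show as plain text"
    return "offer to download"
```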

Semantics

Does HTML let you specify where an image should go? What font to use? What colour to make links? Technically, it does, but it's considered bad practice to do so because it mixes the presentation of your content with the content itself; this is a bad thing because when it comes time to change the presentation you find it inextricably linked with the content. If you separate the two, the content becomes semantically meaningful by itself, and the presentation is independent and can be changed without affecting the meaning of the content in any way. You might even have many different presentations of the same webpage (HTML content), for example the screen version, print version, and mobile phone version.

So how is the presentation specified? With another simple language just for that, called CSS. CSS lets you specify style rules (e.g. make the font bold) that target certain HTML elements. For example, here's how you would make all links bold in CSS:

a {
    font-weight: bold;
}

The a selects all anchor tags, and the rules listed between the braces (just one, in this case) are applied to the selected elements.
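
To show the idea of "select some elements, then apply rules to them", here's a toy sketch in Python. It only understands a single rule whose selector is a plain tag name, like the one above; parse_rule and style_for are made-up names, and real CSS has far richer selectors plus rules for resolving conflicts that this ignores completely.

```python
def parse_rule(css_text):
    """Split one simple CSS rule into (selector, {property: value})."""
    selector, _, rest = css_text.partition("{")
    block = rest.rpartition("}")[0]
    declarations = {}
    for declaration in block.split(";"):
        if ":" in declaration:
            name, _, value = declaration.partition(":")
            declarations[name.strip()] = value.strip()
    return selector.strip(), declarations


def style_for(tag_name, rules):
    """Merge the declarations of every rule whose selector names this tag."""
    style = {}
    for selector, declarations in rules:
        if selector == tag_name:
            style.update(declarations)
    return style


rule = parse_rule("a {\n    font-weight: bold;\n}")
print(style_for("a", [rule]))  # the link gets the bold rule
print(style_for("p", [rule]))  # the paragraph gets nothing from this rule
```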

Behaviour

Browsers interpret HTML defining the page structure and content, and CSS which specifies how that HTML should look. What about dynamic stuff, stuff that changes while you're on a page? Things like popping up a message box when somebody clicks on a specific element on the page? Well, all the behaviour of each page is defined with JavaScript (no relation to Java, despite the unfortunate name). JavaScript is a programming language that most browsers understand. It lets you change things on the page without needing to make an HTTP request; that's why it's called client-side scripting.

Tying it all together

Browsers are special programs that interpret HTML (content and structure), CSS (style to apply to the HTML) and JavaScript (dynamic behaviour). They also understand more boring formats like plain text and images. They get resources (specified via URLs) using HTTP (over a TCP connection).

Links are the way of getting from one place to the other on the web. Pages link to each other and to other websites; the network of links creates the "web" as we know it, and it is what allows search engines (like Google) to "spider" their way along the web and index every linked site so they can be returned in search results. URLs are also used in HTML to specify the images, CSS, and JavaScript that the page needs in order to display properly.
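
Here's a sketch of what "spidering" means, using a pretend web small enough to fit in a dictionary. TOY_WEB and its URLs are invented for the example, and spider is a made-up name; a real search engine would fetch each page over HTTP and pull the links out of its HTML instead of looking them up in a table.

```python
from collections import deque

# Hypothetical miniature web: each page URL maps to the URLs it links to.
TOY_WEB = {
    "http://example.com/": ["http://example.com/cats", "http://example.org/"],
    "http://example.com/cats": ["http://example.com/"],
    "http://example.org/": ["http://example.org/about"],
    "http://example.org/about": [],
    "http://example.net/unlinked": [],  # nothing links here
}


def spider(start_url, links_on_page):
    """Visit every page reachable from start_url by following links."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for linked in links_on_page(url):
            if linked not in seen:  # don't go around in circles
                seen.add(linked)
                queue.append(linked)
    return visited


found = spider("http://example.com/", lambda url: TOY_WEB.get(url, []))
print(found)
```

The page that nothing links to never shows up in the result, which is why a site that nobody links to is hard for a spider to discover.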

When you go to a webpage, the browser makes an HTTP GET request for the page to the server; the response typically contains only HTML. The HTML contains all the content and structure of the page, plus links to any CSS and JavaScript files that are needed, and URLs for the data of each image. Before the browser displays anything, it gets any required CSS and JavaScript using more HTTP requests. Then it starts showing the page, and the images pop in as their corresponding requests complete.
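
As a rough sketch of that sequence, here's how a program might work out which extra requests a page calls for. SubresourceCollector and follow_up_requests are made-up names, and the model is as simplistic as the description above: CSS and JavaScript first, images afterwards. URLs in the HTML may be written relative to the page, so each one is resolved against the page's own URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collect the URLs a page refers to, resolved against the page's own URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.stylesheets = []
        self.scripts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self._add(self.stylesheets, attributes.get("href"))
        elif tag == "script":
            self._add(self.scripts, attributes.get("src"))
        elif tag == "img":
            self._add(self.images, attributes.get("src"))

    def _add(self, bucket, reference):
        if reference:  # inline scripts, for example, have no src
            bucket.append(urljoin(self.page_url, reference))


def follow_up_requests(page_url, html):
    """Return (URLs needed before display, image URLs that can pop in later)."""
    collector = SubresourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.stylesheets + collector.scripts, collector.images
```

For a page at "http://wheredidalltheunicornsgo.com/sightings" that refers to a stylesheet as "/style.css" (a hypothetical file name), the first list would contain "http://wheredidalltheunicornsgo.com/style.css".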

There's a whole lot more complexity and detail in the real world than in the simplistic model I've presented here, but it's hopefully a good place to start if you're a budding web developer.