This post is adapted from an email I wrote several years ago and have since reused many times to help explain the “moving pieces” on the web to non-programmers. Someone on Hacker News asked about this kind of thing, so I thought I’d finally post it, with some images reused from my old talk at Wharton about the same thing. This article won’t make you a programmer, but hopefully it can give a big-picture overview of how the major parts fit together. It’s like Web 101. In some places it oversimplifies (lies a little as I like to put it), but in ways that I think are helpful for a beginner.
HTML hit the world in 1994. The way it works is your web browser, based on a URL, asks a web server somewhere out on the Internet for an HTML file, and the web server sends it back. So there are two actors: the web browser and the web server.
The web server is just a program running on someone else’s computer. The most popular web server is called Apache, but there are many others. A web server listens for people requesting URLs, and it sends back HTML files, images, etc. There are lots of companies out there that will host your website for you. Basically they provide a web server, and you upload whatever files you want the world to see.
The web browser is Firefox or Safari or IE or whatever. You type a URL into the address bar, and it finds the web server out on the Internet and asks it for the file in the URL. The server sends it some HTML, and the browser is responsible for rendering the HTML file. It may also request supporting files like images. Different web browsers render HTML slightly differently, making many headaches for web designers.
HTML stands for HyperText Markup Language. The HyperText means it has links you can click to visit new pages, and the Markup means you use angle-bracket tags to “mark up” certain parts of the text. In markup, there is an opening tag and (usually) a closing tag. The closing tag repeats the name of the opening tag, but with a forward slash at the beginning. This markup can indicate structure or display. For instance,
<i>ibid</i> will render “ibid” in italics; this is an example of using markup for display. On the other hand, you might have this:
<ul> <li>Walk the dog</li> <li>Do the laundry</li> <li>Pay the bills</li> </ul>
This produces a bulleted list with three items (ul = unnumbered list, li = list item):
A tag may also have attributes, which look like
name="value". This is how you write links, which are “anchor tags”:
<a href="http://www.google.com/">Click here to visit Google</a>.
HTML may reference other files. When this happens, the browser downloads those files too and adds them to the display. The main example is images, which work like this:
<img src="my-portrait.png">. (Notice there is no closing tag for images.) And here are some other cases of HTML referencing outside files:
CSS stands for Cascading StyleSheets. It’s an attempt to separate out display-style information from the HTML code, so the HTML just contains structural markup. A CSS file is a separate file, referenced by the HTML file, which tells how each type of element should be rendered (e.g. in big font, with a 2-pixel red border, or whatever). It’s considered good form to use CSS for display and HTML for structure, at least insofar as that’s possible. This is partly because it reduces typing and makes changes easier, it makes reading the HTML easier, and you can have different people working on different things. CSS also supports more advanced display options that just HTML, so some effects you can only achieve with CSS.
Again, each browser interprets CSS in its own slightly incompatible way. People who write really slick-looking web pages spend a ton of time discovering and applying tricks to get an acceptable display on each browser type. One way is to write separate HTML+CSS files for each browser type, detect what a user is using, and send the appropriate files. But no one really does this. At most people write a special CSS file for IE (which is the most incompatible with the others), and then use special codes to serve it only to IE browsers. There are other less drastic techniques as well.
Because this gets so expensive, it’s common to decide up front which browsers you’ll support, and which will get a slightly wonky-looking site. Your choice will depend on your expected audience. Nowadays some sites will just refuse to support older browsers or even not-so-old versions of IE. I like to aim for IE 8+ or even 7+. If you’re somebody like Amazon, then you add browsers with less market share like IE 5+, Safari, Opera, Konquerer, and maybe more. (Amazon is perhaps a bad example because most of their site uses a fairly loose, flexible design, so slight inconsistencies aren’t noticeable. A better example would be something with an artsy look like a winery.) Nowadays a big new challenge is supporting mobile devices like Androids and iPhones.
Or you can have a separate .js file referenced by the HTML page.
Let’s go back to the beginning: the way the web works is your web browser asks a web server for an HTML file, and the web server sends the file across the Internet. But what if the web page is something like a shopping cart, where the HTML is different for each user? In this case, the web server doesn’t just have a static HTML file, but it runs a program to generate an HTML file on the spot, and it sends the result to your web browser. So here the web server is basically a middle-man: it delegates the task of generating the HTML to a separate program, and it passes the result along to the web browser. The old term for this is CGI, and while it isn’t always correct any more to call this sort of thing a CGI, it’s still useful as a catch-all term for any server-side HTML generation. You can build a CGI in whatever language you like. Java and C# are common choices, as is PHP. Most modern techniques, PHP included, involve writing a file that is mostly HTML, with some special code embedded inside to output the variable parts. Generally people choose PHP for smaller sites and Java/C#/Python/Ruby for bigger sites, but that distinction seems less valid each year.
Using CGI also imposes a larger burden on your web server than just serving static HTML files. Maybe the CGI has to access a database to find a user’s favorite genre of books, for example. If you have a lot of traffic, it’s important to consider the efficiency of your CGI code. The ability to support lots of traffic is called scalability. This isn’t something you really ought not worry about too much right away, but someone experienced will make fewer mistakes here.
On the web, “performance” has two components: latency and scalability. Latency is how snappy your site is: how quickly you can serve a page and get all the images etc. loaded. Big players like Google and Facebook have done studies showing that engagement and conversions drop off if you have page load times over 600ms or so, hence the saying “speed is a feature.”
Scalability, on the other hand, refers to how well your site degrades as you add more users. So instead of snappiness, it’s about capacity. You can also think of scalability as how hard/expensive it is to add more capacity.
People talk about “vertical” vs. “horizontal” scalability. Vertical scalability means buying faster hardware. Horizontal scalability means buying more cheap hardware and distributing the load. Vertical scalability is expensive and has hard limits, but horizontal scalability requires careful up-front design. Rails and Django are nice frameworks because they enable or even in some ways force a more scalable architecture. One thing that is hard (by “hard” I mean it’s a current research topic for Ph.D.s) is horizontal scaling for databases. There are worthwhile techniques, but they are complex to implement. Often it’s better to just buy a bigger box.
Latency and scalability don’t always come together. You can be snappy but crash when a media hit causes traffic to spike, or you could have a sluggish site that keeps chugging no matter how many visitors arrive. Sometimes optimizations will help both categories, but not always.
These are all server-side technologies like PHP. But you have to distinguish between the programming language and the web application framework. Like PHP, Java, Ruby, and Python are all programming languages. You can do anything with them. Each language also has a leading framework that helps you build web sites, so that you aren’t starting from scratch. These frameworks are J2EE, Ruby on Rails, and Django. So Ruby and Rails are two different things. Rails is open source code written in Ruby that implements things needed by every web site. You then add your custom Ruby code that uses Rails to do its job.
J2EE, Rails, and Django all go beyond CGI in that they are “application servers”, so they do more work for you, like tracking sessions and pooling database connections. (Actually PHP can do that, too.) They encourage a pattern of web development called Model-View-Controller (MVC), which helps separate your code into different concerns. The Model defines the structure of your data and your “business logic.” It is the back end of your site. Its job is to talk to the database and provide a convenient, intelligible API for the other layers. The Controller is what handles incoming web requests. Each request goes to the Controller code first. The Controller decides what to do. It asks things like “Is this person logged in?” and “What page do I show for this URL?” The View is what renders the actual HTML. Usually you write your view in some kind of templating mini-language, like JSP or ERB or HAML. All those languages let you write a bunch of HTML and then embed Java/Ruby/etc code into it to inject things like user names, product images, or whatever. In general, the Controller sets up whatever data the View will need by loading Model objects, and then the View reads those Model objects to fill in its blanks.
To new coders, MVC probably seems like a lot of extraneous infrastructure. It does increase the learning curve, but it’s basically the standard in professional web development. It’s the pattern adopted by J2EE, Rails, and Django as the right way to build websites. By separating your code into layers and giving each layer its own responsibility, it becomes much easier to manage complexity. Otherwise it’s all but impossible to reason correctly about what your code will do. Without MVC, it’s easy to make a small change and inadvertently break something that you thought was unrelated. MVC protects you from this kind of thing. It also makes it easier to separate tasks among multiple developers.blog comments powered by Disqus Prev: Dates and Time Zones in Rails Next: Why not distribute public keys via SMTP?