Billy's big list of web page suffixes


This page is a list of the web-page suffixes I've seen out there on the web, in order of frequency. Most of these different suffixes represent different server-side scripting languages. For the uninitiated, a server-side scripting language is a programming language that can be used to generate HTML pages on a web server. Here's sorta how it works:

When you load a static HTML page from a web site, things go as follows: 1) Your web browser sends the address for the page to the site's web server program. 2) The web server reads the address line and figures out which of its files the address line is asking for. 3) The web server sends the contents of that file back to your web browser. 4) Your web browser loads the contents of the file, interprets it as HTML, and displays it to you as a web page.
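
To make steps 2 and 3 concrete, here's a minimal sketch in Python of the server's half of that exchange. The function names (resolve_static, serve_static) and the idea of a single "document root" folder are assumptions of mine for illustration; a real web server does a good deal more (headers, content types, caching).

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_static(doc_root: Path, url_path: str) -> Optional[Path]:
    """Step 2: work out which file under doc_root the address is asking for."""
    # Drop any query string and the leading slash; an empty path means the index page.
    relative = url_path.split("?", 1)[0].lstrip("/") or "index.html"
    candidate = (doc_root / relative).resolve()
    # Refuse addresses that climb out of the document root (e.g. "/../secret").
    if doc_root.resolve() not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None


def serve_static(doc_root: Path, url_path: str) -> Tuple[int, bytes]:
    """Step 3: send back the file's contents untouched, or a 404 if it isn't there."""
    target = resolve_static(doc_root, url_path)
    if target is None:
        return 404, b"Not Found"
    return 200, target.read_bytes()
```

The point to notice is that the file's bytes go out exactly as they sit on disk; nothing on the server ever looks inside them.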

With server-side scripting, things go a little differently. When the web server sees that the page being asked for has a suffix that means it's a server-side script, it doesn't just send you the contents of the file. Instead, it first sends the file off to an interpreter for whatever scripting language it is, and then it sends you the output from the interpreter. So you don't see the source code from the server-side script; all you see is its output. And the only clue you have that a server-side scripting language is being used at all is the suffix of the page you're asking for.

There are, of course, even more suffixes out there than the ones listed here. Most web servers can be configured to serve pages with any kind of extension at all, interpreting them as any possible scripting language. For instance, by changing a line in the configuration file for my copy of Apache, I could make it interpret ".fox" files as PHP files, ".barf" files as CGIs, etc. By far, the most common of this type of change is to make it interpret all .html files as server-side scripts so that the tell-tale suffix doesn't give away the presence of server-side scripting.
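
The suffix check described in the last two paragraphs can be sketched in a few lines of Python. Everything here is made up for illustration: HANDLERS stands in for the web server's configuration file, and php_stand_in is a placeholder that only pretends to be an interpreter. It's a sketch of the idea, not how Apache actually implements it.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict

Interpreter = Callable[[str], str]


def php_stand_in(source: str) -> str:
    """Placeholder for a real interpreter: returns made-up output, never the source."""
    return "<p>generated by a script (%d bytes of source stay hidden)</p>" % len(source)


# Hypothetical suffix table, playing the role of the server's configuration file.
HANDLERS: Dict[str, Interpreter] = {
    ".php": php_stand_in,
    ".fox": php_stand_in,  # any suffix at all can be mapped to the same interpreter
}


def respond(url_path: str, file_contents: str,
            handlers: Dict[str, Interpreter] = HANDLERS) -> str:
    """Return what the browser receives for a request to url_path."""
    suffix = PurePosixPath(url_path).suffix.lower()
    interpreter = handlers.get(suffix)
    if interpreter is None:
        # No interpreter registered for this suffix: send the file as-is.
        return file_contents
    # Otherwise the browser only ever sees the interpreter's output.
    return interpreter(file_contents)
```

Passing a table like {".html": php_stand_in} as the third argument mimics the trick of treating every .html file as a script, which is why the suffix stops being a reliable clue.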

I've noticed that server-side scripting languages tend to fall into three main categories.
  1. Tag-based: These are scripting languages where a programmer can create scripts that define new HTML-like tags. Then other developers can put these new tags into HTML documents to achieve things that you can't achieve with HTML alone. SSI and ColdFusion fall into this category.
  2. Embedded-script: These scripting languages allow the user to embed bits of code into a file which is otherwise plain HTML. The server removes the bits of code, interprets them, and prints their output into the HTML file in the code's place. PHP and ASP fall into this category.
  3. Programs that print HTML: These are scripts that operate as normal programs, but generate HTML as their output. The web server sends the full output of the program, and nothing else, as the web page. CGIs fall into this category.

If you know of some web page suffix that I'm missing, please send me an e-mail. My address is billy at the domain name for this webpage. If you can't figure out how to resolve that description into an e-mail address, odds are you're a spam bot. If you're a human, I look forward to hearing from you. :)

Update:
I just discovered a web site that catalogs file extensions: www.filext.com. This has been of some help in researching the more obscure file extensions.

  1. .html (also .htm), Hypertext Markup Language

    The standard for web-pages, this one's still the most common. HTML is the basic markup language that web browsers interpret. Most of the other suffixes on this page represent little programs on the server-side whose output is HTML code. A .html file, on the other hand, is just a static file containing HTML code.

    If the HTML files on a site end in ".htm" instead of ".html", it's probably running on a Microsoft operating system. This is a remnant from the days of DOS and early Windows machines, which couldn't handle file suffixes of more than three characters. The latest Windows OSes can handle four-character file suffixes, but retain the three-character default, I guess for the sake of tradition.

    HTML is augmented by CSS, Cascading Style Sheets. CSS is a system that lets you format text in ways that HTML doesn't; it allows for greater detail than HTML, and lets you create and re-use styles, plus it implements some features that just aren't part of HTML, like the now-standard links that light up when you put your mouse over them, or background images that don't scroll. Most CSS-enabled websites store their style information in a separate .css file, but you never actually view those directly in a web browser, so they're not counted in this list.

  2. .php, PHP Hypertext Preprocessor (www.php.net)

    Yes, the acronym "PHP" stands for "PHP Hypertext Preprocessor". It's what the open-source folks like to call a recursive acronym.

    PHP is currently the most common of the server-side scripting languages. It's an open-source project designed specifically for quick development of dynamic web pages, and can run either as an Apache module, or as a CGI on less friendly web servers. The PHP scripting language is loosely typed and has a gigantic built-in function set, and a syntax somewhere between Perl and C++. PHP 4 (the current version) may not be the best platform for huge code bases because of its lack of exception-handling routines and object-oriented functionality, but it can't be beat for spitting out lightweight projects in 5 minutes. Plus, just about every web host everywhere supports PHP, which means you can switch whenever you find a cheaper plan.

    .php pages are interpreted as static HTML pages until the web server hits a <?php tag (or the short form, <?, where the server is configured to allow it). Then, it begins interpreting the page as a PHP program until it hits a ?> tag. This lets you easily embed small snippets of PHP wherever you need them in a static page, or embed large snippets of static HTML wherever you need them in a PHP script, both of which come in handy.
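
    Here's a toy version of that tag-switching idea, written in Python rather than PHP. It's a sketch under loud assumptions: the snippets between <? and ?> are Python code, not PHP, the render function is my own invention, and running exec on anything you didn't write yourself is unsafe. It only shows the mechanism of passing static text through and swapping each snippet for whatever it prints.

```python
import io
import re
from contextlib import redirect_stdout

# Matches one embedded snippet: everything between "<?" and the next "?>".
TAG = re.compile(r"<\?(.*?)\?>", re.DOTALL)


def render(page: str) -> str:
    """Copy static text through unchanged; replace each snippet with its printed output."""
    namespace: dict = {}  # shared, so a later snippet can use an earlier one's variables

    def run(match: "re.Match[str]") -> str:
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            exec(match.group(1).strip(), namespace)  # toy only: never exec untrusted code
        return buffer.getvalue()

    return TAG.sub(run, page)
```

    For example, render("<p>2 + 2 = <? print(2 + 2, end='') ?></p>") gives back "<p>2 + 2 = 4</p>", and a page with no snippets at all comes back unchanged.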

    Because PHP lacks exception-handling routines, and because commercial web-hosts won't let customers change their PHP configuration file, you can see a lot of PHP error messages on PHP pages, automatically formatted to show up in bold text. This is especially true of database-driven sites, which will spit out these messages whenever the database goes down, if they don't have their configuration file set to log them instead of displaying them.

  3. .asp (also .aspx), Active Server Pages (www.microsoft.com)

    ASP is Microsoft's proprietary answer to PHP. Technically, ASP is not a scripting language; it's an object model for web services, or something along those lines. So, although the vast majority of ASP pages are written in Microsoft's VBScript (a scripting flavor of Visual Basic), they can actually be written in any language for which an ASP thingy has been written. Presently, this means you could also write ASP pages in JScript (Microsoft's JavaScript) or PerlScript, and maybe one or two others I haven't heard of. The defining characteristic that makes it ASP is the way that you access the 'Write' method of the 'Response' object in order to output "Hello World!".

    But hell, if you're going to be tying yourself to Microsoft you might as well just write your ASP in VBScript and get the full benefits of vertical integration.

    ASP code is inserted into static HTML pages with <% %> tags by default, although these tags can be changed to anything you would like. So, like, if you were a PHP programmer forced by his workplace to use ASP, you could change those percentage signs to question marks.

    Microsoft has added a lot of extensions to ASP that are very difficult to find information on in a Google search because they are the same as top-level domains, like COM and .NET. As far as I understand it, COM added the ability to write some code for your ASP pages in a separate file, compile it to native code, and register it as a component with the operating system. You just have to make your code in the form of an object, which you instantiate from within your scripts to use it. Thus, you can write code for the web in C++ or whatever, and get really fast speeds. The .NET thing, on the other hand, was just a whole bunch of new features added to ASP.

    You'll tend to see ASP pages on the types of sites that get suckered into full Microsoft vertical integration. Which is to say, banks, business folks, and other people who just don't understand computers. If, for some godforsaken reason, you have to write a webapp for a Microsoft SQL database, or if the only web server you can use is IIS, you wouldn't do too badly to use ASP. You'll just have a hell of a time finding a decent web host.

  4. .jsp, Java Server Pages (java.sun.com)

    If Java is the only programming language you know, JSP is the way to go. JSP also lets you embed code into static HTML pages, and additionally provides support for a COM-like setup, called servlets. JSP lets you have a lot more control over what's going on than PHP, but it's a lot harder to set up the server, a lot harder to write JSP code, and nearly impossible to find a JSP host who will let you run your own servlets.

    You'll see .jsp pages being served by Java-heavy sites, by sites that use Oracle as their database, and at miscellaneous other tech-heavy websites.

    www.xerox.com seems to be a JSP-based site at present.

  5. .cgi, Common Gateway Interface

    You don't see .cgi as a suffix to web pages very often these days, but a few sites built around it are still out there. CGI is the standard that defines how a web server hands a request (including whatever you entered into the checkboxes, text fields, menus, and so on of a submitted form) to a separate program, and how that program hands its output back to the server. Back before specialized server-side scripting languages came about, people wrote stand-alone programs that could receive user information by the CGI standard, and create appropriate output in response. They called them "CGI scripts", and they were stored in .cgi files.

    CGI scripts can be written in any language that the web server computer can run, though usually they're written in C++ or Perl. A CGI script bears almost no resemblance to the static HTML page it generates; it's just a program run by the web server, with its output sent as the return page. CGI gives you the greatest flexibility in server-side scripting languages, but potentially the most difficult time creating your script.
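
    As a rough illustration, here's what a tiny CGI script could look like in Python. Under CGI the web server puts the request details into environment variables (QUERY_STRING holds whatever follows the "?" in the address), and the script prints a header line, a blank line, and then the page. The "name" parameter and the build_response function are made up for this example.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(environ) -> str:
    """Build the full CGI output: a header, a blank line, then the HTML page."""
    # e.g. QUERY_STRING="name=Billy" for a request to /hello.cgi?name=Billy
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["World"])[0]
    # Escape user input so it can't smuggle markup into the page.
    body = "<html><body><h1>Hello, %s!</h1></body></html>" % html.escape(name)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server captures whatever the program writes to standard output.
    sys.stdout.write(build_response(os.environ))
```

    Notice that the whole page comes out of string-building and print statements; there is no HTML file anywhere, which is why the script looks nothing like the page it produces.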

    Because CGI scripts are a pain to write, usually a web site will only contain a few .cgi pages; the standard model was that most of the site would be static, but where you had your "send us feedback" form or your credit card submission form, there'd be a .cgi. A few sites, however, developed some model that makes every one of their pages a .cgi; these are fairly rare. Either they've got one big script that is capable of generating every single page in the site (and that's actually possible with just about any server-side scripting language), or they've got a CGI-based templating system. Since the invention of CSS, you don't see that much anymore.

    www.chillingeffects.org seems to be a .cgi-based site.

  6. .shtml, SSI-enabled HTML

    SSI (Server-Side Includes) is a lightweight server-side scripting system. The main thing it does is let you insert special tags into your HTML pages, which the web server substitutes with some output. Normally, it's used to insert a file that has part of a page's structure into a large number of pages. For instance, if you had a set of buttons that appeared at the bottom of every page on your website, you could save those buttons into a file called buttons.html, and then use SSI to insert the contents of buttons.html into the bottom of every other page. Then if you needed to add a new button, you would change buttons.html once, and the change would be instantly reflected in every other page on your site.
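
    The buttons.html trick can be imitated in a few lines of Python. This is only a sketch: it handles the one directive <!--#include file="..." --> and nothing else SSI can do, and the expand_includes function is a name I made up, not part of any web server.

```python
import re
from pathlib import Path

# Matches an SSI-style include directive such as <!--#include file="buttons.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page: str, directory: Path) -> str:
    """Replace each include directive with the contents of the file it names."""

    def replace(match: "re.Match[str]") -> str:
        # The named file is looked up relative to the page's own directory.
        return (directory / match.group(1)).read_text()

    return INCLUDE.sub(replace, page)
```

    Every page that carries the directive picks up the current contents of buttons.html at the moment it's served, so editing that one file changes them all.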

    You can get this same effect with just about any other server-side scripting language, but SSI is still handy for some people who write primarily static sites.

    Anyway, some web servers require you to end your files with ".shtml" if they will include SSI. At least, by default. You can easily change them to parse all .html files for SSI, at a small performance cost.

    Slashdot contains a lot of .shtml pages.

  7. .cfm (also .cfml), ColdFusion Markup Language (www.macromedia.com)

    A server-side scripting setup provided by Macromedia, the people who brought you Dreamweaver. Because of the Dreamweaver association, and because CF is one of the easiest scripting languages for new users to learn, .cfm files are mostly used by organizations with a lot of graphic designers in charge. CF sites tend to be very good-looking, but annoyingly heavy on graphics and Flash (another Macromedia product).

    www.statelocalgov.net is a .cfm site, although less graphics-heavy than most.

  8. .pl, Perl script (www.cpan.org)

    The .pl suffix is rare on the web, because Perl scripts are usually served as .cgi. Perl is popular enough that there are a few out there, though. Although Perl won't let you insert code into static HTML pages, it does have a library specifically for generating web pages. Using the Perl CGI library, you can generate all the parts of each web page using a specialized function; a command to generate the <head> tag, a command to generate the <body> tag, etc. So, you don't have to use "print" to generate every single line of code, which must be handy.

    Unless you were a Perl specialist or had a need for really good Regular Expression service, it would probably be easier to just use PHP, which has a Perl-like syntax.

  9. .adp, AOLserver Dynamic Pages (www.aolserver.com)

    ADP is the server-side scripting system used by AOL's open-source web server, AOLserver. It's another format that lets you embed source code into static HTML pages, using Tcl as its scripting language. ADP pages are very rare outside of websites run by AOL. But since AOLserver is open source, anybody can run it, and you'll see these around every once in a while.

  10. .bml, Better Markup Language (BML pages at LiveJournal)

    BML is a tag-based server-side scripting language, kind of like ColdFusion. It lets you create your own customized tags, to make it easier to keep a consistent look throughout a site without the use of CSS. BML blocks are essentially little macros.

    LiveJournal uses a lot of .bml pages for the static portions of their web site.

  11. .xml, Extensible Markup Language (www.w3.org)

    XML is what you might call a meta-language. It's a format for developing markup languages - mainly, that you put things in <pointy brackets> more or less like HTML.

    However, you will hear people talk about writing websites "in XML". Usually what they're referring to is the use of XSL (Extensible Stylesheet Language) to take data that's been organized with XML and transform it into XHTML (XML-compliant HTML). XSL is part of the big standards organizations' push to XML-ize everything on the web. As mentioned above, there are a number of effects you can only achieve in a web page by use of Cascading Style Sheets (or possibly client-side scripting). CSS, however, is very much not XML-compliant; it's full of {braces}, colons, and semi-colons. XSL will eventually allow people to generate all the effects of CSS, in a way that is XML-compliant.

    Right now, it's not yet very common. It's hard to imagine it picking up in the future, because it's a lot less concise than CSS. XSL is very well-suited for displaying long lists of data, organized and formatted, but it's not great for generating non-database sites.

    Usually, XSL-based pages will be parsed on the server side, so that only XHTML is sent out to the user. But some web browsers can also properly render an .xml page which contains some data, with its first line linking it to an .xsl page that contains the style information.

  12. .do, Apache Struts (struts.apache.org)

    The .do file extension is part of the Apache Struts system, a framework for building Java-based dynamic web sites. I'm not sure exactly what the struts do; their web site says that it's an implementation of the Model-View-Controller design paradigm.

    www.geico.com is an example of a site that uses .do pages. You mostly see them at large sites, especially e-commerce sites. This is because the struts system is designed to help organize and maintain large websites that need to evolve over time.

  13. .mspx, Microsoft Page XML (www.microsoft.com)

    .mspx is a file extension you'll only see while browsing Microsoft's internal sites. It's a custom-made XML-based system they use to organize and implement their enormous website.

  14. .jhtml, Java HTML (www.atg.com)

    JHTML was a proprietary scripting system that was part of ATG's Dynamo web server. It was essentially a Java-based system for adding new, dynamic tags to HTML. Around 2002 it looks like ATG began shifting their product towards the open standard of JSP rather than JHTML, and .jhtml pages are now deprecated and a little rare. You'll mostly see them on sites similar to the ones that run .jsp pages, except that .jhtml-driven sites will be more lightweight. Obviously, if they were more technically adept, they would have upgraded to .jsp by now, or have started with it in the first place.

    Reuters and The Telegraph both use .jhtml, which suggests that there may be some big newspaper CMS system that uses it.

  15. .phtml, PHP HTML (www.php.net) or Perl HTML (www.ossp.org)

    .phtml is a fairly rare file extension which can represent a few different systems. It's an alternate, mostly deprecated file extension for PHP, which seems to have been used by the Wikipedia software at some point in the past. As a result, you'll see a lot of sites running Wikipedia's back-end, which have .phtml files, such as www.disinfopedia.org.

    It's also the file extension for OSSP eperl when it's used as a server-side scripting language. OSSP eperl is much like PHP and ASP, except that it's a system to embed Perl into documents. It's designed to work in any kind of file that needs to be programmatically generated, but can also work as a web server.

    Finally, this was also the file extension for a system called iPerl, or inverse Perl, which seems to have done the same thing as OSSP eperl. iPerl now seems to be defunct.

  16. .xtp, XML Template Pages (www.caucho.com)

    XTP is an offshoot of the Resin servlet engine.

  17. .mvc, Miva Script (www.miva.com)

    This is a file extension I stumbled across in Google Directories while trying to find out information about the .do file extension. I have yet to see this one actually out there, except on the search results from the company's own website. This is a proprietary scripting language, which executes on the Miva Virtual Machine. Miva Script is another embedded server-side scripting language.


On a final note, there are some web sites out there that don't use any suffix at all (such as Wikipedia). This is done by tweaking the web server, or running a specialized program to generate all your web pages. You normally only see this approach used on sites that are large enough, or strange enough, to require using a custom web-serving program. I didn't count this in the list, though, because it is not a suffix; it is the lack of a suffix.