Continuing the series on aspects of HTTP, this third instalment covers compression.
A browser declares that it supports a compression method in its Accept-Encoding request header; a common header appearing in HTTP requests would be “Accept-Encoding: gzip, deflate”. A server declares that the content being served is compressed via the Content-Encoding response header, for example “Content-Encoding: gzip”.
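To make the negotiation concrete, here is a minimal sketch of how a server might pick a Content-Encoding from the browser's Accept-Encoding header. The function names and the supported list are my own inventions for illustration, and the parsing is deliberately simplified (it only understands the q parameter), so treat it as a sketch rather than a complete implementation of the header grammar.

```python
def parse_accept_encoding(header):
    """Return {coding: qvalue} parsed from an Accept-Encoding header value."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding without an explicit q value counts as fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # unparsable weight: treat the coding as refused
        codings[name.strip().lower()] = q
    return codings


def choose_content_encoding(header, supported=("gzip", "deflate")):
    """Pick the supported coding the client rates highest, or None for no compression."""
    accepted = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in supported:  # earlier entries in `supported` win ties
        q = accepted.get(coding, accepted.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best


if __name__ == "__main__":
    print(choose_content_encoding("gzip, deflate"))
    print(choose_content_encoding("gzip;q=0, deflate;q=0.5"))
```

If the function returns None the server simply sends the body uncompressed and leaves the Content-Encoding header out.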
There are two compression methods in common use: gzip and deflate, as just mentioned in the examples. Interestingly, the actual compression in both cases is done by the same algorithm, DEFLATE, and in practice usually by the same library, zlib. The reason zlib finds itself in so many open standards (for example PPP compression) is spelled out on the home page of the reference implementation: “A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents)”. Obviously the free aspect is what counts there (although spiffiness is often a desirable property in compression algorithms). So what is the difference between gzip, deflate and zlib? Gzip and zlib are both container formats that provide a header and trailer (mostly additional file information and error detection data), while DEFLATE is the actual compressed stream format that sits inside. Confusingly, what HTTP calls “deflate” is meant to be the zlib container, not a bare DEFLATE stream. You can impress your girlfriends with that geek trivia!
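If you want to see the containers for yourself, the sketch below uses Python's standard zlib module (a binding to the same C library) to produce a gzip stream and a zlib stream, then peels off the header and trailer by hand. The sample data and helper name are made up, and the fixed 10-byte gzip header assumes no optional fields such as a file name are present, which is the case for what zlib emits here.

```python
import struct
import zlib

data = b"Hello, HTTP compression! " * 20


def compress(payload, wbits):
    """Compress with an explicit window bits value, which selects the container."""
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()


gz = compress(data, 16 + 15)  # gzip container (RFC 1952)
zl = compress(data, 15)       # zlib container (RFC 1950)

# gzip trailer: CRC-32 of the original data, then its length, both little-endian.
gz_crc, gz_len = struct.unpack("<II", gz[-8:])
# zlib trailer: Adler-32 of the original data, big-endian.
(zl_adler,) = struct.unpack(">I", zl[-4:])

# Strip the 10-byte gzip header / 2-byte zlib header and the trailers;
# what remains is a bare DEFLATE stream that inflates on its own.
gz_payload = zlib.decompress(gz[10:-8], -15)
zl_payload = zlib.decompress(zl[2:-4], -15)

if __name__ == "__main__":
    print("gzip magic bytes:", gz[:2].hex())
    print("zlib header bytes:", zl[:2].hex())
    print("gzip CRC-32 matches:", gz_crc == zlib.crc32(data))
    print("zlib Adler-32 matches:", zl_adler == zlib.adler32(data))
    print("payloads round-trip:", gz_payload == data and zl_payload == data)
```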
In fact, many products and services have incorporated the zlib implementation available at http://zlib.net. This reference implementation has a subtle, somewhat hidden feature that lets it pump out a raw DEFLATE stream, a zlib formatted stream or a gzip formatted stream. There is an argument to the “constructor” of the zlib stream object, deflateInit2 (the library is plain C, so I use that term loosely), called window bits. If you pass a negative value you get a raw DEFLATE stream with no header or trailer. Add 16 to the window bits argument to get a gzip formatted result. Leave window bits unaffected to get a zlib formatted stream. Yes, yes, your girlfriends are excited, I know.
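The window bits trick is easy to try from Python, whose standard zlib module passes a wbits argument straight through to the C library. The table and function names below are mine, chosen for illustration; the three wbits conventions themselves (negative, unmodified, plus 16) are the ones described above.

```python
import zlib

# Window bits value -> output format, using the maximum 32 KB window (15).
WBITS = {
    "raw": -zlib.MAX_WBITS,       # negative: bare DEFLATE, no header or trailer
    "zlib": zlib.MAX_WBITS,       # unmodified: zlib header and Adler-32 trailer
    "gzip": zlib.MAX_WBITS + 16,  # plus 16: gzip header and CRC-32 trailer
}


def compress_as(data, fmt):
    """Compress `data` into the requested format by varying only window bits."""
    c = zlib.compressobj(wbits=WBITS[fmt])
    return c.compress(data) + c.flush()


def decompress_as(blob, fmt):
    """Inflate `blob`, telling zlib which wrapper to expect."""
    return zlib.decompress(blob, WBITS[fmt])


if __name__ == "__main__":
    sample = b"window bits decide the wrapper " * 30
    for fmt in ("raw", "zlib", "gzip"):
        blob = compress_as(sample, fmt)
        print(fmt, len(blob), "bytes, starts with", blob[:2].hex())
```

The only difference between the three calls is one integer, which is why it seems a shame to write any of the wrappers by hand.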
For a moment, let’s consider how the Apache web server handles compression. As with most web servers, Apache’s implementation is provided as an extension, in this particular case mod_gzip. As zlib’s source code license is BSD-ish in nature, it is not surprising that the zlib library does most of the grunt work for mod_gzip.
In fact, perusing the source code for the magic negative or +16 window bits value, we find on line 2888 of mod_gzip.c (version 1.5 was HEAD at the time this was written) a constant window bits value that is negative. OK, so mod_gzip is banking on zlib to provide a raw DEFLATE stream with no header or trailer. However, based on the browser’s Accept-Encoding request header, mod_gzip needs to figure out whether it is going to send a deflate or a gzip encoded response. It turns out mod_gzip writes its own gzip header and trailer, which seems kind of silly to me. Maybe one of you reading out there wants to submit a patch: you could possibly trim a few dozen lines of code out of mod_gzip just by switching the window bits argument and throwing out the code that does the manual gzip formatting. If you do it, trust me, your girlfriend is going to go wild. You practically wrote Apache!
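To show roughly what “doing your own gzip header and trailer” involves, here is a sketch in Python that wraps a raw DEFLATE stream by hand and compares it with simply asking zlib for gzip output. This is not mod_gzip's code; the function names are hypothetical and the header fields (zero timestamp, unknown operating system) are my own choices for a minimal valid header.

```python
import gzip
import struct
import zlib


def gzip_by_hand(data):
    """Wrap a raw DEFLATE stream in a hand-written gzip header and trailer."""
    raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)  # negative window bits: no wrapper
    body = raw.compress(data) + raw.flush()
    header = (
        b"\x1f\x8b"               # gzip magic number
        + b"\x08"                 # compression method: DEFLATE
        + b"\x00"                 # flags: no optional fields
        + struct.pack("<I", 0)    # modification time: not recorded
        + b"\x00"                 # extra flags
        + b"\xff"                 # operating system: unknown
    )
    # Trailer: CRC-32 and length of the uncompressed data, little-endian.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + body + trailer


def gzip_by_zlib(data):
    """Let zlib write the gzip wrapper itself by adding 16 to window bits."""
    c = zlib.compressobj(wbits=zlib.MAX_WBITS + 16)
    return c.compress(data) + c.flush()


if __name__ == "__main__":
    page = b"<html><body>Hello</body></html>" * 10
    print(gzip.decompress(gzip_by_hand(page)) == page)
    print(gzip.decompress(gzip_by_zlib(page)) == page)
```

Both functions produce a stream that a standard gzip decoder accepts; the second just lets the library do the bookkeeping.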
Ironically, it also seems to skip handling deflate compression altogether. I can guess why. For HTTP compression the standard specifies either a gzip stream or, for deflate, a zlib formatted stream. However, early implementers of both browsers and servers were confused about the difference between a zlib formatted stream and a raw DEFLATE stream, so out on the Internet nobody is quite sure what they will get back when a browser request says Accept-Encoding: deflate.
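One common way for a client to cope with this ambiguity is to try the zlib wrapped interpretation first and fall back to a raw stream if the header check fails. The sketch below shows that idea in Python; the function name is hypothetical and I am not claiming this is what any particular browser does.

```python
import zlib


def decode_deflate_body(body):
    """Decode a 'Content-Encoding: deflate' body, tolerating both interpretations."""
    try:
        # What the standard asks for: a zlib header, DEFLATE data, Adler-32 trailer.
        return zlib.decompress(body, zlib.MAX_WBITS)
    except zlib.error:
        # What some servers actually send: a bare DEFLATE stream.
        return zlib.decompress(body, -zlib.MAX_WBITS)


if __name__ == "__main__":
    text = b"nobody is quite sure what deflate means " * 10
    wrapped = zlib.compress(text)
    bare_stream = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    bare = bare_stream.compress(text) + bare_stream.flush()
    print(decode_deflate_body(wrapped) == text)
    print(decode_deflate_body(bare) == text)
```

Servers have no such escape hatch, which is one good reason to just send gzip.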
So that wraps up the server side; now for the browsers. I am going to pick on Internet Explorer 6 here. It is just too easy! Internet Explorer has the concept of in-process plug-ins that can be used as content handlers. This is a little bit different to ActiveX controls; think instead of OLE documents (like embedding spreadsheets in Word documents). This technique is used, for example, by Adobe’s Acrobat Reader to allow PDF content to be interacted with directly within a browser window without opening an Acrobat window. However, as Internet Explorer makes a web request, it assumes it knows best and marks the request as being able to handle compression. The browser starts receiving the compressed content and then notices the MIME type of the response. It uses the MIME type to recognise that Acrobat should get involved, launches it and starts handing over the data. Whoops! Acrobat starts receiving compressed content while expecting uncompressed content. Hilarity ensues.
I believe this has been fixed now, as the lower levels of the HTTP client library in Windows now do transparent decompression. I don’t know if that fixes the Acrobat problem; I haven’t tried in IE7.
HTTP compression is a nice thing to do for your users. Remember from faking performance that the browser has only two connections to use simultaneously. If we are compressing content, those channels are freed up earlier to process remaining content. However, apply compression judiciously. These days I generally only compress HTML and plain text (both static and dynamically generated) as well as XML and JSON (for use in AJAX communications), as these are the most tested cases of the use of compression.
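A conservative policy like that boils down to a whitelist of content types. Here is a small sketch of the idea in Python; the set of MIME types is my reading of the list above, and the function names and the client_accepts_gzip flag are invented for the example rather than taken from any server.

```python
import gzip

# Content types considered safe to compress (an assumed mapping of the list above).
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/plain",
    "text/xml",
    "application/xml",
    "application/json",
}


def is_compressible(content_type):
    """True if the media type, ignoring parameters such as charset, is whitelisted."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in COMPRESSIBLE_TYPES


def encode_response(body, content_type, client_accepts_gzip):
    """Return (body, headers), gzip-compressing only whitelisted content types."""
    headers = {"Content-Type": content_type}
    if client_accepts_gzip and is_compressible(content_type):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers


if __name__ == "__main__":
    html = b"<html><body>" + b"<p>hello</p>" * 100 + b"</body></html>"
    _, html_headers = encode_response(html, "text/html; charset=utf-8", True)
    _, pdf_headers = encode_response(b"%PDF-1.4 ...", "application/pdf", True)
    print(html_headers)
    print(pdf_headers)
```

With a whitelist like this, a PDF goes out untouched even when the browser claims it can handle gzip, which sidesteps the kind of plug-in trouble described earlier.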
In his real job, Luke Amery works on shopping cart software. He is the technical director of On Technology, Australia’s leading e-commerce development company.