Jason Deacon Team : Web Development Tags : Web Development

The frozen river of output caching


I've blogged before about caching and performance, but I've never really touched upon output caching.

I liken output caching to a frozen river. Observed from the surface it's not moving, the water is frozen, static, unmoving. But under the surface is a constant flow of water which the observers never quite see properly. This is very much like output caching; browsers see a static, unmoving page while data continues to flow and change underneath it.

For those who don't know, output caching is where the web server (in our case, IIS 7) executes a request, captures the output of that page and caches it, so that future requests can be served the cached output instead of executing the full request again, saving valuable CPU and memory resources.
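The mechanism can be sketched in a few lines. This is not IIS's implementation, just a minimal illustration of the idea: a dictionary keyed by URL, where a fresh entry short-circuits the whole rendering pipeline (`render_page`, `handle_request` and the TTL are all hypothetical names for this sketch).

```python
import time

# Hypothetical renderer standing in for the full request pipeline
# (controller, database queries, view rendering, and so on).
def render_page(url):
    return f"<html>rendered {url}</html>"

_cache = {}

def handle_request(url, ttl=60):
    """Serve a cached copy of the page if one exists and is still fresh,
    otherwise execute the full request and cache its output."""
    entry = _cache.get(url)
    if entry is not None:
        body, cached_at = entry
        if time.time() - cached_at < ttl:
            return body              # cache hit: no rendering work done
    body = render_page(url)          # cache miss: full request executes
    _cache[url] = (body, time.time())
    return body
```

Every problem described below follows from that one design decision: the cache key and the TTL, not the underlying data, decide what the visitor sees.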

On the surface this seems like a very good idea, and in specific scenarios it is. But this approach is rarely suitable for internet applications that present user-generated data or data-driven features. The reason should be obvious: the data continues to change while a fixed, cached version of the page is returned to users.

Even worse, depending on the configuration of the output cache, a view of a page containing customised data for a particular user (e.g. "Hi Jason!" in the header when logged in) may be cached and returned to other users of the site. Rob may in fact see "Hi Jason!" and start to question the integrity of the site he's just registered on.
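The Jason/Rob leak is exactly what happens when the cache key is the URL and nothing else. A deliberately buggy sketch (all names hypothetical) makes it concrete:

```python
_cache = {}

def render_greeting(url, user):
    # Personalised output: different for every logged-in user.
    return f"Hi {user}!"

def handle_request(url, user):
    # The bug being illustrated: the cache key ignores the user,
    # so the first visitor's personalised page is stored under the
    # bare URL and served to everyone who follows.
    if url not in _cache:
        _cache[url] = render_greeting(url, user)
    return _cache[url]
```

Jason hits the page first, his greeting is cached, and Rob's request returns "Hi Jason!" from the cache without `render_greeting` ever running for him.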

Another problem which presents in the same way is data-dependent features, especially those that rely on temporal mechanisms to control their visibility. For example, if a visitor at 5:59pm can view a special deal on a product which expires at 6:00pm, and the page is cached, then a visitor at 6:20pm may still see the deal and attempt to purchase it, which would (hopefully) result in a failure during the purchase process.
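The 5:59pm scenario can be simulated with plain numbers for timestamps (the renderer, TTL and function names here are all invented for illustration): the deal page is cached moments before expiry, and the cache TTL, not the deal's own expiry time, controls how long the dead deal stays visible.

```python
_cache = {}
CACHE_TTL = 30 * 60  # hypothetical: page output cached for 30 minutes

def render_deal_page(now, deal_expires_at):
    # Illustrative renderer: the deal only appears before its expiry.
    if now < deal_expires_at:
        return "SPECIAL DEAL: 50% off!"
    return "No current deals."

def handle_request(now, deal_expires_at):
    entry = _cache.get("deals")
    if entry is not None and now - entry[1] < CACHE_TTL:
        return entry[0]  # served from cache; expiry logic never runs
    body = render_deal_page(now, deal_expires_at)
    _cache["deals"] = (body, now)
    return body
```

A request one second before expiry caches the deal; a request twenty minutes after expiry is still inside the cache TTL and gets the stale page, even though a fresh render would say "No current deals."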

Of course the sane thing to do would be to put mechanisms in place to invalidate the cached data based on various triggers (time / data), and to cache by user or other sub-conditions as well as just by URL. These are valid approaches, but they only serve to make a bad solution a little bit better.

So far I've only talked about one major problem with output caching: stale data. The other major problem is the cost of rendering the page when it is not cached.

Typically output caching is put in place to address slow-loading pages, with the understanding that it'll be fine for most people but awfully slow for maybe 1 in 10 visitors while the page is re-executed and re-cached. This leads to unpredictable performance characteristics on the server hardware the site is hosted on.

Unpredictable application performance is probably one of the worst problems a site can have. Unpredictable performance means there can be no planning for scalability based on increasing traffic, and it only serves to dilute any effort directed toward performance optimisation ("it gets slow at 5pm, sometimes, on Thursdays, but only if you're wearing a green shirt" vs "Page X always takes two seconds to load").

The solution? Just write performant code. Sites which are written to be fast simply don't need output caching; they are fully dynamic, change as the underlying data changes, and avoid all the pitfalls of output caching.

There are some (very few) scenarios where output caching is a valid choice, but for the most part it's a band-aid solution.

Cache is a crutch