Martin Abrahams Team : Web Development Tags : Technology Web Development Performance

Planning for the worst. Websites can fail; accept it.

It's 8:55AM on Monday morning (okay, 9:05AM) and I sit down at my desk with my coffee and breakfast, fire up my email client and follow the standard developer tradition of checking for some quick lolz before starting the day. To my amusement, this is what I'm faced with!

While most people would be frustrated, as a web developer I can tell you that, in a strange way, I actually feel a sense of relief when I see this. It's reassurance that these people are human too. It's also a rare chance to get a glimpse of how different organisations choose to handle the situation. The reality of the internet is that things can and do go wrong, with results ranging from the hilarious to the catastrophic.

Even so, it's still fairly rare to see the big players fall over, and that of course is no coincidence.

The part which most website operators are well aware of is ensuring that their hosting infrastructure is sufficient and reliable enough to handle the current amount of traffic and a reasonable amount of growth.

With this in place, we still need to accept that things won't always go to plan. Things can and will fail. Of course there's a never-ending list of generic IT failures and issues which can take place, but here's a list of less obvious causes which I've personally come across.


You've gone viral OMG

Every marketing person's dream! However, the "reasonable amount of growth" which was factored into your hosting environment never considered getting 6 years' worth of traffic in 2 hours. Building that much headroom into your hosting infrastructure for an unforeseen traffic spike is not financially viable. Cloud hosting does make a lot of sense in this case, but again it can be overkill if you're not forecasting a massive spike in traffic.


Natural disaster

One time I had a whole bunch of websites hosted with a very credible hosting provider, in "one of the best data centres in Australia" etc etc, with a 99.9% uptime agreement. To my surprise, all our sites went down for a number of hours. The data centre later informed me that a water main had burst, the place had literally flooded and everything had to be switched off. What did they do about the 99.9% agreement? Who cares!? The sites went down.


3rd party failures

Websites are becoming increasingly connected. When a website is heavily reliant on a 3rd party API, what happens when that system goes offline?
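One common answer is to degrade gracefully: serve the last good response (or a sensible default) instead of letting the whole page fail. The snippet below is only a minimal sketch of that idea, not a production recipe. The function `fetch_with_fallback`, the `_last_good` store and the `fetch` callable are all hypothetical names made up for illustration; in practice `fetch` would wrap whatever call your site makes to the 3rd party system, ideally with a short timeout.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory store of the last successful response per key.
# A real site would more likely use a shared cache; this is an assumption.
_last_good: Dict[str, str] = {}


def fetch_with_fallback(key: str, fetch: Callable[[], str], default: str) -> Tuple[str, bool]:
    """Call the 3rd party via `fetch` and return (value, is_live).

    If the call raises, return the last good value for `key`,
    or `default` if there has never been one.
    """
    try:
        value = fetch()
    except Exception:
        # The 3rd party is down or slow: serve stale or default content.
        return _last_good.get(key, default), False
    _last_good[key] = value
    return value, True
```

The `is_live` flag lets the page decide whether to show a small "this information may be out of date" notice rather than pretending nothing is wrong.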

So when these unfortunate scenarios occur, how you choose to handle the situation could greatly affect how your customers react to the outage. The first step is to identify what can actually go wrong and where to draw the line.

At the very least, have a proper error page to handle major site-level errors and capture the well-known 500 error codes, which are returned when the execution of the code has failed for one reason or another.
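For a rough idea of the mechanics, here is a minimal sketch assuming a Python site served over WSGI; most frameworks and web servers have their own built-in way of doing the same thing. The names `friendly_errors` and `ERROR_PAGE` are hypothetical, and the page content is a placeholder to swap for your own wording and contact details.

```python
import sys

# Placeholder content (an assumption): replace with your own branded page.
ERROR_PAGE = (
    b"<html><body><h1>Sorry, something went wrong on our end.</h1>"
    b"<p>We're working on it. In the meantime, contact us on "
    b"[your support phone number or email address].</p></body></html>"
)


def friendly_errors(app):
    """Wrap a WSGI app so unhandled exceptions return a friendly 500 page."""
    def wrapper(environ, start_response):
        try:
            # Build the body eagerly so errors raised while generating it
            # are caught too. A simplification: this gives up streaming.
            return list(app(environ, start_response))
        except Exception:
            headers = [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(ERROR_PAGE))),
            ]
            # Passing exc_info lets the server replace headers the app
            # may already have set before it failed.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [ERROR_PAGE]
    return wrapper
```

Usage would be along the lines of `application = friendly_errors(application)` wherever the site's WSGI entry point is defined.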

An error page like this won't handle a full-blown outage where your website has literally been plucked from the internet, though. I'm not going to go into the specifics of how to implement these technically; the point here is the actual content itself.

A page with your logo saying "error" is going to do little to save face. What you choose to do here really depends on the type of website; there's no set rule for what should be on an error page.

This is where you're getting back to the basics of UX and actually catering for this scenario. It may be appropriate to provide a fallback phone number or email address, or to direct users to your Facebook or Twitter page where they can see updates on the situation. If all else fails, go down the route of a cute animal like Imgur has done, or probably the most infamous error page of all time - Twitter's "fail whale".