What kills a website; literally

I had a personal blog that I ran for a few years; it was shut down a while ago, though I kept the blogs and am slowly migrating them across to my company blog.

The blog below is from 12 August 2009 and it reads really bizarrely to me five years later.

I'll give a little context to the situation at the time (and for why I wrote the blog) and then explain why my blog now strikes me as foreign and naive.

In 2008, we (Wiliam) were commissioned to build a substantial eCommerce website for a substantial 'bricks-and-mortar' retailer. We had worked with the client before on some successful projects and websites.

The project was one that taught us plenty of hard lessons and the experience was tough - it was probably the largest single build project we had undertaken at the time and whilst we applied some great people and tools to the project, it was our process that let us down, especially around expectation management and client management.

We ultimately launched the website and transactions started to come through. Things were looking up.

The client then began their PR offensive including a big piece on Channel Nine's A Current Affair. (I have previously written about the effect of appearing on a show such as A Current Affair, something we saw repeatedly after we developed the group buying website Cudo).

Traffic trebled immediately. And the website fell down, incapable of handling the load.

The client was understandably miffed (much stronger words do actually apply, though let's stick with miffed for purposes of the PG rated blog).

We argued (slightly weakly) that we still had work to do on the website, including database optimisation and load testing, and that we had not been given the opportunity to do so - we were still completing functions of the website post-launch.

The website was down or incredibly slow for at least two days: a surprising effect of being on a show such as A Current Affair is that the traffic doesn't spike and then abate; it lingers for days as users re-attempt to access it.

It ultimately turned out that our approach to the architecture of the website and database had left plenty of room for improvement; we engaged some expert optimisation folk who helped us atone for our sins and, pretty quickly, the website was flying and able to handle plenty of traffic without slowing.

Unfortunately, not quickly enough to save our relationship with the client, which by now had turned bitter. A long-term friendship I had with the client's CEO also suffered.

The positive from the whole thing was that it taught me and my team some incredibly valuable lessons, especially on the technology front. In fact, I often cite the experience as one of the most memorable in my journey as a web designer; most certainly, the experience marked the point at which Wiliam could claim to have shifted from building smaller, lower-traffic websites to substantial and enterprise websites, often with very considerable traffic.

I do not doubt we would have had to learn those lessons one way or another.

And as I write this introduction to my old blog - critical of our conduct in so many respects (and sorry for ultimately underserving our client) - if there is a slap on the back for our team, it is that even companies such as Myer and David Jones were making the same mistakes as we were, five years later and with much bigger budgets.

Maybe we were just ahead of our time?!

I am pleased to say that the website we built continues to operate, and reasonably successfully, albeit that our client sold it to another retailer to run as a standalone website. Our client realised that they were not, or did not wish to become, an online retailer - something that no doubt contributed to their difficulty with the project and the general concept of eCommerce, and a decision that, respectfully, any reasonable retailer looking back to 2009 from today would regard as short-sighted.

I guess the reasons I feel so odd and awkward about re-reading this blog so many years later are two-fold, bearing in mind that in relation to the project I have outlined above, only one of the issues referred to in my old blog was a culprit:

  1. The errors we (and others) made are laughably simple. That is time and experience talking of course, though only amateurs would fall into the same traps in 2014: an encouraging reflection of how far we have come.
  2. Some of the errors simply don't exist anymore; for example, modern technologies manage memory and 'recycling' without the need for perfect coding and someone watching the server's lights blink. Similarly, the 'cloud' has dealt with the more common hardware issues found in dedicated servers. In other words, do not rely on this blog to be a contemporary guideline.

Of course, it could also be that because my company has evolved and matured and sees a very different sort of client today, we just don't see the problems and errors anymore. Though I'd like to think for the sanity of clients and web developers worldwide, we have all moved on a bit.

Enjoy the blog as much as I am embarrassed to reprint it! And thanks to Martin Wicyniak for the graphic - it still kills it (get it?)!

 

What kills a website... literally

I read this evening a very good article by Jonathan Howell on the Five Things That Will Kill Your Site (unfortunately, in 2014, the page no longer exists and so I have removed the link).

It was a pertinent and very relevant article for me because I have recently gone through a protracted process following the failure of a client’s website, and our struggle to wade through the range of mitigating factors including my own culpability. Certainly, my client’s experience reflected a true comedy of errors on both my client’s part and mine, and I have learnt much from the experience.

Before outlining the key takeaways from Jonathan’s article, as well as adding a few of my own, I have offered two other suggestions from my recent experience. These are provided to assist any web developer, technology team or client faced with a struggling website to find the quickest path to resolution.


Stay calm and collected

As the father of a 20-month-old (Oliver is now six and a half and a really cool kid!), I know that the first inclination one has when a baby is throwing a tantrum at the least opportune time is to shout back.

This helps nobody, and only exacerbates and lengthens the problem. The baby was already confused and upset; shouting only doubles the pain for everyone in the room.

If a website fails, proactive and positive management of the issue is foremost. The client is obviously frustrated and upset that the website has failed, immediately reflecting on the ensuing costs of the downtime.

The web developer, however, is equally if not more upset. They are ultimately responsible for the uptime of the website, and they are already panicked that the website is failing. They may be culpable and deserve being shouted at, though while the website is down, shouting achieves nothing. Recriminations must start when the website is back up and not before.

To that extent, web developers must try to empathise with the position of their client and work to set proper expectations, demonstrate that they are making every effort to reach a remedy, and remain calm themselves. The client is paying the bills and it’s the least they deserve.

Obviously, a working relationship is central to this, and both parties must establish one clear and pragmatic goal during the period and ensure that nothing gets in the way: get the website working again.

Of course, being realistic in the first place is of utmost importance; the power goes out in the finest buildings in the city. So do websites.

 

Stocktake post development; pre-launch

Bugs are an inevitable part of life; even ardent quality control cannot always identify the bugs found in the real world. One quality control tester cannot replicate the crazy scenarios 2000 users can configure.

This said, quality control is central and cannot be skipped, whatever the inclinations of the web developer or client. Failure on the stage is much worse than the show starting late.

Every web development firm is different, and we start with very particular and consistent standards in terms of how we plan/design our websites, the architecture we build on and the methodologies we adopt in terms of code (re)use, database development and security. For our mission-critical websites, we build in additional redundancies and forecast those scenarios and situations that could impact our website; we then deal with these.

The website database is usually the bottleneck. There are others, and a friend of mine, Nick Crawley, owns a great product – Page Pulse (I still refer people to Page Pulse in 2014 and Nick is still a legend!) – that tests all the potential bottlenecks a developer might not be able to; I recommend it to all clients with mission-critical websites.

For most high-bandwidth and intensive websites however, the database will be the weak link and this requires a separate layer of addressing, and one that is rarely required for the standard, ho-hum corporate website.

From time to time my firm employs DBAs (Database Administrators), though these are for specific and usually highly complex database projects. Most website databases are not that complex and do not need specialists at the database level; a competent web developer should be able to do the work. Such specialists however provide the level of addressing that high-performance websites require, and so outsourcing can be one answer. It is what we have done on several occasions when we have not had the resources employed internally.

As a designer deep down, I find database optimisation almost voodoo, and it rarely sees any change to the fundamental website or database itself. It can, though if your code is competent and logical, it rarely does. Don’t view database optimisation as a failure of the web developer; it is just a step low-traffic websites (see: most websites) don’t need to take.

 

Jonathan's List

1. Change

As Jonathan puts it, if a website was just fine yesterday, and it’s dead today, it is very likely that you have changed something.

This could be an upgrade to the website, a new installation on the server or a different network configuration. However minor, changing a 1 to a 0 can make all the difference, and retracing your steps can provide the answer.

To this extent, Change Control is central. This extends not only to testing all changes in the appropriate environment, but also to ensuring that all changes are fully documented. A good Change Plan will also document how to reverse the change back to the former, working version. Likewise, making changes to the website one at a time allows for far better identification of which change went wrong.
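
By way of a sketch only, one small way to keep a change reversible on an ASP.NET website is to gate it behind a web.config setting, so that reversing it is a one-line configuration edit rather than a redeploy (the setting name below is invented for the example):

    // Illustrative only: "UseNewCheckoutFlow" is an invented setting name.
    // The switch falls back to the old, known-good behaviour if the key is
    // missing or malformed, so reversing the change is a web.config edit.
    using System.Configuration;

    public static class FeatureSwitch
    {
        public static bool UseNewCheckoutFlow
        {
            get
            {
                bool enabled;
                return bool.TryParse(
                    ConfigurationManager.AppSettings["UseNewCheckoutFlow"],
                    out enabled) && enabled;
            }
        }
    }

Pair a switch like that with a written note of when it was flipped, by whom and why, and retracing your steps becomes a great deal easier.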

 

2. Unexpected Load

This one is simple.

If you send more traffic to the website than it was expected to handle, the website will be unable to cope with the load and will fail.

There are a range of approaches to handling this, and one of them is improving the capacity of the website through different means, though because unexpected loads are – by definition – unexpected, having piles of servers lying around in wait is not necessarily the answer for most websites.

At the very least, understand the capacity of the website. How many sessions can it handle before it fails?

If nothing else, such an understanding also allows for the identification of any issues or bottlenecks, and for these to be appropriately dealt with.
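
If nothing else is to hand, even a crude probe gives a feel for where the cliff is. A rough sketch – no substitute for a proper tool such as Page Pulse – that fires a batch of concurrent requests at a placeholder URL and counts the failures:

    // A crude capacity probe: run it with increasing user counts and note the
    // point at which failures appear or response times blow out.
    using System;
    using System.Diagnostics;
    using System.Net;
    using System.Threading;
    using System.Threading.Tasks;

    public static class CapacityProbe
    {
        public static void Run(string url, int concurrentUsers)
        {
            var timer = Stopwatch.StartNew();
            int failures = 0;

            Parallel.For(0, concurrentUsers, i =>
            {
                try
                {
                    using (var client = new WebClient())
                    {
                        client.DownloadString(url);   // any error response counts as a failure
                    }
                }
                catch (WebException)
                {
                    Interlocked.Increment(ref failures);
                }
            });

            Console.WriteLine("{0} users: {1} failures in {2}ms",
                concurrentUsers, failures, timer.ElapsedMilliseconds);
        }
    }

    // e.g. CapacityProbe.Run("http://www.example.com/", 500);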

Jonathan makes the very valid point, and one I make all the time, that if you know a load is coming, communicate with your web development firm, and vice versa. There are avenues that can be taken in a reasonable period – even stop-gap methods – that can provide relief to whatever extent. If you are driving the excessive load, see if there is any way to mitigate it by spreading it over a longer period and therefore reducing the peak demands on the website. Ticketek needs everyone to know immediately that AC/DC has gone on sale, though if time is not of the essence in terms of your offer, spread your campaign over time. Does it matter if I enter the competition in two days’ time and not today?

Finally, ‘degrade gracefully’ as Jonathan puts it. If the demand is simply going to outstrip supply, limit the number of sessions on the website so at least those who can get on, receive a good experience. This is quite feasible and we have implemented it with a number of clients, in a number of ways. It is not ideal, though it is better than nobody being able to access the website.
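
To give a flavour of what limiting sessions can look like, here is a minimal ASP.NET sketch (illustrative only, not lifted from any client project): cap the number of requests being processed at once and bounce the overflow to a cheap static holding page. The threshold and the busy.html page are invented for the example.

    // Global.asax.cs - a rough overload gate, illustrative only.
    using System;
    using System.Threading;
    using System.Web;

    public class Global : HttpApplication
    {
        private const int MaxConcurrentRequests = 200;   // invented ceiling; establish yours by load testing
        private static int _inFlight;

        protected void Application_BeginRequest(object sender, EventArgs e)
        {
            // Always let the holding page itself through.
            if (Request.Path.EndsWith("busy.html", StringComparison.OrdinalIgnoreCase))
            {
                return;
            }

            Context.Items["counted"] = true;

            if (Interlocked.Increment(ref _inFlight) > MaxConcurrentRequests)
            {
                // Over capacity: those already on the site keep a good experience,
                // everyone else sees a static page instead of a timeout.
                Response.Redirect("/busy.html", false);   // false = don't abort the worker thread
                CompleteRequest();                        // skip the rest of the pipeline
            }
        }

        protected void Application_EndRequest(object sender, EventArgs e)
        {
            // Only decrement for requests we actually counted on the way in.
            if (Context.Items["counted"] != null)
            {
                Interlocked.Decrement(ref _inFlight);
            }
        }
    }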

 

3. Slow Death

Memory leaks are the bane of web developers. In the days of ASP, unclosed sessions were the monsters that reared their ugly heads and caused untold grief to all involved.

Modern languages – I develop in C# .NET – are far improved on the old days, though that is not to say that websites cannot slowly die due to memory leaks, dwindling disk space and so forth.

Prevention is better than cure, and monitoring is a safe way to stay on top of the obvious.
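
Much of the old grief came down to resources that were opened and never reliably closed; in C#, the discipline is simple enough. A minimal sketch (the connection string name and query are invented):

    // Wrap anything holding an unmanaged resource (connections, readers,
    // streams) in a "using" block so it is released even if an exception is
    // thrown - the kind of housekeeping old ASP left entirely to the developer.
    using System.Configuration;
    using System.Data.SqlClient;

    public static class ProductRepository
    {
        public static int CountProducts()
        {
            string connectionString =
                ConfigurationManager.ConnectionStrings["Store"].ConnectionString;

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT COUNT(*) FROM Products", connection))
            {
                connection.Open();
                return (int)command.ExecuteScalar();   // connection closed on the way out
            }
        }
    }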

 

4. Time-related Problems

This has affected me once or twice, though that is once or twice too many.

Websites referencing incorrect times and dates can cause havoc. Daylight Savings is the obvious one, and you don’t see it coming until it’s come. Whole hours and days of bookings can be lost, and in my experience, it is not the first thing you think of. I live in Australia so it’s DD/MM/YYYY, not MM/DD/YYYY.

Jonathan also cites licence expiry under his heading of time-related problems; I see this quite a bit with SSLs, though we had one client whose website had been built by another developer on a demo version of a CMS, and who wondered why it suddenly packed it in.
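
Two habits cover most of it: store and compare times in UTC, and never let the server’s locale guess at a date format. A minimal C# sketch, using the Australian day-first format mentioned above as the example:

    using System;
    using System.Globalization;

    public static class BookingDates
    {
        // Store and compare in UTC; convert to local time only for display,
        // so a daylight saving switch doesn't swallow or double-book an hour.
        public static DateTime Now()
        {
            return DateTime.UtcNow;
        }

        // Parse user input against an explicit day-first Australian format
        // rather than whatever culture the server happens to be running under.
        public static DateTime ParseAustralian(string input)
        {
            return DateTime.ParseExact(
                input, "dd/MM/yyyy",
                CultureInfo.GetCultureInfo("en-AU"),
                DateTimeStyles.None);
        }
    }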

 

5. Hardware Failures

Hardware breaks. It is usually very easy to identify such breakage and even Windows – via the Blue Screen – will tell you when something is up.

Redundancy is the key to mitigating the effects of hardware failure. It does need to be set up correctly (what’s the use of having a spare DB server if you can’t restore to it?), though done properly, it can also provide much-needed scalability when the traffic hits.

This list might seem slanted towards the web developer and to an extent, it is. The client, however, can help themselves by at least understanding the difficulties surrounding website failure and working towards avoiding it.

Touch wood it doesn’t happen, though shit does, and planning for the worst case – if not putting in the steps to avoid it in the first place – is paramount.