The performance of eCommerce: optimising your eCommerce website for speed

When many traditional businesses start the process of getting a website up and running that allows customers to purchase their products, they begin with what I refer to as the ‘Grand Illusion’: the belief that users will happily invest a lot of time to understand their site, that they will see every page, and that they will pay attention to everything on every page they do see. Typically these grand illusions introduce many barriers for users who just want to purchase whatever the business is selling. Barriers come in all shapes and sizes, but they all come back to one central theme: User Experience.

If users have to stop and ask themselves “How do I use this site? I just want to buy something and I don’t know how to do that”, then you’ve lost a great many people; the only ones left are those with the persistence to endure a poor experience in order to get whatever it is that you are selling.

That brings me to one of the central User Experience factors that many eCommerce websites tend to forget or leave until the last minute: Performance. How many times have you heard about a deal on a daily deals site or a sale on a popular eCommerce website, only to stare at a white page for a minute and then be presented with an error when you attempt to view it? Despite the success of those websites, they have not sold as much as they possibly could have, because people left the site when they couldn’t use it.

Having each page take over a minute to load is extremely detrimental to the user experience and will result in a huge number of lost purchases. As a developer, I take it upon myself to ensure this barrier is as small as possible by developing a performance strategy for each site that I build. A performance strategy consists of multiple parts and combines knowledge of the technology stack I use day to day with projections (often bloated) of traffic and data volumes (X blogs per month, Y users per day, etc.) to form an understanding of what needs to be done to keep page loads as quick as possible.

My performance strategy is bottom-up, meaning it starts at the database and works through each tier until it reaches the client side. Each tier requires attention to performance and an understanding of the role it plays in each request made to the server. These are tiers as I visualise them, not as defined in any official application tier structure.

First Tier: Database

When designing the database it’s essential to understand the data that you will be storing. Will the field require 1000 characters, or would 150 characters be sufficient? Do we actually need to keep a reference to the user who created a blog as well as the user who modified it, or do we just not care about that level of detail? Database design decisions are based on the requirements of the site; more involved sites require more involved database design, but that doesn’t mean you have to be wasteful about it.

Ensure field lengths are only as long as required, and don’t introduce columns with foreign keys to tables you just don’t need. Don’t add indexes until you know they make a positive difference. Don’t store denormalised data until you know it’s actually required. Keep it simple, keep it clean. The most common performance problems at the database level stem from developers simply not knowing what data will be put into the system. As part of the performance strategy it’s critical to find this out as early as possible. Don’t design a database to handle 3 new blogs per day when in reality it’s going to receive 140 new blogs per day. Be sensible and ask the right questions up front.

Second Tier: Data Access

Data retrieval is one of the most time-expensive tasks during page loads. If developers do not pay attention to how data is being retrieved, the site will quickly grind to a halt, losing money in the process. My current favourite data access approach is Entity Framework 4.1 with the Database First model. This means I import an existing database schema (simple and clean, as described above) into an Entity Framework model and go from there. There are good ways and bad ways to use Entity Framework, and a lot of developers don’t know the difference. Entity Framework is not a magic bullet and will easily introduce more problems than it solves if used incorrectly.

The two main items to watch for are proper data context scoping (not per call, not per application) and not using lazy loading (better yet, disable it inside the Entity Framework model properties). Most data access methods for the average site will be retrieval operations: GetBlog(), GetUser(), GetSomethingBySomething() and so on. These calls should return only the smallest viable subset of data required. Use optional parameters so that the caller can nominate what additional information they require (to be .Include()d in the Entity Framework call, avoiding lazy loading), as sketched below.
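
Below is a minimal sketch of that pattern, assuming an EF 4.1 DbContext; the entity, context and repository names are hypothetical rather than taken from a real project.

```csharp
using System.Data.Entity;
using System.Linq;

// Hypothetical entities mirroring a simple, clean schema imported Database First.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Blog
{
    public int Id { get; set; }
    public string Title { get; set; }
    public virtual User Author { get; set; }
}

public class SiteContext : DbContext
{
    public SiteContext()
    {
        // Lazy loading can also be switched off in the model's properties;
        // disabling it here has the same effect.
        Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<Blog> Blogs { get; set; }
}

public class BlogRepository
{
    // Scoped per HTTP request: not created per call, not shared across the application.
    private readonly SiteContext _context;

    public BlogRepository(SiteContext context)
    {
        _context = context;
    }

    // The optional parameter lets the caller nominate the extra data it needs,
    // which is eager-loaded with Include() rather than lazy-loaded later.
    public Blog GetBlog(int id, bool includeAuthor = false)
    {
        IQueryable<Blog> query = _context.Blogs;

        if (includeAuthor)
        {
            query = query.Include(b => b.Author);
        }

        return query.SingleOrDefault(b => b.Id == id);
    }
}
```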

Third Tier: Application Logic

Second only to data access, application logic layers can be very time consuming depending on their structure. Black-box CMSs tend to have quite thick layers between the database and interface logic which take a lot of time, relatively speaking, because they are designed to be very flexible, arguably too flexible. Typically these methods perform one logical action and, more often than not, interact with the data access layer, so it’s important to structure them in a way that does not make unnecessary calls into that layer.

Equally important is ensuring that their usage is consistent, meaning they behave the same way every time they are called. Inconsistent performance characteristics will skew the developer’s view of the method and can unintentionally push the page time up. Any calls that are made should be kept as simple as possible so they can be extended easily later in development if necessary. While this isn’t strictly performance related, it’s still very good practice.
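
As a hedged illustration, a thin application logic method built on the repository sketched in the previous tier might look like this; the service and view model names are hypothetical.

```csharp
// A simple view model returned by the application logic layer.
public class BlogSummary
{
    public string Title { get; set; }
    public string AuthorName { get; set; }
}

public class BlogService
{
    private readonly BlogRepository _repository;

    public BlogService(BlogRepository repository)
    {
        _repository = repository;
    }

    // One logical action, one predictable call into the data access layer.
    public BlogSummary GetBlogSummary(int blogId)
    {
        var blog = _repository.GetBlog(blogId, includeAuthor: true);
        if (blog == null)
        {
            return null;
        }

        // No further data access; the method behaves the same way every time it is called.
        return new BlogSummary
        {
            Title = blog.Title,
            AuthorName = blog.Author.Name
        };
    }
}
```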

Fourth Tier: Interface logic & interaction

This layer is typically responsible for accepting user input, applying logic, and setting the state to be sent back to the user. It is a very important tier as it interacts with HTTP requests and invokes code to perform expensive tasks. It is important for the developer to treat every call into the application logic layer as very expensive, even when it isn’t, so that they are smarter about how and when they make those calls. Complacency breeds mediocrity.
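
To make that concrete, here is a hypothetical ASP.NET MVC controller built on the service sketched above: it makes exactly one call into the application logic layer and hands the result straight to the view.

```csharp
using System.Web.Mvc;

public class BlogController : Controller
{
    private readonly BlogService _blogService;

    public BlogController(BlogService blogService)
    {
        _blogService = blogService;
    }

    public ActionResult Details(int id)
    {
        // Treat this as expensive: one call per request, nothing in a loop.
        var summary = _blogService.GetBlogSummary(id);
        if (summary == null)
        {
            return HttpNotFound();
        }

        return View(summary);
    }
}
```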

Fifth Tier: HTTP Requests/Responses

Care should be taken not to make too many requests. Each request utilises potentially slow resources (network I/O on the client, processing and data access on the server, network I/O on the server to send the response back), so any data that is required on the client side should be stored in-page where it makes sense.

Care should also be taken to ensure that Ajax requests don’t return a giant blob of HTML. Instead, return some compact JSON data and construct any necessary interface elements on the client. Redirecting users from one action to another using HTTP redirects is also wasteful, as it involves more network I/O (relatively slow) to accomplish the same task as simply handing the request over inside the logic layer.
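
A minimal sketch of such an Ajax endpoint in ASP.NET MVC follows; the controller, action and fields are hypothetical.

```csharp
using System.Web.Mvc;

public class ProductController : Controller
{
    [HttpGet]
    public JsonResult Price(int id)
    {
        // In a real site this would come from the application logic layer (and its cache).
        var model = new { id = id, price = 19.95m, inStock = true };

        // AllowGet is needed because MVC blocks JSON responses to GET requests by default.
        return Json(model, JsonRequestBehavior.AllowGet);
    }
}
```

The client-side script then builds whatever interface elements it needs from that compact JSON, rather than injecting a large pre-rendered HTML fragment.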

Sixth Tier: Client Side (HTML/JavaScript)

Client-side performance is every bit as important as server-side performance. Reducing the number of HTTP requests that the browser has to make per page load is critical; this means using techniques such as CSS sprites (combining multiple images into one larger image and then using CSS background positioning to show only the part you want), JavaScript include combination and CSS include combination.

Ensuring your static content (CSS, JavaScript, images) has the correct ETag and cache expiry settings is also important, as warm loads (page loads after the browser has cached the site’s content) will typically involve far fewer HTTP requests and far less content fetching, which further increases the responsiveness of the site.
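
If the site is on the ASP.NET stack, one way to implement the include combination mentioned above is the System.Web.Optimization bundling feature; this is only an illustration of the idea, and the bundle names and file paths below are hypothetical.

```csharp
using System.Web.Optimization;

public class BundleConfig
{
    public static void RegisterBundles(BundleCollection bundles)
    {
        // All page scripts are combined and served as a single HTTP request.
        bundles.Add(new ScriptBundle("~/bundles/site").Include(
            "~/Scripts/jquery-1.7.1.js",
            "~/Scripts/site.js"));

        // All stylesheets are combined and served as a single HTTP request.
        bundles.Add(new StyleBundle("~/Content/css").Include(
            "~/Content/site.css"));
    }
}
```

Registering this from Application_Start with BundleConfig.RegisterBundles(BundleTable.Bundles) means each bundle is served as a single versioned request, which also sits nicely alongside the cache expiry settings described above.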

 

Thresholds

Over the years I’ve grown to understand web application performance, and based on the performance strategy briefly described above I’ve worked out some personal thresholds that I use to determine how a site is performing.

| Item | Excellent | Okay | Not Ideal |
| --- | --- | --- | --- |
| Time to generate page on server | <50ms | 51ms to 150ms | 151ms+ |
| Overall page load time | <200ms | 201ms to 600ms | 601ms+ |
| Average HTTP requests per page load | <10 | 11 to 40 | 41+ |
| Average page download size | <500KB | 501KB to 1024KB | 1025KB+ |


These aren’t figures to be applied to every website everywhere, because some business requirements can increase page load times unavoidably. Also, some flexible CMS systems have inherent page load times before you even start using them, and some perform better than others. The figures above are designed to be thresholds for sites built in ASP.NET WebForms or ASP.NET MVC; other technology stacks will have their own performance characteristics which are not represented in the table.

Caching

I typically see caching used as a crutch by developers who either don’t understand the performance of their application or are unable to control it (in the case of black-box CMS systems). Caching used correctly is very effective; caching used incorrectly can be detrimental to the day-to-day use of the application.

There are a few different types of caching: some occurs at the client, some at the server and some at the data access layer. My performance strategy tends to utilise opt-in, in-memory caching at the data access layer in order to prevent expensive database I/O. But even this approach has some issues. When utilising a CMS which manages the same data that you are caching, you have to be mindful of changing data. Site owners will typically not tolerate a one-to-five minute delay between saving data in the CMS and it appearing on the front end. To address this I use a connector system: when the CMS saves a piece of data, it sends an asynchronous request to the website asking it to invalidate the item with that ID in a particular cache set. In practice this works very well, providing well-cached data entities with immediate updating of changed data, and the mechanisms are all very simple and secure.

Further to this, I maintain reporting which is polled from the website so that I can monitor the individual cache sets and their hit/miss ratios. A cache hit means a request was made for an item that was in the cache, which is good. A cache miss means a request was made for an item that was not in the cache, so it had to be fetched from the source, which is bad (in terms of performance). Using the reported ratios it’s possible to tweak the cache timeout values to improve the hit/miss ratios, increasing the performance of the site in the process. Reporting and metrics are essential in determining whether your site is performing optimally.
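
As a hedged sketch of what such a cache set might look like, the class below wraps System.Runtime.Caching.MemoryCache with hit/miss counters and an Invalidate method for the CMS connector to call; the class name, key scheme and members are assumptions rather than the actual implementation.

```csharp
using System;
using System.Runtime.Caching;
using System.Threading;

public class CacheSet<T> where T : class
{
    private readonly MemoryCache _cache = MemoryCache.Default;
    private readonly string _setName;
    private readonly TimeSpan _timeout;
    private long _hits;
    private long _misses;

    public CacheSet(string setName, TimeSpan timeout)
    {
        _setName = setName;
        _timeout = timeout;
    }

    // Returns the cached item if present (a hit); otherwise fetches it from the
    // source, caches it and records a miss.
    public T GetOrAdd(int id, Func<T> fetchFromSource)
    {
        var key = _setName + ":" + id;
        var cached = _cache.Get(key) as T;

        if (cached != null)
        {
            Interlocked.Increment(ref _hits);
            return cached;
        }

        Interlocked.Increment(ref _misses);
        var item = fetchFromSource();
        if (item != null)
        {
            _cache.Set(key, item, DateTimeOffset.Now.Add(_timeout));
        }

        return item;
    }

    // Called by the CMS connector when a piece of data changes, so the next request
    // fetches the fresh version instead of waiting for the timeout to expire.
    public void Invalidate(int id)
    {
        _cache.Remove(_setName + ":" + id);
    }

    // Exposed for the polled reporting endpoint so hit/miss ratios can be monitored.
    public long Hits { get { return Interlocked.Read(ref _hits); } }
    public long Misses { get { return Interlocked.Read(ref _misses); } }
}
```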

The use of a cache as I’ve described can have a very dramatic impact on some pages. Recently I watched a page drop from 89ms with 6 database calls to 18ms with 0 database calls on a warm cache. Since this page is the homepage it’s even more important, as it sets the tone for what the user will experience throughout their visit. I know those numbers are very small, but the difference is huge. At 89ms per page, a server can serve around 674 sequential pages per minute. At 18ms, it can serve over 3,330 sequential pages per minute. If a website was running at its maximum capacity, that’s roughly 2,656 more chances per minute for users to purchase something.

Note: I’m using the word “sequential” because web servers are inherently multi-threaded, serving many requests at the same time. Real-world pages-per-minute values will be much higher than the sequential figures I state above.

Last notes

It’s important to regularly profile, or at least load test, your web application during development. As soon as it exhibits performance outside your expected thresholds, you need to examine why and then adapt your performance strategy to rectify the issue. In my experience, issues left until later in the development cycle typically don’t get addressed, or if they do, the outcome isn’t as good as if they had been addressed earlier. This doesn’t apply to caching, however, as caching should be done late in the project, once the behaviour of the users is known and the most-used paths can be cached.

Users don’t like using slow sites and slow sites only serve to turn users away from what could otherwise be a successful purchase.

Performance is a feature