Saving ubuntu.com on download day: caching location specific pages

(This article is was originally posted on design.canonical.com)

On release day we can get up to 8,000 requests a second to ubuntu.com from people trying to download the new release. In fact, last October (13.10) was the first release day in a long time that the site didn’t crash under the load at some point during the day (huge credit to the infrastructure team).

Ubuntu.com has been running on Drupal, but we’ve been gradually migrating it to a more bespoke Django based system. In March we started work on migrating the download section in time for the release of Trusty Tahr. This was a prime opportunity to look for ways to reduce some of the load on the servers.

Choosing geolocated download mirrors is hard work for an application

When someone downloads Ubuntu from ubuntu.com (on a thank-you page), they are actually sent to one of the 300 or so mirror sites that’s nearby.

To pick a mirror for the user, the application has to:

  1. Decide from the client’s IP address what country they’re in
  2. Get the list of mirrors and find the ones that are in their country
  3. Randomly pick them a mirror, while sending more people to mirrors with higher bandwidth

This process is by far the most intensive operation on the whole site, not because these tasks are particularly complicated in themselves, but because this needs to be done for each and every user - potentially 8,000 a second while every other page on the site can be aggressively cached to prevent most requests from hitting the application itself.

For the site to be able to handle this load, we’d need to load-balance requests across perhaps 40 VMs.

Can everything be done client-side?

Our first thought was to embed the entire mirror list in the thank-you page and use JavaScript in the users’ browsers to select an appropriate mirror. This would drastically reduce the load on the application, because the download page would then be effectively static and cache-able like every other page.

The only way to reliably get the user’s location client-side is with the geolocation API, which is only supported by 85% of users’ browsers. Another slight issue is that the user has to give permission before they could be assigned a mirror, which would slightly hinder their experience.

This solution would inconvenience users just a bit too much. So we found a trade-off:

A mixed solution - Apache geolocation

mod_geoip2 for Apache can apply server rules based on a user’s location and is much faster than doing geolocation at the application level. This means that we can use Apache to send users to a country-specific version of the download page (e.g. the German desktop thank-you page) by adding &country=GB to the end of the URL.

These country specific pages contain the list of mirrors for that country, and each one can now be cached, vastly reducing the load on the server. Client-side JavaScript randomly selects a mirror for the user, weighted by the bandwidth of each mirror, and kicks off their download, without the need for client-side geolocation support.

This solution was successfully implemented shortly before the release of Trusty Tahr.

By @nottrobin