Caching

Caching can reduce the load on servers by storing the results of common operations and serving the precomputed answers to clients.

For example, instead of repeatedly retrieving data from database tables that rarely change, you can store the values in memory. Retrieving values from memory is far faster than retrieving them from a database, which stores them on a persistent disk such as a hard drive. When the underlying values change, the system can invalidate the cache and retrieve the updated values for future requests.
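The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `_DATABASE` dictionary stands in for a real database, and the names `get_setting`, `update_setting` and `stats` are hypothetical, not part of any library.

```python
import time

# Hypothetical stand-in for a database table that rarely changes.
_DATABASE = {"site_name": "Example", "max_items": "25"}

_cache = {}                # in-memory cache: key -> value
stats = {"db_reads": 0}    # counts how often the slow path runs


def read_from_database(key):
    """Simulate a slow, disk-backed lookup."""
    stats["db_reads"] += 1
    time.sleep(0.01)  # pretend this is disk and network latency
    return _DATABASE[key]


def get_setting(key):
    """Return the cached value, falling back to the database on a miss."""
    if key not in _cache:
        _cache[key] = read_from_database(key)
    return _cache[key]


def update_setting(key, value):
    """Write the new value, then invalidate the stale cache entry."""
    _DATABASE[key] = value
    _cache.pop(key, None)
```

Repeated calls to `get_setting` touch the simulated database only once per key until `update_setting` invalidates the entry, after which the next read fetches the fresh value.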

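Caches can be placed at multiple layers of the stack, for example in front of the database, inside the application or at the HTTP layer via a reverse proxy.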

Caching backends

  • memcached is a common in-memory caching system.

  • Redis is an in-memory key-value data store that can easily be configured for caching with libraries such as django-redis-cache and the similarly named but separate django-redis project.
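As a rough illustration of the Redis option, a Django settings fragment for the django-redis package might look like the following. The host, port and database number in `LOCATION` are assumptions for a local development server; adjust them to your environment.

```python
# settings.py (fragment): a sketch assuming the django-redis package is
# installed and a Redis server is listening locally on the default port.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        # Assumed host, port and database number; change for your setup.
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```

With a configuration along these lines, Django's standard cache API (`django.core.cache.cache`) would read from and write to Redis without further changes to application code.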

Caching resources

  • Caching at Reddit is a wonderful in-depth post that goes into detail on how they handle caching their Python web app for billions of pageviews each month.

  • "Caching: Varnish or Nginx?" reviews considerations such as SSL and SPDY support when choosing between Nginx and Varnish as a reverse proxy.

  • Caching is Hard, Draw me a Picture has diagrams of how web request caching layers work. The post is relevant reading even though the author's motivation for writing it was his Microsoft code.

  • While caching is a useful technique in many situations, there are also downsides to caching that many developers fail to take into consideration.

  • Caching at Reddit covers monitoring, tuning and scaling for the very high-traffic Reddit.com website.

  • Mastering HTTP caching provides more advanced advice on caching dynamic as well as static content via CDNs and other configurations.

Caching learning checklist

  1. Analyze your web application for the slowest parts. It's likely there are complex database queries that can be precomputed and stored in an in-memory data store.

  2. Use the in-memory data store you already run for session data to cache the results of those complex database queries. A task queue can often be used to precompute the results on a regular basis and save them in the data store.

  3. Incorporate a cache invalidation scheme so the precomputed results remain accurate when served up to the user.
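The checklist steps above can be sketched as a small, self-contained Python example. Everything here is hypothetical: `TTLCache` is a tiny in-process stand-in for an in-memory data store such as Redis, `ORDERS` is made-up data, and `refresh_totals` represents the kind of job a task queue might run periodically.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for an in-memory data store."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key):
        self._entries.pop(key, None)


ORDERS = [("alice", 30), ("bob", 12), ("alice", 8)]  # made-up rows


def compute_totals():
    """Stand-in for a complex, slow database query (step 1)."""
    totals = {}
    for customer, amount in ORDERS:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def refresh_totals(cache):
    """Precompute and store the result, as a periodic task might (step 2)."""
    cache.set("order_totals", compute_totals(), ttl_seconds=300)


def get_totals(cache):
    """Serve the precomputed result, recomputing only on a miss."""
    totals = cache.get("order_totals")
    if totals is None:
        refresh_totals(cache)
        totals = cache.get("order_totals")
    return totals


def add_order(cache, customer, amount):
    """Write path: change the data, then invalidate the cache (step 3)."""
    ORDERS.append((customer, amount))
    cache.delete("order_totals")
```

This sketch combines two invalidation approaches: explicit deletion on writes, plus a time-to-live as a safety net for changes that bypass `add_order`. Which mix is appropriate depends on how stale a result your application can tolerate.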
