Maneater/Django development lessons, Part 3

Through a series of posts, I’m counting down to a public test of the new Maneater web site by the end of the weekend. We’re hoping to launch Tuesday.

The last post covered my first impressions of Django as the project started off. This is part three.

This is mostly a technical writeup. For those who aren't programmers, the final post (Part 4) is a general "layman's terms" overview of new features on the site.

---

I'm really going to gloss over the templates and views more than I'd like. Honestly, writing the views was more of a Python learning experience than a Django learning experience.

Learning from the past

The trouble we had in prior launch attempts was mostly due to overloading the Apache/Python instance on the server and overloading the database -- some of the more complex views involved lots and lots of database queries that would pull all columns from a set. Indexing helped a little bit, but lacking an easier method in Django, a few of the views I created relied heavily on subqueries. The logic around some of these views would bring the server down to a crawl. We have over 27,000 articles in our database, and some views would iterate over every single article several times before printing to a template.

Another limitation of the previous project was that we couldn't update Django to use the latest code and features. The backwards-incompatible changes I mentioned before sort of set us back in this regard, since caching wasn't nearly as extensible or quick in the early version of Django that was used. (Luckily, the API is now pretty stable, so the chances for backwards-incompatible changes at this point are low.)

Basically: everything about the old Django project was slow, partly because of the old version of Django we were using and partly because the code itself was poorly optimized.

---

Caching, as a lazy performance improvement

In an environment such as a newspaper website -- where, after a certain point, your content doesn't continue to update -- most pages and data don't change for a very, very long time (if ever). New stuff just gets added and old stuff is assumed to be static and archived.

In Django, you can cache almost anything (as long as Python can pickle it). Caching more or less means you're putting your data somewhere that's quicker to access than where you normally get it: think of leaving some documents on your desk so you can look at them later, as opposed to filing them back in a drawer. With Django, you can cache plain data (obviously) as well as the objects you get out of your database. And since you're normally caching in memory, the result ends up being extremely fast.

This is extremely beneficial in the situations described above, where queries are server-intensive and slow. Say you have a page that does this:

def some_function(self):
    articles = Article.objects.all()
    staffers = Staffer.objects.all()
    #...
    #do stuff that takes a lot of server power
    #...
    #in this example, i'm theoretically
    #narrowing down articles & staffers
    #...
    data = {
        'articles': articles,
        'staffers': staffers
    }
    #...
    #do stuff to data
    #...
    return data

Which is fine, but if you use some_function a lot, your server starts to slow down quite a bit.

Let's say that it's safe to assume that the data doesn't change all the time. You can cache the "data" object by doing the following:

from django.core.cache import cache

def some_function(self):
    cachename = 'some_function_asdfasdfasdf'
    data = cache.get(cachename)
    if data is None:
        #cache miss: do the expensive work once
        articles = Article.objects.all()
        staffers = Staffer.objects.all()
        #...
        #do stuff ...
        #...
        data = {
            'articles': articles,
            'staffers': staffers
        }
        cache.set(cachename, data)
    #...
    #do stuff to data
    #...
    return data

If you haven't called this function before (or the last cached copy has expired), it does the work you'd normally do to get "data" and then stores the result. If the result is already in the cache, it skips the database-intensive work entirely, which makes it fast and makes your server happy.
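To see the hit/miss behavior without a Django project, here is a minimal sketch of the same check-then-compute pattern. TinyCache and get_or_compute are hypothetical names I made up for illustration; TinyCache is only an in-memory stand-in that mimics the get/set shape of Django's cache object, not Django's actual implementation.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in with get/set like Django's cache."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        # Treat expired entries as missing, like a real cache would.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout=300):
        # timeout is in seconds; None means "never expire" in this sketch.
        expires_at = None if timeout is None else time.monotonic() + timeout
        self._store[key] = (value, expires_at)


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, running compute() only on a miss."""
    data = cache.get(key)
    if data is None:  # note: a cached None would be recomputed every time
        data = compute()
        cache.set(key, data, timeout)
    return data


calls = []


def expensive():
    # Stands in for the slow database work; records each real run.
    calls.append(1)
    return {'articles': ['a1', 'a2'], 'staffers': ['s1']}


cache = TinyCache()
first = get_or_compute(cache, 'some_function_demo', expensive)
second = get_or_compute(cache, 'some_function_demo', expensive)
print(len(calls))  # the expensive work ran only once
```

In a real project you would presumably pass Django's own cache object instead of TinyCache; the second call above returns the stored dictionary without touching expensive() again.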

The default cache time is 5 minutes, and is modifiable in the settings. If your data doesn't change for an unusually long time, you can also set a specific time for a cache:

cache.set(cachename, data, 60*60*24)

This caches your data for 24 hours (the timeout is given in seconds).
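As for the default timeout living in the settings: the exact form depends on your Django version, so treat this as a sketch rather than what our settings file looks like. It assumes a version of Django that uses the CACHES dictionary setting (older releases configured the cache through a single CACHE_BACKEND string instead), and the local-memory backend and 15-minute value are just example choices.

```python
# settings.py (fragment)
# Assumption: a Django version that reads the CACHES dictionary setting.
CACHES = {
    'default': {
        # Per-process in-memory cache; fine for trying things out.
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        # Default timeout in seconds for cache.set() calls that
        # don't pass their own timeout (Django's default is 300).
        'TIMEOUT': 60 * 15,
    }
}
```

Any cache.set() call that passes its own timeout, like the 24-hour example above, overrides this default for that one key.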

Caches work by unique cache name, so by making "cachename" dynamic you can even cache very specific pages if you deem it necessary. (For example, I cache staffer profile pages on a per-person basis, because it uses lists of every article, photo, graphic, page design, and podcast credited to a given person.)
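A rough sketch of what a dynamic cache name can look like for the per-person case. Everything here is hypothetical: staffer_cache_key, staffer_profile, fake_load and the DictCache stand-in are illustrative names, not the actual Maneater code, and the 24-hour timeout is just an example.

```python
class DictCache:
    """Hypothetical stand-in with the same get/set shape as Django's cache."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, timeout=None):
        self.store[key] = value  # timeout is ignored in this toy version


def staffer_cache_key(staffer_id):
    # One cache entry per person: the id makes the name unique.
    return 'staffer_profile_%d' % staffer_id


def staffer_profile(cache, staffer_id, load_profile):
    key = staffer_cache_key(staffer_id)
    data = cache.get(key)
    if data is None:
        # Cache miss: build this one person's profile and store it.
        data = load_profile(staffer_id)
        cache.set(key, data, 60 * 60 * 24)
    return data


def fake_load(staffer_id):
    # Stands in for the queries that gather a person's credited work.
    return {'id': staffer_id, 'articles': []}


cache = DictCache()
staffer_profile(cache, 7, fake_load)
staffer_profile(cache, 42, fake_load)
print(sorted(cache.store))  # ['staffer_profile_42', 'staffer_profile_7']
```

Each person ends up with their own entry, so refreshing one profile never throws away anyone else's cached data.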

As a programmer, I consider my stopgap object-caching a very lazy way to optimize, but due to the nature of some of the data models and the sheer volume of information we have to go through (27,000+ articles, thousands of photos, hundreds of staffers), there really isn't much else I can do at this point on a deadline. And sometimes, because of the limitations of a language or framework, there may not be an easier way to optimize anyway.

---

I'm not going to touch templating that much here (though I may come back to this subject later to talk about "abstracting" templates and creating custom templates for RSS feeds and JavaScript). I will say that as long as you can create basic HTML/CSS and know that "hey, this variable goes inside this tag" (e.g. <h1>{{title}}</h1>), you can more or less create a template for Django. Since most of the logic lives at the view level, you don't really need to know Python so much as understand programming basics like variables and for loops. (And of course you'll need to know what variables are coming at you from the view level, and you'll need to know the data models your site uses.)

It's my understanding that Ruby on Rails (another framework) uses some form of Ruby as its template language. I love Django for the very plain text-based approach.

---

Next: I'll link you to the new site, walk you through the new features, and solicit your comments. I'll also outline a few new things I'm hoping to accomplish on the site, after polishing out any post-launch issues. Stay tuned.