Maneater/Django development lessons, Part 3

Through a series of posts, I’m counting down to a public test of the new Maneater web site by the end of the weekend. We’re hoping to launch Tuesday.

The last post was about my first impressions of Django as the project started off. This is part three.

This is mostly a technical writeup. For those who aren't programmers, the final post (Part 4) is a general "layman's terms" overview of new features on the site.

---

I'm really going to gloss over the templates and views more than I'd like. Honestly, writing the views was more of a Python learning experience than a Django learning experience.

Learning from the past

The trouble we had in prior launch attempts came mostly from overloading the Apache/Python instance on the server and overloading the database -- some of the more complex views ran lots and lots of database queries, each pulling every column of every row in a set. Indexing helped a little bit, but lacking an easier method in Django, a few of the views I created relied heavily on subqueries. The logic around some of these views would slow the server to a crawl. We have over 27,000 articles in our database, and some views would iterate over every single article several times before printing to a template.

Another limitation of the previous project was that we couldn't update Django to use the latest code and features. The backwards-incompatible changes I mentioned before sort of set us back in this regard, since caching wasn't nearly as extensible or quick in the early version of Django that was used. (Luckily, the API is now pretty stable, so the chances for backwards-incompatible changes at this point are low.)

Basically: everything about the old Django project was slow, partly because of the old version of Django we were using and partly because the code itself was poorly optimized.

---

Caching, as a lazy performance improvement

In an environment such as a newspaper website -- where, after a certain point, your content doesn't continue to update -- most pages and data don't change for a very, very long time (if ever). New stuff just gets added and old stuff is assumed to be static and archived.

In Django, you can cache just about anything. Caching more or less means you're putting your data somewhere that's quicker to access than where you normally get it -- for example, leaving some documents on your desk so you can look at them later, as opposed to filing them back in a drawer. With Django, you can cache plain data (obviously) and also the objects you get out of your database. And since you're normally caching in memory, the result ends up being extremely fast.

This is extremely beneficial in those situations above, where queries are server intensive and slow. Say you have a page that does this:

def some_function(self):
    articles = Article.objects.all()
    staffers = Staffer.objects.all()
    # ...
    # do stuff that takes a lot of server power
    # ...
    # in this example, I'm theoretically
    # narrowing down articles & staffers
    # ...
    data = {
        'articles': articles,
        'staffers': staffers,
    }
    # ...
    # do stuff to data
    # ...
    return data

Which is fine, but if you use some_function a lot, your server starts to slow down quite a bit.

Let's say that it's safe to assume that the data doesn't change all the time. You can cache the "data" object by doing the following:

from django.core.cache import cache

def some_function(self):
    cachename = 'some_function_asdfasdfasdf'
    data = cache.get(cachename)  # returns None if nothing is cached under this name
    if data is None:
        articles = Article.objects.all()
        staffers = Staffer.objects.all()
        # ...
        # do stuff ...
        # ...
        data = {
            'articles': articles,
            'staffers': staffers,
        }
        cache.set(cachename, data)  # store it for next time
    # ...
    # do stuff to data
    # ...
    return data

If you haven't run this function before (or the last cache expired), then it does the stuff you'd normally do to get "data" and stores the result. If the data is already in the cache, then it skips over the database-intensive work, which makes it fast, and makes your server happy.

The default cache time is 5 minutes, and is modifiable in the settings. If your data doesn't change for an unusually long time, you can also set a specific time for a cache:

cache.set(cachename, data, 60*60*24)

That caches your data for 24 hours (cache time is set in seconds).
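If you want to poke at the get-or-compute pattern and the timeout without a Django project handy, here is a minimal sketch. TinyCache is a hypothetical stand-in I made up for illustration, not Django's cache; it only mimics the get/set-with-timeout behavior described above, and expensive_lookup is a placeholder for the database-heavy work.

```python
import time


class TinyCache:
    """Hypothetical in-memory stand-in for a cache backend (not Django's)."""

    def __init__(self, default_timeout=300, clock=time.monotonic):
        self._store = {}                  # name -> (expires_at, value)
        self._default_timeout = default_timeout
        self._clock = clock               # injectable so expiry can be demonstrated

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[name]         # expired: act as if it was never set
            return None
        return value

    def set(self, name, value, timeout=None):
        if timeout is None:
            timeout = self._default_timeout
        self._store[name] = (self._clock() + timeout, value)


cache = TinyCache()
slow_calls = 0


def expensive_lookup():
    """Placeholder for the work that takes a lot of server power."""
    global slow_calls
    slow_calls += 1
    return {'articles': ['a1', 'a2'], 'staffers': ['s1']}


def some_function():
    cachename = 'some_function_data'
    data = cache.get(cachename)
    if data is None:
        data = expensive_lookup()
        cache.set(cachename, data, 60 * 60 * 24)  # keep for 24 hours
    return data


first = some_function()    # cache miss: runs expensive_lookup()
second = some_function()   # cache hit: skips it
```

After both calls, slow_calls is 1: the second call never touched the slow path.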

Caches work by unique cache name, so by making "cachename" dynamic you can even cache very specific pages if you deem it necessary. (For example, I cache staffer profile pages on a per-person basis, because it uses lists of every article, photo, graphic, page design, and podcast credited to a given person.)
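As a rough sketch of what a dynamic cache name can look like: the naming scheme, the plain dict standing in for the cache, and build_profile are all made up for illustration, and the real profile view does a lot more than this.

```python
def profile_cache_name(staffer_id):
    """Build a cache name unique to one staffer (hypothetical naming scheme)."""
    return 'staffer_profile_%d' % staffer_id


fake_cache = {}    # plain dict standing in for the real cache
build_count = 0


def build_profile(staffer_id):
    """Placeholder for gathering everything credited to one person."""
    global build_count
    build_count += 1
    return {'staffer_id': staffer_id, 'credits': []}


def staffer_profile(staffer_id):
    cachename = profile_cache_name(staffer_id)
    data = fake_cache.get(cachename)
    if data is None:
        data = build_profile(staffer_id)
        fake_cache[cachename] = data
    return data


staffer_profile(7)
staffer_profile(7)     # same person: served from the cache
staffer_profile(12)    # different person: different cache name, built fresh
```

Because the staffer's ID is part of the name, each person gets a separate cache entry, and one profile expiring or being rebuilt doesn't affect anyone else's.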

As a programmer, I consider my stopgap object-caching a very lazy way to optimize, but due to the nature of some of the data models and the sheer volume of information we have to go through (27,000+ articles, thousands of photos, hundreds of staffers), there really isn't much else I can do at this point on a deadline. And sometimes, because of the limitations of a language or framework, there may not be an easier way to optimize anyway.

---

I'm not going to touch templating that much here (though I may come back to this subject later to talk about "abstracting" templates and creating custom templates for RSS feeds and JavaScript). I will say that as long as you can create basic HTML/CSS and know that "hey, this variable goes inside this tag" (e.g. <h1>{{title}}</h1>), you can more or less create a template for Django. Since most of the logic comes at the view level, you don't need to know Python so much as understand programming basics like variables and for loops. (And of course you'll need to know what variables are coming at you from the view level, and you'll need to know the data models your site uses.)

It's my understanding that Ruby on Rails (another framework) uses some form of Ruby as its template language. I love Django for the very plain text-based approach.

---

Next: I'll link you to the new site, walk you through the new features, and solicit your comments. I'll also outline a few new things I'm hoping to accomplish on the site, after polishing out any post-launch issues. Stay tuned.

Maneater/Django development lessons, Part 2

Through a series of posts, I’m counting down to a public test of the new Maneater Web site by the end of the weekend. We’re hoping to launch Tuesday.

In the first part, I introduced the project and talked about the history. This is part two.

This is mostly a technical writeup. For those who aren't programmers, the final post (Part 4) is a general "layman's terms" overview of new features on the site.

---

Square One

In late December, frustrated with a poor development workflow in Drupal, I mulled it over with Carolina Astrain (our online editor) and decided to go back to Django. I did this on the condition that I could fashion a functional demo in four days or go back to the Drupal project. I promised to have the site (regardless of platform) done by February. A big dare on my part, but necessary in my book: I needed hard deadlines for results. This was getting done whether I liked the workflow or not.

Over the course of hosting switches for the Django and Drupal projects, all of the old code from the prior Django project was lost. But this didn't really matter because Django -- still in development -- had gone through several major revisions and backwards-incompatible changes from the version Chase and Brian had worked on.

Again, this was fine because the site barely worked in that incarnation -- we nearly rewrote it from the ground up at one point, until the project stagnated.

So, I started the arduous task of creating a full-fledged news outlet Web site from nothing. Did I mention that I didn't know Python and that I was in way too deep the last time I worked on the old Django project? Yes, I was learning this on the fly. (I actually wish I hadn't lost Chase and Brian's code; I did not understand a bit of it back when I was responsible for it, but I believe I could learn a lot from that now.)

Django is a framework -- not so much a "program everything" programming language as a toolkit, built on top of one (Python), that does a lot of your workflow for you. Having such a versatile framework environment is probably the only reason I was able to do this with so little help.

---

And we're off

Django uses a "design pattern" (workflow) that is split up into three "layers", as most frameworks do. Django uses a very pure three-tier method: model, view, template. Or in layman's terms: what sort of information you have in your database, the logic that determines what stuff to get from your database and processes it, and the templates that take the output of that "view" and turn it into pages, feeds, and other files. (For those programmers who know the "Model-View-Controller" pattern, Django's "view" plays the role of your controller in doing the logic, and the "template" plays the role of your view.)
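To make the three layers a little more concrete, here's a toy sketch in plain Python. It doesn't use Django at all: the Article class, the sample data, section_view, and render are all invented for illustration, and the real thing talks to a database and uses Django's own template language.

```python
from dataclasses import dataclass
from string import Template


# "Model": what sort of information you have.
@dataclass
class Article:
    title: str
    section: str


# Made-up sample data standing in for the database.
ARTICLES = [
    Article('Example story A', 'News'),
    Article('Example story B', 'Sports'),
    Article('Example story C', 'News'),
]


# "View": the logic that decides what to get and processes it.
def section_view(section):
    titles = [a.title for a in ARTICLES if a.section == section]
    return {'section': section, 'titles': titles}


# "Template": takes what the view hands over and turns it into a page.
PAGE = Template('<h1>$section</h1>\n$items')


def render(context):
    items = '\n'.join('<p>%s</p>' % t for t in context['titles'])
    return PAGE.substitute(section=context['section'], items=items)


html = render(section_view('News'))
print(html)
```

The point is the hand-off: the template layer only needs to know that it gets a "section" and a list of "titles", not how the view found them.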

I think it's a nice system, because you can delegate your tasks based on those layers and so long as your team members know what to expect from people working on another "layer", they don't need to know the innards of the other layers.

But of course, I'm working on this more or less on my own (save for some templating help from Justin Myers) so that layer separation doesn't necessarily save me except for the fact that it's really well-suited for learning on the fly. Think about it: instead of worrying about every aspect of a site at once (overload!), you can focus on and learn specifically how to model your data and finish that, focus on how to create views for your data and then finish that, and then learn the nitty-gritty on templating.

Let me tell you one thing, though: Frameworks like this are designed to be quick-to-build. It's a framework, in literal terms, where you build off of a mostly prefabricated structure. I can imagine designing fast workflows around this, with the ability to churn out very diverse applications (don't just copy and paste the same code between projects) at the drop of a hat. I'd be happy to learn a few other frameworks and use another MVC/MTV pattern in the future.

---

Next Top Model

The beauty of a framework like Django is its data modeling features. If you've used an entity-relationship diagram (ERD), you already know more about this than is necessary. If you can list out the types of metadata for something...

  • Issue
    • Volume (Integer)
    • Date (Date)
  • Staffer
    • First Name (Text)
    • Last Name (Text)
    • Position (Text)
  • Article
    • Title (Text)
    • Date (Date)
    • Issue (Issue)
    • Section (Text)
    • C-Deck (Text)
    • Byline (Staffers)
    • Body (Text)
  • Photo
    • Date (Date)
    • Byline (Staffer)
    • Photo (Image)
    • Cutline (Text)

...you more or less already have your database set up and an admin interface automatically made for you. This example is a gross simplification of the Maneater's basic data models. (If you're really interested in learning what models look like in code, see the Django Book.)
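For a feel of how directly that outline maps to code, here's the same simplified list written as plain Python dataclasses. To be clear, these are not Django model classes (Django has its own model and field types, which are what give you the database and admin), and the field names and sample values are just my transliteration of the outline above.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    volume: int
    date: datetime.date


@dataclass
class Staffer:
    first_name: str
    last_name: str
    position: str


@dataclass
class Article:
    title: str
    date: datetime.date
    issue: Issue              # points at another model
    section: str
    c_deck: str
    byline: List[Staffer]     # an article can credit several staffers
    body: str


@dataclass
class Photo:
    date: datetime.date
    byline: Staffer
    photo: str                # assumption: a path to the uploaded image
    cutline: str


# Made-up sample records, just to show the relationships.
issue = Issue(volume=1, date=datetime.date(2000, 1, 1))
writer = Staffer('Pat', 'Example', 'Reporter')
story = Article('Example title', issue.date, issue, 'News',
                'Example deck', [writer], 'Body text')
```

The connections between models (an article belongs to an issue, a byline points at staffers) are the part worth getting right up front.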

The best part about Django's data models is fields that represent images you upload (for photos and graphics) and regular files you upload (for PDF pages and podcast audio files). The admin panel automatically takes care of uploading files to a folder you specify.

That example admin interface uses different example data, but visually it's the same admin panel the new Maneater site uses. And I'm assuming that the Columbia Missourian and Vox Magazine admin interface probably looks the same too.

The automatic admin interface is one of the single most important features of Django. Believe me, it takes forever to program your own way of adding, modifying, and removing stuff from a custom database.

Taking that CS 3380 database class at the same time as this project also propelled me along quite a bit. I cannot stress enough the importance of good, connected data models, and that class is where I got it pounded into my skull. Of all the things in the Django framework, your data models really can't be edited after the fact without massive pain -- i.e. manually editing your database. It's better to have more metadata than you need than to realize later that you left out some information.

Create a very full outline of the things you plan on putting into the site and every bit of information that goes under them. I don't think I could start a major project in any language without doing that sort of visualization now.

---

Of course, data modeling and having an admin interface doesn't give you a site that's browsable.

Next: "Views" and templating.