django-medusa: Rendering Django sites as static HTML

If you’ve ever poked around on this blog, you may have noticed the colophon, which mentions very briefly:

Unlike most Django sites, this is compiled into static HTML pages by django-medusa, a Django app I'm currently building.

This little tool has been open source since I deployed the new version of this website, maybe nine months ago, but I hadn’t really done anything with it or mentioned it much anywhere. It powers the several hundred pages on this site and turns them into static HTML, which is then hosted in S3. (Details below.)

(The only other time I’ve mentioned this project publicly was in response to django-bakery, a tool that the L.A. Times Data Desk uses to process some data projects into static pages. Clearly, this is an interesting idea to some people.)

tl;dr for Django pros: Test out the tutorial “hello world” and see the README. Come back if you want the more detailed narrative breakdown of the app (and how the app powers this blog).

The app basically auto-finds “renderer” definitions for your apps and then provides a Django management command that builds the static rendition of the website (with the output directed based on some settings).

Renderers live in per-app files that are auto-discovered, as long as the app is listed in INSTALLED_APPS. Each of these files defines a class with a get_paths instance method that returns a list or tuple of absolute URLs, representing all the URLs that should be converted to static files. Renderers are set up like this so that, on an app-by-app basis (or even varying within an app), you can dynamically generate all the possible URLs that exist in your site.

Here are a couple of renderer definitions that actually power part of this site.

The specific URL names and model bits aren’t important. Basically, you’ll notice that the example BlogHomeRenderer generates the entire URL structure for /blog/* by querying for all live blog posts and then using Django’s URL reversing methods to figure out all the paths that could possibly be built. (That file in particular uses sets instead of lists/tuples, so that it can just blindly generate all the URLs and have duplicates ignored. It casts the set to a list upon returning.)
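To make the set trick concrete, here is a minimal, self-contained sketch of a get_paths method. It is not the actual renderer from this site: the post data, the URL layout, and the fake_reverse helper (standing in for Django’s URL reversing) are all made up for illustration, and a real renderer would subclass the app’s renderer base class and query the ORM instead.

```python
from datetime import date

# Hypothetical stand-in for a queryset of live blog posts.
LIVE_POSTS = [
    {"slug": "hello-world", "published": date(2012, 1, 5)},
    {"slug": "second-post", "published": date(2012, 1, 20)},
    {"slug": "spring", "published": date(2012, 3, 2)},
]


def fake_reverse(*parts):
    """Hypothetical stand-in for URL reversing: joins parts under /blog/."""
    return "/" + "/".join(["blog", *[str(p) for p in parts]]) + "/"


class BlogHomeRenderer:
    """Illustrative only; a real renderer would extend the app's base class."""

    def get_paths(self):
        # A set lets us add the year and month archive URLs once per post
        # without worrying about duplicates.
        paths = {fake_reverse()}
        for post in LIVE_POSTS:
            d = post["published"]
            month = f"{d.month:02d}"
            paths.add(fake_reverse(d.year))
            paths.add(fake_reverse(d.year, month))
            paths.add(fake_reverse(d.year, month, post["slug"]))
        # Cast back to a list on the way out (sorted for stable output).
        return sorted(paths)
```

Two of the sample posts share a month, so the year and month archive URLs are added more than once, but each appears only once in the returned list.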

The process that actually generates the output simply uses (or abuses) Django’s internal test client to request each URL and store the resulting data (and mimetype/other metadata, if using the right backend; I’ll touch on this more below). I believe that this paradigm gives each app the most flexibility to define its own outputs, and it keeps app-and-view-building as Django-like as possible (i.e. you are still building within the urlconf and view system). It seems hacky at first to rely on those internal HTTP test client mechanisms, but I haven’t yet encountered any issues. The rendering command can even (optionally) parallelize the test client crawl to achieve faster rendering.
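The crawl itself is simple enough to sketch in a few lines. This is a rough illustration of the idea rather than the app’s actual code: render_all and fake_fetch are hypothetical names, fake_fetch stands in for a test client GET against your urlconf, and the thread pool is just one convenient way to show the optional parallel crawl.

```python
from concurrent.futures import ThreadPoolExecutor


def render_all(paths, fetch, workers=1):
    """Request every path via `fetch`; return {path: (body, content_type)}."""
    if workers <= 1:
        results = [fetch(p) for p in paths]
    else:
        # Optional parallel crawl; map() keeps results in input order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(fetch, paths))
    return dict(zip(paths, results))


def fake_fetch(path):
    """Hypothetical stand-in for a test client GET against the urlconf."""
    if path.endswith(".json"):
        return (b'{"ok": true}', "application/json")
    return (("<h1>%s</h1>" % path).encode("utf-8"), "text/html")


pages = render_all(["/", "/blog/", "/feed.json"], fake_fetch, workers=3)
```

Because each request is independent of the others, the serial and parallel runs produce the same mapping; only the wall-clock time differs.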

The staticsitegen management command then renders the URL structure you’ve defined into static files. There are currently three rendering classes:

  • a disk-based renderer which outputs the directory tree in HTML files, turning bare URLs into directories with an index.html (so /colophon/ would result in output_dir/colophon/index.html being generated)
  • an Amazon S3 renderer which uploads the files directly to an S3 bucket (overwriting any existing copies)
  • a Google App Engine renderer which uploads the files to a static GAE instance, similar to the S3 renderer’s behavior

The advantages of the latter two primarily deal with situations where non-HTML content is generated: if any of your views returns JSON, XML, or some other data format, then the S3/GAE renderers will attempt to store the generated files with that mimetype.
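The disk-based renderer’s URL-to-file mapping from the list above can be sketched like so. This is an illustration of the rule as described, not the renderer’s real code; output_path is a hypothetical helper name.

```python
import posixpath


def output_path(output_dir, url_path):
    """Map an absolute URL path to a file path under output_dir."""
    rel = url_path.lstrip("/")
    if rel == "" or rel.endswith("/"):
        # Bare URLs become directories containing an index.html.
        rel += "index.html"
    return posixpath.join(output_dir, rel)


print(output_path("output_dir", "/colophon/"))
```

A URL that already ends in a filename, such as /feed.xml, is written as-is. On disk the mimetype then depends on whatever serves the files, which is why the S3/GAE renderers storing it explicitly is handy.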

This blog basically runs on a local dev server that uses an SQLite database as storage. I use the S3 renderer for this, to cut out the filesystem middle-man. For static files, I use the built-in staticfiles app along with django-storages; the collectstatic command automatically uploads static resources to S3, the same way the medusa S3 renderer does.

I write everything on that local server, then use my staticsitegen command to upload the whole URL tree. (In the event I updated any template bits, every URL is re-generated and overwritten.) I then use collectstatic to sync my static (CSS/JS/etc.) files. (For CloudFront or EdgeCast CDNs, right around here is where I’d run a script to immediately invalidate some of the more recent URL roots so that blog index pages get refreshed faster.)

That’s basically it. This blog (and the Onion Browser site, which is simply implemented as direct_to_template views) has been running via this system for about nine months and it’s been solid. Not even the Onion Browser release rush (being featured on Hacker News, Reddit, Gizmodo, Lifehacker, etc.) affected the site in the slightest.

Despite the static nature of the underlying pages, I don’t lose the ability to have comments, stats, and other features: I had been running Disqus for comments for quite some time, and I still use Google Analytics for analytics. (I recently disabled comments on this site, more out of apathy and lack of use than any philosophical stance.)

The README has a pretty good technical overview that goes into more code detail than the above paragraphs. You might want to start with this basic “hello world” tutorial first, though: it’ll get your feet wet and demonstrate the ease with which you can convert a (simple) Django project into a statically generated one, simply by adding a renderer definition and some settings.

I’m planning to clean the code a bit and tidy up (and add to) the documentation in the next couple weeks, but I figured I’d been sitting on this long enough.

You can discuss this on Hacker News. Feel free to bug me via e-mail or follow me on Twitter, too.