Notice: I have neither posted nor updated any content on this blog since the mid-2010s. (😱) Please check out the homepage and my bio for more recent information.

PyPy on Heroku

WARNING: This is an old post from 2012. The information here is likely incredibly out of date. Proceed at your own risk.


A few people (including the main @Heroku account on Twitter) have mentioned that I’ve thrown together a working Heroku buildpack for PyPy. (And hey, PyPy 1.8 just dropped last Friday so you should check that out.)

I noticed a couple of other attempts that were broken (failing with an 'import site' failed error), so I did some sleuthing on my own deployed-but-broken PyPy/Heroku site. (Did you know you can run arbitrary commands with heroku run?) I dug around (via heroku run "ls -la *"), tinkered with a buildpack of my own, and found two problems: the include dir was being symlinked rather than actually copied, and the lib-python and lib_pypy directories contained symlinks rather than actual .py files. (This may be an issue with virtualenv --relocatable and PyPy.)

I fixed the buildpack by removing those directories and copying them manually after the virtualenv is set up.
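The buildpack itself is a shell script, so this is not its code. As a hedged illustration of the same idea, here is a Python sketch that walks a directory tree and replaces each symlink with a real copy of whatever it points to. The function name materialize_symlinks is hypothetical, and the sketch assumes every link target exists.

```python
import os
import shutil


def materialize_symlinks(root):
    """Replace every symlink under root with a real copy of its target.

    Hypothetical helper; assumes link targets exist and live outside the link.
    Returns the list of paths that were replaced.
    """
    replaced = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk lists symlinks to directories in dirnames but does not
        # descend into them while they are still links.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            os.unlink(path)  # remove the link itself, not the target
            if os.path.isdir(target):
                # Copy the directory contents as real files.
                shutil.copytree(target, path, symlinks=False)
            else:
                shutil.copy2(target, path)
            replaced.append(path)
    return replaced
```

Because the link is swapped for a real directory before os.walk recurses, the walk then descends into the copy and finds only regular files.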

(Aside: I’ve only been toying with Heroku in my spare time for a couple of weeks, but the Cedar buildpack system is a really awesome concept and caught my eye: you can essentially use buildpacks to deploy your own arbitrary projects and languages to the Cedar stack.)


In any case, this buildpack is more or less a drop-in replacement for the default Heroku one. When creating a Heroku app, you can enable it like this:

heroku create --stack cedar --buildpack git://github.com/mtigas/heroku-buildpack-pypy.git

Or, modify an existing Python/Heroku app by using:

heroku config:add BUILDPACK_URL=git://github.com/mtigas/heroku-buildpack-pypy.git

…and then re-deploying your app (by pushing).

Enjoy your easy-to-set-up, cloud-hosted PyPy!

Deploying Django on Heroku

WARNING: This is an old post from 2012. The information here regarding configuration & deploy is likely incredibly out of date. Proceed at your own risk.


Extremely long tutorial post ahead.

If you want to follow along, I’ve made an example project available here. The example site is running here.

An extremely abridged tldr version of this (containing just the shell and code snippets, sans explanation) is located here: tldr.markdown.


I've been toying around with Heroku in my spare time over the past couple weeks because "NoOps" is the new hotness and the promise is cool enough: wouldn’t it be great if you could write and deploy a high-performance website without having to micromanage the infrastructure? (See also: erosion and whatnot.)

In any case, Heroku’s pricing structure (750 free dyno-hours per app each month) means you can run a low-end, low-traffic website on Heroku for free, which is useful for trying it out.

The Heroku Django starter doc isn't bad, but leaves out some bits that I think are important in any production environment:

  • Using the correct gunicorn worker class: gunicorn recommends that you either run behind a buffering proxy server (e.g. nginx) or use one of the "async" worker classes. The Cedar 'herokuapp.com' HTTP stack connects requests directly to the backend for flexibility (think WebSockets and the like), but doesn’t buffer requests or provide gzip. (In fact, the Heroku Django docs mention using gunicorn+gevent, but don’t actually configure gevent in the relevant examples.)
  • Handling of static assets and uploaded files. I have seen this question asked in a few places and the combination of Django 1.3+, django-storages, and boto make this extremely painless to set up.

…So I’ve decided to tinker with Heroku and write a step-by-step tutorial as I go.

The following assumes you’re fairly proficient with Django and these steps are only useful for getting a barebones proof of concept site up and running. I do provide the database and caching bits for you, so you can use this as a stepping stone for trying out more full-featured projects on Heroku.

(Note: The free Heroku database is a shared server with only 5MB of raw data storage. The free memcached instance is likewise a tiny 5MB instance. These are toy websites we’re deploying here for free. The 20GB shared database is $15/month; you can also host your own dedicated EC2 postgres instance if you’d rather not go all-out with a Heroku dedicated DB.)

Preliminaries

System dependencies

I’m going to assume you have a working copy of git, Python 2.7.X, pip, and virtualenv on your local system.

If you don't, you should install them via homebrew. If you don't have homebrew, visit the documentation and run the one-line install. (If you do have brew installed, now would be a great time to update it. Run brew update.)

Install git.

brew install git

Now install Python, add it to your PATH, and add that new PATH to .bash_profile so that this works in the future. (If you use ZSH or another shell, do this to your .zshrc or similar file.)

brew install python
export PATH=/usr/local/share/python:$PATH
echo "export PATH=/usr/local/share/python:\$PATH" >> ~/.bash_profile

Then install pip and virtualenv.

easy_install pip
pip install virtualenv

Setting up Heroku

Register a Heroku account first.

Now install the heroku and foreman commands:

sudo gem install foreman heroku
sudo update_rubygems

(Note: The Heroku docs tell you to use their toolbelt package to install these packages, but I’ve encountered errors with foreman unless I’ve sudo gem install'd it. The gems get you the same thing, anyway.)

Once installed, run the heroku login command, which allows you to run commands against your Heroku account. (The Heroku Toolbelt page has an example of the login bit under "Getting Started".)

If you’re keeping score at home, here are the things you need to move on:

  • git
  • python 2.7.x
  • pip
  • virtualenv
  • heroku
  • foreman

Bootstrapping a Heroku Python site

We’ll start by choosing an app name. (Change these values, please.)

# The "app name" that this will get in the Heroku control panel. Also
# determines directory names and your "PROJECT_NAME.herokuapp.com"
# default domain.
export PROJECT_NAME="my-test-app"

# The python module name for your Django site. Separate from above since
# python app names should use underscores rather than dashes.
export PYTHON_APP_NAME="my_test_app"
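If you’d rather derive one name from the other than type both, the rule above (dashes become underscores) fits in a few lines. This is a hypothetical helper, not part of Heroku or Django; it also rejects names that still wouldn’t be importable Python modules.

```python
import keyword


def python_app_name(project_name):
    """Turn a Heroku app name like "my-test-app" into "my_test_app".

    Hypothetical helper: raises ValueError if the result is not a valid
    Python module name (e.g. it starts with a digit or is a keyword).
    """
    name = project_name.replace("-", "_")
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError("%r is not a valid Python module name" % name)
    return name
```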

I like to put my projects in a ~/Code directory, but you can change this to place your projects wherever you normally would:

# Set up a heroku-$PROJECT_NAME virtualenv in the ~/Code directory.
cd ~/Code
virtualenv --no-site-packages heroku-$PROJECT_NAME

I’m going to gloss over the fine details on how to use virtualenv, but you should be able to follow along here if you’ve ever done customization to your .bash_profile, .zshrc, or similar shell init file.

# Modify the `activate` file with some sanity-ensuring defaults, like
# ignoring any system-level PYTHONPATH and DJANGO_SETTINGS_MODULE.
cd heroku-$PROJECT_NAME
echo "export PROJECT_NAME=\"$PROJECT_NAME\"" >> bin/activate
echo "export PYTHON_APP_NAME=\"$PYTHON_APP_NAME\"" >> bin/activate
echo "export PIP_RESPECT_VIRTUALENV=true" >> bin/activate
echo "export PYTHONPATH=\"\$VIRTUAL_ENV/repo/src\"" >> bin/activate
echo "unset DJANGO_SETTINGS_MODULE" >> bin/activate

# Activate the environment.
source bin/activate

Now we’re in an isolated virtualenv environment (since we started it with --no-site-packages), and we can pip install to our heart's content and those packages will be installed within this isolated sandbox (since we set PIP_RESPECT_VIRTUALENV).

(If you’ve never used virtualenv before: if you want to open this virtualenv later, just run cd ~/Code/heroku-$PROJECT_NAME/ and then source bin/activate.)
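To confirm that the lines appended to bin/activate actually took effect, a small check like the following can help. It is a hypothetical sketch that only inspects an environment mapping (pass os.environ in practice) and reports what differs from the values set above.

```python
import os


def check_activate_env(environ):
    """Return a list of problems with the activated environment (empty if OK).

    Hypothetical sanity check mirroring the bin/activate edits above.
    """
    venv = environ.get("VIRTUAL_ENV")
    if not venv:
        return ["VIRTUAL_ENV is not set; run `source bin/activate` first"]
    problems = []
    # activate sets PYTHONPATH="$VIRTUAL_ENV/repo/src" by string concatenation.
    expected_path = venv.rstrip("/") + "/repo/src"
    if environ.get("PYTHONPATH") != expected_path:
        problems.append("PYTHONPATH should be %s" % expected_path)
    if environ.get("PIP_RESPECT_VIRTUALENV") != "true":
        problems.append("PIP_RESPECT_VIRTUALENV should be 'true'")
    if "DJANGO_SETTINGS_MODULE" in environ:
        problems.append("DJANGO_SETTINGS_MODULE should be unset")
    return problems


if __name__ == "__main__":
    for problem in check_activate_env(os.environ):
        print(problem)
```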

Now we’ll start up a repository to store our project and work our way through installing Django and our gunicorn server bits:

# Initialize a git repository in the `repo` subdirectory of this virtualenv
git init repo
cd repo

# Start this git repo with my Python .gitignore of choice.
# See it at https://gist.github.com/1806643/ for notes.
curl -sLO https://raw.github.com/gist/1806643/.gitignore
git add .gitignore
git commit -m "initial commit, .gitignore"

# Create a `src` directory within our repo.
mkdir src

# Install Django (1.3.X), gunicorn (0.13.X), gevent (0.13.X), and the greenlet
# dependency.
echo "django==1.3.1" > requirements.txt
echo "gunicorn==0.13.4" >> requirements.txt
echo "gevent==0.13.4" >> requirements.txt
echo "greenlet==0.3.4" >> requirements.txt
pip install -r requirements.txt

The src directory will be where our Python sources live. It’ll be on PYTHONPATH, so root-level modules (and things that aren’t pip-installable) can be placed there. (I prefer this to putting everything at the root level of the repository, as the Heroku docs do, because it keeps the source tree well organized.)
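Putting a directory on PYTHONPATH just prepends it to sys.path at interpreter startup. This hedged sketch reproduces that effect at runtime to show why a module dropped into src becomes importable by its bare name; add_src_to_path is a hypothetical name.

```python
import importlib
import sys


def add_src_to_path(src_dir):
    """Mimic PYTHONPATH=.../repo/src by putting src_dir first on sys.path.

    Hypothetical helper for illustration; PYTHONPATH does this at startup.
    """
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
```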

We’ll set up a plain Django project inside:

# Enter the `src` dir and create a django project
cd $VIRTUAL_ENV/repo/src
$VIRTUAL_ENV/bin/django-admin.py startproject $PYTHON_APP_NAME
cd $VIRTUAL_ENV/repo

Now we’ll configure a Procfile, which describes the processes that will power our services. (Well, just one for now: our web service.)

# Unlike the gunicorn defined in Heroku's Django example, we're going
# to use one of the async worker classes, "gevent". Using an async worker class
# is recommended when serving traffic directly to gunicorn (which is what
# happens under the Heroku Cedar stack).
echo "web: gunicorn_django -b 0.0.0.0:\$PORT -w 9 -k gevent --max-requests 250 --preload src/$PYTHON_APP_NAME/settings.py" > Procfile

The web service is special-cased to provide a $PORT environment variable, which is where Heroku will send your web traffic. I’ve set up some sane defaults (9 workers, 250 requests per worker before restarting them) for Gunicorn that you can configure for yourself later.
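If that Procfile line grows unwieldy, gunicorn can also read its settings from a Python config file passed with -c. Here is a minimal sketch of the same defaults, assuming your gunicorn version supports these setting names (bind, workers, worker_class, max_requests, preload_app); the filename gunicorn.conf.py is an arbitrary choice.

```python
# gunicorn.conf.py (hypothetical filename): same defaults as the Procfile flags.
import os

# Heroku (and foreman locally) tell the web process which port to use via $PORT.
bind = "0.0.0.0:%s" % os.environ.get("PORT", "5000")
workers = 9                # -w 9
worker_class = "gevent"    # -k gevent: async worker, since nothing buffers requests
max_requests = 250         # --max-requests 250: recycle each worker periodically
preload_app = True         # --preload: load the app before forking workers
```

The Procfile entry would then shrink to something like web: gunicorn_django -c gunicorn.conf.py src/$PYTHON_APP_NAME/settings.py (untested against gunicorn 0.13).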

Now we’ll commit this bare Django project and test it locally.

# Commit everything we have in here.
git add .
git commit -m "base django site"

# Test out our setup.
foreman start

We’re using foreman (man page), which reads the Procfile and simulates running the service on Heroku.
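A Procfile is just one name: command pair per line. As a rough, hypothetical sketch of what foreman does with it (not foreman’s actual code), this parses the file and fills in $PORT the way the platform would at launch:

```python
import re
from string import Template

# This sketch accepts process names made of letters, digits, "_" or "-".
PROCFILE_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.+)$")


def parse_procfile(text):
    """Return an ordered {process_name: command} dict from Procfile text."""
    processes = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = PROCFILE_LINE.match(line)
        if match is None:
            raise ValueError("unparseable Procfile line: %r" % line)
        processes[match.group(1)] = match.group(2)
    return processes


def expand_command(command, port):
    """Substitute $PORT (or ${PORT}) into a command; other $ text is left as-is."""
    return Template(command).safe_substitute(PORT=port)
```

With foreman’s default local port of 5000, the web command above becomes a gunicorn_django invocation bound to 0.0.0.0:5000.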

Opening http://127.0.0.1:5000/ in the browser should display the standard "It Worked!" page. Now, let’s try to get this running in the cloud:

# Create a Heroku instance for this site
heroku create -s cedar $PROJECT_NAME

# Make sure to add `src` to the PYTHONPATH on our server. (We added this to our
# local activate file, but it needs to be applied to Heroku, too.)
heroku config:add PYTHONPATH=/src

# Deploy this project to Heroku
git push heroku master

You should now be able to hit http://PROJECT_NAME.herokuapp.com/ and see that the Django instance is running. Some things to try:

  • Check heroku ps to see the status of the processes you have running.
  • See heroku logs to see access or error logs. (heroku logs -t acts like the tail command and sends you a constant stream of log lines.)

Configuring a database and serving static files

Now we’ve got a website running at http://PROJECT_NAME.herokuapp.com/ that has no database and cannot serve static assets. We’ll fix both by enabling the admin, since the Django 1.3+ admin site needs static files and users stored in a database.

You can add a free, shared database account to your Heroku app by running this command:

heroku addons:add shared-database:5mb

If you run heroku config you’ll see the DATABASE_URL, which contains your database’s username, password, hostname, and database name. (We’ll be using this environment var to configure our database in Django shortly.)

For static storage, I’m going to use boto and django-storages to store files in Amazon S3. You should check out the Amazon AWS site and register an account if you don’t already have one. Then go to the Security Credentials page to grab yourself an "Access Key ID" and a "Secret Access Key". (Keep these values: we’ll add them to our settings soon.)

At this point, you should also log into the AWS S3 Console and create a bucket to store your static files. (Also keep this around for settings.) Heroku uses the US Standard (US East) region, so place your bucket there for performance and lowest cost — bandwidth within an AWS region is free of charge.

Install psycopg2, boto, and django-storages:

cd $VIRTUAL_ENV/repo
echo "psycopg2" >> requirements.txt
echo "boto==2.2.1" >> requirements.txt
echo "django-storages==1.1.4" >> requirements.txt
pip install -r requirements.txt

Open up src/$PYTHON_APP_NAME/settings.py and add 'storages' to your INSTALLED_APPS. Uncomment django.contrib.admin, too.

Then, add the following lines to the bottom of your settings file, filling in your own AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_STORAGE_BUCKET_NAME.

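# Store both uploaded media and collected static files in S3 via django-storages.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = DEFAULT_FILE_STORAGE
# Fill these in from the AWS Security Credentials page and the S3 console.
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''
AWS_STORAGE_BUCKET_NAME = ''
# Protocol-relative URL, so assets load over both http and https.
STATIC_URL = '//s3.amazonaws.com/%s/' % AWS_STORAGE_BUCKET_NAME
# The Django 1.3 admin still locates its CSS/JS via ADMIN_MEDIA_PREFIX.
ADMIN_MEDIA_PREFIX = STATIC_URL + 'admin/'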
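Hardcoding AWS keys means they end up in your git history. One alternative, sketched here under the assumption that you set environment variables with the same names as the settings (for example heroku config:add AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_STORAGE_BUCKET_NAME=...), is to build these settings from the environment, the same way DATABASE_URL is handled. s3_settings is a hypothetical helper.

```python
import os


def s3_settings(environ):
    """Build the S3 storage settings from an environment mapping.

    Hypothetical helper; assumes the env var names match the setting names.
    Raises KeyError if any of the three variables is missing.
    """
    bucket = environ["AWS_STORAGE_BUCKET_NAME"]
    static_url = "//s3.amazonaws.com/%s/" % bucket
    return {
        "AWS_ACCESS_KEY_ID": environ["AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": environ["AWS_SECRET_ACCESS_KEY"],
        "AWS_STORAGE_BUCKET_NAME": bucket,
        "STATIC_URL": static_url,
        "ADMIN_MEDIA_PREFIX": static_url + "admin/",
    }

# In settings.py (module level), after defining the helper:
#     globals().update(s3_settings(os.environ))
```

Locally, the same variables would need to be exported in your shell (or placed in a .env file, which foreman reads) before running foreman start.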

Copy and paste the following lines into the end of your settings file to enable database configuration by reading the DATABASE_URL environment var.

import os
import sys
import urlparse

# Register database schemes in URLs.
urlparse.uses_netloc.append('postgres')
urlparse.uses_netloc.append('mysql')

try:
    if 'DATABASES' not in locals():
        DATABASES = {}

    if 'DATABASE_URL' in os.environ:
        url = urlparse.urlparse(os.environ['DATABASE_URL'])

        # Ensure default database exists.
        DATABASES['default'] = DATABASES.get('default', {})

        # Update with environment configuration.
        DATABASES['default'].update({
            'NAME': url.path[1:],
            'USER': url.username,
            'PASSWORD': url.password,
            'HOST': url.hostname,
            'PORT': url.port,
        })
        if url.scheme == 'postgres':
            DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql_psycopg2'

        if url.scheme == 'mysql':
            DATABASES['default']['ENGINE'] = 'django.db.backends.mysql'
except Exception:
    print 'Unexpected error:', sys.exc_info()

(These have been copied from the Heroku Django starter doc. In cases where your Django app is on the root level of the repo, this code would automatically be appended to your settings file, but hey, we’re going for explicit instructions here to try to cut through the magic.)
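The snippet above is Python 2 and silently prints any parsing error. For comparison, here is a hedged Python 3 sketch of the same mapping as a pure, testable function. In Python 3, urllib.parse splits out the username, password, host, and port for any scheme followed by //, so the uses_netloc registration isn’t needed. parse_database_url is a hypothetical name.

```python
from urllib.parse import urlparse

# Same scheme-to-backend mapping as the Django 1.3-era snippet above.
ENGINES = {
    "postgres": "django.db.backends.postgresql_psycopg2",
    "mysql": "django.db.backends.mysql",
}


def parse_database_url(database_url):
    """Turn a DATABASE_URL like postgres://user:pw@host:5432/name into a
    Django DATABASES entry. Unknown schemes get no ENGINE, as above."""
    url = urlparse(database_url)
    config = {
        "NAME": url.path[1:],      # strip the leading "/"
        "USER": url.username,
        "PASSWORD": url.password,
        "HOST": url.hostname,
        "PORT": url.port,          # int, or None if the URL has no port
    }
    if url.scheme in ENGINES:
        config["ENGINE"] = ENGINES[url.scheme]
    return config

# In settings.py:
#     import os
#     if "DATABASE_URL" in os.environ:
#         DATABASES = {"default": parse_database_url(os.environ["DATABASE_URL"])}
```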

Open up src/$PYTHON_APP_NAME/urls.py and uncomment the lines for the admin.

Now commit and push.

git add .
git commit -m "enable admin and boto-backed storage"
git push heroku master

At this point, you’ll probably want to deploy your static files:

heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py collectstatic --noinput"

(Note: the PYTHONPATH=/src env var needs to be set manually since manage.py doesn’t seem to get it when using heroku run. The new default project layout in Django 1.4 would make this step obsolete; in the Django 1.4 case, our manage.py would live directly in src which would cleanly put that directory on the implied PYTHONPATH.)

Then run syncdb to initialize your database, and create a user account for yourself:

heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py syncdb --noinput"
heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py createsuperuser"

Now try to open up http://PROJECT_NAME.herokuapp.com/admin/. The page should load, complete with the normal styling (served from your S3 bucket). You should also be able to log in with the username and password you just created.

Other helpful bits

I’ve found that it’s easiest to put templates in-app when using this workflow. If you need to use the old-fashioned workflow of putting all of your templates under one directory, you can move them to src/$PYTHON_APP_NAME/templates/ and then add 'your_python_app_name' to your INSTALLED_APPS.


You can set up memcached much as you hooked up PostgreSQL. First, add it to your account:

heroku addons:add memcache:5mb

Then add pylibmc and django-pylibmc-sasl to your requirements.

cd $VIRTUAL_ENV/repo
echo "pylibmc==1.2.2" >> requirements.txt
echo "django-pylibmc-sasl==0.2.4" >> requirements.txt
pip install -r requirements.txt

The django-pylibmc-sasl package is required to automatically configure memcached on Heroku (including the server, username, and password). All you have to do is point your settings file to its cache class:

CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache'
    }
}

Uploaded media (i.e. things in a FileField or ImageField) will get thrown into your S3 bucket automatically. As per the Django file docs, object.some_file_field.url returns the URL of the file as stored in S3, so you can use that property in templates without having to worry about MEDIA_URL (e.g. <img src="{{ obj.image_field.url }}"/>; see this demo page and the source of that view for a full example).


You can get basic "piggyback" SSL support (where your app runs at https://*.herokuapp.com/) by adding that addon:

heroku addons:add ssl:piggyback

This simply sets up the HTTPS path but doesn’t enforce it: to require SSL, you’ll need some sort of Django middleware to redirect non-SSL requests. (I’ve whipped up this one, which can be set as the first middleware in MIDDLEWARE_CLASSES. It also sets the Strict-Transport-Security header, which tells compliant browsers to ONLY access the domain via SSL.)
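The linked middleware is Django-specific. To show the same idea without Django, here is a hedged sketch as plain WSGI middleware: plain-HTTP requests get a 301 to the https:// URL, and HTTPS responses get a Strict-Transport-Security header (browsers only honor it over HTTPS). The class name RequireSSL and the one-year max-age are assumptions. It also assumes, as on Heroku, that the router reports the original scheme in the X-Forwarded-Proto header; don’t trust that header when nothing in front of your app sets it.

```python
from urllib.parse import quote


class RequireSSL(object):
    """Hypothetical WSGI middleware: force HTTPS and add an HSTS header."""

    def __init__(self, app, max_age=31536000):
        self.app = app
        self.hsts = ("Strict-Transport-Security", "max-age=%d" % max_age)

    def __call__(self, environ, start_response):
        # Behind Heroku's router the app sees plain HTTP; the original scheme
        # arrives in X-Forwarded-Proto (HTTP_X_FORWARDED_PROTO in WSGI).
        scheme = environ.get("HTTP_X_FORWARDED_PROTO",
                             environ.get("wsgi.url_scheme", "http"))
        if scheme != "https":
            host = environ.get("HTTP_HOST", environ.get("SERVER_NAME", ""))
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
            query = environ.get("QUERY_STRING")
            location = "https://" + host + path + ("?" + query if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # Only HTTPS responses carry HSTS; browsers ignore it over HTTP.
            return start_response(status, list(headers) + [self.hsts], exc_info)

        return self.app(environ, start_with_hsts)
```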

Postscript

I’ve only been toying with the Heroku (Cedar) stack for about two weeks now, and it’s been pretty interesting so far. It’s very cool to be able to provision, deploy, and scale a cloud-based website from a shell, without dealing with the underlying Linux systems much (if at all). (It comes with built-in robustness, too: Heroku automatically attempts to restart crashed processes once every ten minutes.) I’m not aware of any major Python/Django sites that deploy to Heroku in production, but the Cedar stack and its Python support are fairly new. (Heroku does seem to be pretty popular with a fair number of mostly Ruby-based sites.)

While the costs seem high at first glance compared to a purely shared host or AWS by itself (about $36/mo per dyno after the first one, with databases and add-ons on their own steep scale), the cost is theoretically balanced out by a reduced need for a "true" sysadmin staff, since the infrastructure (from hardware to OS, all the way up to the Python application) is entirely outsourced.

This does have its own ups and downs (which I won’t get into, since I’m still relatively new to the platform), but in terms of raw cost, a fully decked-out 48-dyno (24 web, 24 worker) operation with a Ronin-class dedicated database would run you about $22,692 a year (which probably compares pretty favorably to a combination of hosting and IT staff costs in a more standard environment).
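As a rough check on how that yearly figure breaks down, here is the dyno portion computed from the $36/month-per-dyno price quoted above, assuming exactly one free dyno. Whatever remains of the quoted total is the database share; the helper name is hypothetical and this uses only the numbers stated in this post.

```python
def yearly_dyno_cost(dynos, monthly_price_per_dyno=36, free_dynos=1):
    """Yearly dyno cost in dollars, assuming the first dyno is free."""
    paid_dynos = max(dynos - free_dynos, 0)
    return paid_dynos * monthly_price_per_dyno * 12


dyno_share = yearly_dyno_cost(48)        # 47 paid dynos * $36 * 12 months
database_share = 22692 - dyno_share      # what's left of the quoted total
```

That works out to $20,304 a year for dynos, leaving $2,388 a year (roughly $199/month) for the dedicated database.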

On the other hand, I haven’t yet load-tested Django/Heroku with a more legit, full-featured website, so the performance factor in the cost analysis is still a big question mark.