Notice: I have neither posted nor updated any content on this blog since the mid 2010's. (😱) Please check out the homepage and my bio, for more recent information.
Loud thinking from the mind of Mike Tigas.
WARNING: This is an old post from 2012. The information here is likely incredibly out of date. Proceed at your own risk.
A few people (including the main @Heroku account on Twitter) have mentioned that I’ve thrown together a working Heroku buildpack for PyPy. (And hey, PyPy 1.8 just dropped last Friday so you should check that out.)
This Buildpack provides PyPy 1.8 on Heroku, via @mtigas:ow.ly/92PtJ
— heroku (@heroku) February 13, 2012
I noticed a couple other attempts that were broken (with an 'import site' failed error) and did some sleuthing on my own deployed-but-broken PyPy/Heroku site: did you know you can run arbitrary commands by using heroku run? I dug around (via heroku run "ls -la *") and tinkered with a buildpack of my own and discovered that the include dir was being symlinked (not actually copied) and the lib-python and lib_pypy directories contained symlinks rather than actual .py files. (This is possibly an issue with virtualenv --relocatable and PyPy.)
I fixed the buildpack by removing those directories and copying them manually after the virtualenv is set up.
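To illustrate the idea of swapping symlinks for real copies, here is a rough Python sketch. It is not the buildpack's actual code (the buildpack is a set of shell scripts); the function name materialize_symlinks is made up for this example, and it assumes every link points at an existing target outside the directory being walked.

```python
import os
import shutil


def materialize_symlinks(root):
    """Replace every symlink under `root` with a real copy of its target.

    Returns the list of paths that were replaced. Assumes no dangling links.
    """
    replaced = []
    # os.walk does not follow symlinks by default, so a symlinked directory
    # is reported in `dirnames` without being descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            os.unlink(path)  # remove the link itself, not the target
            if os.path.isdir(target):
                shutil.copytree(target, path)  # copies file contents, not links
            else:
                shutil.copy2(target, path)
            replaced.append(path)
    return replaced
```

Calling materialize_symlinks on a virtualenv directory would leave behind plain files and directories, which is the state the fixed buildpack ends up in.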
(Aside: I’ve been toying with Heroku in my spare time for a mere couple weeks now, but the Cedar Buildpack system is a really awesome concept and really caught my eye: you can essentially use them to deploy your own arbitrary projects and languages to the Cedar stack.)
In any case, this buildpack is more or less a drop-in replacement for the default Heroku one. When creating a Heroku app, you can enable it like this:
heroku create --stack cedar --buildpack git://github.com/mtigas/heroku-buildpack-pypy.git
Or, modify an existing Python/Heroku app by using:
heroku config:add BUILDPACK_URL=git://github.com/mtigas/heroku-buildpack-pypy.git
…and then re-deploying your app (by pushing).
Enjoy your easy to set up, cloud-hosted PyPy!
WARNING: This is an old post from 2012. The information here regarding configuration & deploy is likely incredibly out of date. Proceed at your own risk.
Extremely long tutorial post ahead.
If you want to follow along, I’ve made an example project available here. Said example site is running here.
An extremely abridged tldr version of this (containing just the shell and code snippets, sans explanation) is located here: tldr.markdown.
I've been toying around with Heroku in my spare time over the past couple weeks because "NoOps" is the new hotness and the promise is cool enough: wouldn’t it be great if you could write and deploy a high-performance website without having to micromanage the infrastructure? (See also: erosion and whatnot.)
In any case, the pricing structure of Heroku (750 free hours) is such that you can run a low-end, low-traffic website in Heroku for free, which is useful for trying it out.
The Heroku Django starter doc isn't bad, but leaves out some bits that I think are important in any production environment:
gunicorn worker class: gunicorn recommends you either run behind a buffering proxy server (e.g. nginx) or run one of the "async" worker classes. The Cedar 'herokuapp.com' HTTP stack directly connects requests to the backend for flexibility (think WebSockets and the like), but doesn’t provide gzip or buffering of requests. (In fact, the Heroku Django docs mention using gunicorn+gevent, but don’t actually configure gevent in the relevant examples.)
So I’ve decided to tinker with Heroku and write a step-by-step tutorial as I go.
The following assumes you’re fairly proficient with Django and these steps are only useful for getting a barebones proof of concept site up and running. I do provide the database and caching bits for you, so you can use this as a stepping stone for trying out more full-featured projects on Heroku.
(Note: The free Heroku database is a shared server with only 5MB of raw data storage. The free memcached instance is likewise a tiny 5MB instance. These are toy websites we’re deploying here for free. The 20GB shared database is $15/month; you can also host your own dedicated EC2 postgres instance if you’d rather not go all-out with a Heroku dedicated DB.)
I’m going to assume you have a working copy of git, Python 2.7.X, pip, and virtualenv on your local system.
If you don't, you should install them via homebrew. If you don't have homebrew, visit the documentation and run the one-line install. (If you do have brew installed, now would be a great time to update it. Run brew update.)
Install git.
brew install git
Now install Python, add it to your PATH, and add that new PATH to .bash_profile so that this works in the future. (If you use ZSH or another shell, do this to your .zshrc or similar file.)
brew install python
export PATH=/usr/local/share/python:$PATH
echo "export PATH=/usr/local/share/python:\$PATH" >> ~/.bash_profile
Then install pip and virtualenv.
easy_install pip
pip install virtualenv
Register a Heroku account first.
Now install the heroku and foreman commands:
sudo gem install foreman heroku
sudo update_rubygems
(Note: The Heroku docs tell you to use their toolbelt package to install these packages, but I’ve encountered errors with foreman unless I’ve sudo gem install'd it. The gems get you the same thing, anyway.)
Once installed, run the heroku login command, which allows you to run commands against your Heroku account. (The Heroku Toolbelt page has an example of the login bit under "Getting Started".)
If you’re keeping score at home, here are the things you need to move on:
We’ll start by choosing an app name. (Change these values, please.)
# The "app name" that this will get in the Heroku control panel. Also
# determines directory names and your "PROJECT_NAME.herokuapp.com"
# default domain.
export PROJECT_NAME="my-test-app"

# The python module name for your Django site. Separate from above since
# python app names should use underscores rather than dashes.
export PYTHON_APP_NAME="my_test_app"
I like to put my projects in a ~/Code directory, but you can change this to place your projects wherever you normally would:
# Set up a heroku-$PROJECT_NAME virtualenv in the ~/Code directory.
cd ~/Code
virtualenv --no-site-packages heroku-$PROJECT_NAME
I’m going to gloss over the fine details on how to use virtualenv, but you should be able to follow along here if you’ve ever done customization to your .bash_profile, .zshrc, or similar shell init file.
# Modify the `activate` file with some sanity-ensuring defaults, like
# ignoring any system-level PYTHONPATH and DJANGO_SETTINGS_MODULE.
cd heroku-$PROJECT_NAME
echo "export PROJECT_NAME=\"$PROJECT_NAME\"" >> bin/activate
echo "export PYTHON_APP_NAME=\"$PYTHON_APP_NAME\"" >> bin/activate
echo "export PIP_RESPECT_VIRTUALENV=true" >> bin/activate
echo "export PYTHONPATH=\"\$VIRTUAL_ENV/repo/src\"" >> bin/activate
echo "unset DJANGO_SETTINGS_MODULE" >> bin/activate

# Activate the environment.
source bin/activate
Now we’re in an isolated virtualenv environment (since we started it with --no-site-packages), and we can pip install to our heart's content and those packages will be installed within this isolated sandbox (since we set PIP_RESPECT_VIRTUALENV).
(If you’ve never used virtualenv before: If you want to open this virtualenv later, just run cd ~/Code/heroku-(PROJECT NAME)/ and then source bin/activate.)
Now we’ll start up a repository to store our project and work our way through installing Django and our gunicorn server bits:
# Initialize a git repository in the `repo` subdirectory of this virtualenv
git init repo
cd repo

# Start this git repo with my Python .gitignore of choice.
# See it at https://gist.github.com/1806643/ for notes.
curl -sLO https://raw.github.com/gist/1806643/.gitignore
git add .gitignore
git commit -m "initial commit, .gitignore"

# Create a `src` directory within our repo.
mkdir src

# Install Django (1.3.X), gunicorn (0.13.X), gevent (0.13.X), and the greenlet
# dependency.
echo "django==1.3.1" > requirements.txt
echo "gunicorn==0.13.4" >> requirements.txt
echo "gevent==0.13.4" >> requirements.txt
echo "greenlet==0.3.4" >> requirements.txt
pip install -r requirements.txt
The src directory will be where our Python sources live. It’ll be a place on PYTHONPATH, so root-level modules (and things that aren’t pip-installable) can be placed there. (I prefer this to putting everything on the root level of the repository -- as is done in the Heroku docs -- for matters of keeping a well-organized source tree.)
We’ll set up a plain Django project inside:
# Enter the `src` dir and create a django project
cd $VIRTUAL_ENV/repo/src
$VIRTUAL_ENV/bin/django-admin.py startproject $PYTHON_APP_NAME
cd $VIRTUAL_ENV/repo
Now, we’ll configure a Procfile, which describes the processes that will power our services. (Well, just one for now: our web service.)
# Unlike the gunicorn defined in Heroku's Django example, we're going
# to use one of the async worker classes, "gevent". Using an async worker class
# is recommended when serving traffic directly to gunicorn (which is what
# happens under the Heroku Cedar stack).
echo "web: gunicorn_django -b 0.0.0.0:\$PORT -w 9 -k gevent --max-requests 250 --preload src/$PYTHON_APP_NAME/settings.py" > Procfile
The web service is special-cased to provide a $PORT environment variable, which is where Heroku will send your web traffic. I’ve set up some sane defaults (9 workers, 250 requests per worker before restarting them) for Gunicorn that you can configure for yourself later.
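If the flag list gets unwieldy, the same knobs can in principle live in a gunicorn config file, which is plain Python. The sketch below is an alternative layout rather than what this tutorial uses: the file name gunicorn.conf.py is just a choice, the 5000 fallback is an assumption for local runs, and I haven't checked how the pinned 0.13.x release handles each setting.

```python
# gunicorn.conf.py (hypothetical file; the Procfile in this post uses flags instead)
import os

# Heroku hands the web process its port through the PORT environment variable.
# 5000 is an assumed fallback for running locally.
bind = "0.0.0.0:" + os.environ.get("PORT", "5000")

workers = 9              # equivalent of -w 9
worker_class = "gevent"  # equivalent of -k gevent
max_requests = 250       # equivalent of --max-requests 250
preload_app = True       # equivalent of --preload
```

The Procfile entry would then point gunicorn at the file with -c gunicorn.conf.py in place of the individual flags.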
Now we’ll commit this bare Django project and test it locally.
# Commit everything we have in here.
git add .
git commit -m "base django site"

# Test out our setup.
foreman start
We’re using foreman (man page), which reads the Procfile and simulates running the service on Heroku.
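A Procfile is just one "name: command" pair per line. The sketch below shows roughly what reading one involves; it is an illustration of the file format, not foreman's real parser, and parse_procfile and expand_port are names invented for this example.

```python
import re


def parse_procfile(text):
    """Map each process type to its command, one `name: command` per line."""
    processes = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z0-9_-]+):\s*(.+)$", line)
        if match:
            processes[match.group(1)] = match.group(2).strip()
    return processes


def expand_port(command, port):
    """Fill in the $PORT placeholder, as the shell would when the process starts."""
    return command.replace("$PORT", str(port))


procfile = (
    "web: gunicorn_django -b 0.0.0.0:$PORT -w 9 -k gevent "
    "--max-requests 250 --preload src/my_test_app/settings.py\n"
)
commands = parse_procfile(procfile)
web_command = expand_port(commands["web"], 5000)
print(web_command)
```

The 5000 here matches the local port used in this walkthrough; on Heroku the platform picks the port.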
Opening http://127.0.0.1:5000/ in the browser should display the standard "It Worked!" page. Now, let’s try to get this running in the cloud:
# Create a Heroku instance for this site
heroku create -s cedar $PROJECT_NAME

# Make sure to add `src` to the PYTHONPATH on our server. (We added this to our
# local activate file, but it needs to be applied to Heroku, too.)
heroku config:add PYTHONPATH=/src

# Deploy this project to Heroku
git push heroku master
You should now be able to hit http://PROJECT_NAME.herokuapp.com/ and see that the Django instance is running. Some things to try:
heroku ps to see the status of the processes you have running.
heroku logs to see access or error logs. (heroku logs -t acts like the tail command and sends you a constant stream of log lines.)
Now we've got a website running at http://PROJECT_NAME.herokuapp.com/ that has no database and cannot serve static assets. We’ll work on both, by enabling the admin (since the Django 1.3+ admin site requires staticfiles and Users set up in the database).
You can add a free, shared database account to your Heroku app by running this command:
heroku addons:add shared-database:5mb
If you run heroku config you’ll see the DATABASE_URL, which contains your database’s username, password, hostname, and database name. (We’ll be using this environment var to configure our database in Django shortly.)
For static storage, I’m going to use boto and django-storages to store files in Amazon S3. You should check out the Amazon AWS site and register an account if you don’t already have one. Then, go to the Security Credentials page to grab yourself an "Access Key ID" and a "Secret Access Key". (Keep these values: we’ll add this to our settings soon.)
At this point, you should also log into the AWS S3 Console and create a bucket to store your static files. (Also keep this around for settings.) Heroku uses the US Standard (US East) region, so place your bucket there for performance and lowest cost — bandwidth within an AWS region is free of charge.
Install psycopg2, boto, and django-storages:
cd $VIRTUAL_ENV/repo
echo "psycopg2" >> requirements.txt
echo "boto==2.2.1" >> requirements.txt
echo "django-storages==1.1.4" >> requirements.txt
pip install -r requirements.txt
Open up src/$PYTHON_APP_NAME/settings.py and add 'storages' to your INSTALLED_APPS. Uncomment django.contrib.admin, too.
Then, add the following lines to the bottom of your settings file, filling in your own AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_STORAGE_BUCKET_NAME.
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
STATICFILES_STORAGE = DEFAULT_FILE_STORAGE
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''
AWS_STORAGE_BUCKET_NAME = ''
STATIC_URL = '//s3.amazonaws.com/%s/' % AWS_STORAGE_BUCKET_NAME
ADMIN_MEDIA_PREFIX = STATIC_URL + 'admin/'
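If you'd rather not commit AWS keys to the repository, one possible variation (not what this tutorial does) is to set them with heroku config:add and read them from the environment, the same way DATABASE_URL is handled. A minimal sketch, where the s3_settings helper and the example bucket name are made up for illustration:

```python
import os


def s3_settings(environ):
    """Build the S3-related settings from a mapping of environment variables."""
    bucket = environ.get("AWS_STORAGE_BUCKET_NAME", "")
    static_url = "//s3.amazonaws.com/%s/" % bucket
    return {
        "AWS_ACCESS_KEY_ID": environ.get("AWS_ACCESS_KEY_ID", ""),
        "AWS_SECRET_ACCESS_KEY": environ.get("AWS_SECRET_ACCESS_KEY", ""),
        "AWS_STORAGE_BUCKET_NAME": bucket,
        "STATIC_URL": static_url,
        "ADMIN_MEDIA_PREFIX": static_url + "admin/",
    }


# In settings.py this could be applied with:
#     globals().update(s3_settings(os.environ))
example = s3_settings({"AWS_STORAGE_BUCKET_NAME": "my-test-bucket"})
print(example["STATIC_URL"])
```

The derived values follow the same pattern as the hard-coded settings: the bucket name determines STATIC_URL, and the admin media prefix hangs off that.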
Copy and paste the following lines into the end of your settings file to enable database configuration by reading the DATABASE_URL environment var.
import os
import sys
import urlparse

# Register database schemes in URLs.
urlparse.uses_netloc.append('postgres')
urlparse.uses_netloc.append('mysql')

try:
    if 'DATABASES' not in locals():
        DATABASES = {}

    if 'DATABASE_URL' in os.environ:
        url = urlparse.urlparse(os.environ['DATABASE_URL'])

        # Ensure default database exists.
        DATABASES['default'] = DATABASES.get('default', {})

        # Update with environment configuration.
        DATABASES['default'].update({
            'NAME': url.path[1:],
            'USER': url.username,
            'PASSWORD': url.password,
            'HOST': url.hostname,
            'PORT': url.port,
        })
        if url.scheme == 'postgres':
            DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql_psycopg2'
        if url.scheme == 'mysql':
            DATABASES['default']['ENGINE'] = 'django.db.backends.mysql'
except Exception:
    print 'Unexpected error:', sys.exc_info()
(These have been copied from the Heroku Django starter doc. In cases where your Django app is on the root level of the repo, this code would automatically be appended to your settings file, but hey, we’re going for explicit instructions here to try to cut through the magic.)
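To see what that snippet pulls out of a DATABASE_URL, here is a small standalone sketch. It uses Python 3's urllib.parse rather than the Python 2 urlparse module used in this post, and the URL, credentials, and function name are invented for the example.

```python
from urllib.parse import urlparse

# Same scheme-to-engine mapping as the settings snippet in this post.
ENGINES = {
    "postgres": "django.db.backends.postgresql_psycopg2",
    "mysql": "django.db.backends.mysql",
}


def database_from_url(url):
    """Split a database URL into the keys Django expects in DATABASES."""
    parts = urlparse(url)
    return {
        "ENGINE": ENGINES[parts.scheme],
        "NAME": parts.path[1:],  # drop the leading slash
        "USER": parts.username,
        "PASSWORD": parts.password,
        "HOST": parts.hostname,
        "PORT": parts.port,
    }


# Hypothetical URL in the same shape as the one `heroku config` shows.
config = database_from_url("postgres://alice:s3cret@db.example.com:5432/my_test_db")
print(config["NAME"], config["HOST"], config["PORT"])
```

Each piece of the URL maps onto exactly one DATABASES key, which is why a single environment variable is enough to configure the connection.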
Open up src/$PYTHON_APP_NAME/urls.py and uncomment the lines for the admin.
Now commit and push.
git add .
git commit -m "enable admin and boto-backed storage"
git push heroku master
At this point you'll probably want to deploy your static files:
heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py collectstatic --noinput"
(Note: the PYTHONPATH=/src env var needs to be set manually since manage.py doesn’t seem to get it when using heroku run. The new default project layout in Django 1.4 would make this step obsolete; in the Django 1.4 case, our manage.py would live directly in src which would cleanly put that directory on the implied PYTHONPATH.)
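The "implied PYTHONPATH" bit comes from how Python starts up: when you run a script, the directory containing that script becomes the first entry of sys.path. A quick way to convince yourself, using a throwaway stand-in script rather than a real manage.py (the file and function names here are made up):

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry_for_script(directory):
    """Run a throwaway script from `directory` and return its sys.path[0]."""
    script = os.path.join(directory, "manage_like.py")  # stand-in for manage.py
    with open(script, "w") as handle:
        handle.write("import sys\nprint(sys.path[0])\n")
    output = subprocess.check_output([sys.executable, script])
    return output.decode().strip()


src_dir = tempfile.mkdtemp()  # plays the role of the `src` directory
reported = first_path_entry_for_script(src_dir)
# Compare resolved paths, since temp directories are sometimes symlinked.
same_directory = os.path.realpath(reported) == os.path.realpath(src_dir)
print(same_directory)
```

So a manage.py sitting directly in src makes everything else in src importable without any extra environment variable.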
And then syncdb to initialize your database and create a user account for yourself.
heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py syncdb --noinput"
heroku run "PYTHONPATH=/src python src/$PYTHON_APP_NAME/manage.py createsuperuser"
Now try to open up http://PROJECT_NAME.herokuapp.com/admin/. The page should load, complete with the normal styling (served from your S3 bucket). You should also be able to log in with the username and password you just created.
I’ve found that it’s easiest to put templates in-app when using this workflow. If you need to use the old-fashioned workflow of putting all of your templates under one directory, you can move them to src/$PYTHON_APP_NAME/templates/ and then add 'your_python_app_name' to your INSTALLED_APPS.
You can set up memcached similar to how you hooked up PostgreSQL. First, add it to your account:
heroku addons:add memcache:5mb
Then add pylibmc and django-pylibmc-sasl to your requirements.
cd $VIRTUAL_ENV/repo
echo "pylibmc==1.2.2" >> requirements.txt
echo "django-pylibmc-sasl==0.2.4" >> requirements.txt
pip install -r requirements.txt
The django-pylibmc-sasl package is required to automatically configure memcached on Heroku (including the server, username, and password). All you have to do is point your settings file to its cache class:
CACHES = {
    'default': {
        'BACKEND': 'django_pylibmc.memcached.PyLibMCCache'
    }
}
Uploaded media (i.e. things in a FileField or ImageField) will get thrown into your S3 bucket automatically. As per the Django file docs, using object.some_file_field.url will return the URL of the file, as stored in S3, so you can use that property in templates without having to worry about MEDIA_URL. (For example, <img src="{{ obj.image_field.url }}"/>; see this demo page and the source of that view for a full example.)
You can get basic "piggyback" SSL support (where your app runs at https://*.herokuapp.com/) by adding that addon:
heroku addons:add ssl:piggyback
This simply sets up the HTTPS path but doesn’t enforce it: to require SSL you’ll need to use some sort of Django middleware to redirect non-SSL requests. (I’ve whipped up this one that can be set as the first middleware in MIDDLEWARE_CLASSES. It also sets the Strict-Transport-Security header which tells compliant browsers to ONLY access the domain via SSL.)
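The redirect-plus-header idea can be sketched without Django at all, as a plain WSGI wrapper. This is not the linked middleware, just an illustration of the same technique one layer down. It assumes the platform's router passes the original scheme along in an X-Forwarded-Proto header, and the one-year max-age default is an arbitrary choice for the example.

```python
def require_https(app, max_age=31536000):
    """Wrap a WSGI app: redirect plain-HTTP requests, add HSTS to HTTPS ones."""

    def wrapped(environ, start_response):
        # Behind a proxy, the original scheme is assumed to arrive in
        # X-Forwarded-Proto; otherwise use what the WSGI server saw.
        scheme = environ.get(
            "HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http")
        )
        if scheme != "https":
            location = (
                "https://"
                + environ.get("HTTP_HOST", "")
                + environ.get("SCRIPT_NAME", "")
                + environ.get("PATH_INFO", "")
            )
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # Tell the browser to stick to HTTPS for `max_age` seconds.
            headers = list(headers) + [
                ("Strict-Transport-Security", "max-age=%d" % max_age)
            ]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return wrapped
```

A Django middleware version would make the same two decisions in process_request and process_response.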
I’ve only been toying with the Heroku (Cedar) stack for about two weeks now, and it’s been pretty interesting so far. It’s very cool to be able to provision, deploy, and scale a cloud-based website within a shell, without dealing with the underlying Linux systems much (if at all). (With built-in robustness, too: Heroku attempts auto restarts of crashed processes once every ten minutes.) I’m not aware of any major Python/Django-running sites that deploy to Heroku in production, but the Cedar stack and the Python support along with it is fairly new. (Heroku does seem to be pretty popular for a fair bit of mostly Ruby-based sites.)
While the costs seem high at first glance compared to a purely shared host or AWS by itself (about $36/mo per dyno after the first one, databases and addons on their own steep scale), the cost is theoretically balanced out by lessening the need of a "true" sysadmin staff since the infrastructure — from hardware to OS, all the way up to the Python application — is entirely outsourced.
This does have its own ups and downs (that I won’t get into since I’m still relatively new to the platform), but in terms of raw cost, a fully decked 48-dyno (24 web, 24 worker) operation with a Ronin-class dedicated database would run you about $22,692 a year (which probably compares pretty favorably to a combination of hosting and IT staff costs in a more standard environment).
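As a rough sanity check on that figure, the arithmetic can be run backwards using only the numbers quoted in this post. It assumes the dyno rate is exactly $36/month (the post says "about") and that whatever is left over is the database; the post doesn't itemize the total, so treat the split as an inference.

```python
DYNO_MONTHLY = 36            # "about $36/mo per dyno" (assumed exact here)
TOTAL_DYNOS = 48             # 24 web + 24 worker
FREE_DYNOS = 1               # the charge applies "after the first one"
QUOTED_ANNUAL_TOTAL = 22692  # yearly figure quoted in this post

# Yearly cost of the paid dynos alone.
dyno_annual = (TOTAL_DYNOS - FREE_DYNOS) * DYNO_MONTHLY * 12

# Whatever remains of the quoted total, presumably the dedicated database.
remainder_annual = QUOTED_ANNUAL_TOTAL - dyno_annual
remainder_monthly = remainder_annual / 12

print(dyno_annual, remainder_annual, remainder_monthly)
```

Under those assumptions the dynos account for the bulk of the bill, with the remainder working out to a couple hundred dollars a month.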
On the other hand, I haven’t yet load-tested Django/Heroku with a more legit, full-featured website, so the performance factor in the cost analysis is still a big question mark.