Oh, convergence

Lately, the Convergency Room somehow got onto the list of blogs I check daily -- perhaps it's the relative frequency of posts that students are forced to write. Or perhaps I'm bracing myself for the pain that's yet to come when I eventually take those classes. Or I'm learning what I can in case I don't get into that sequence.

Of note was this post, by Alex Tribou, regarding a local iPhone news application, currently in the research phase. And I thought to myself, "finally, something here that I might be interested in reading."

...We've coordinated with all of our newsrooms to come up with an idea for a local news application. There will likely still be some tweaking here and there, but we have a pretty solid idea about what we want to do. We continue to try to get into contact with iPhone users, which has been challenging so far. We were not able to get a list of names like we had hoped, so it looks like we're going to have to spread the word and see what happens. Misty is starting to post our survey so that we can get as much feedback as possible...

I wonder: why target only iPhones instead of any wireless Internet device? I know plenty of people with Windows Mobile phones or BlackBerries -- more so than I know iPhone users. Or is everyone in the School of Journalism so blinded by the empire of Apple?

Bah, and I also wonder why I haven't gotten myself involved with all of this Internet business the J-School is trying to do.

At the end of last spring, I got a letter from the J-School that said I'd been kicked out of the entire university. I'm positive it was just an impersonal, automatically generated letter based on the fact that my GPA was so low that it merited kicking me out without probation.

Last week, I got a letter from the J-School that said I made it on the Dean's Honor Roll for Fall 2007. And again, I'm pretty positive it was just an impersonal, automatically-generated letter based on my GPA. Do they even know what kinds of trouble I've been putting myself through for them?

I don't know what to make of it -- being kicked out one semester and then lauded the next. Like in high school, I'm not quite sure any of them really know who the hell I am.

---

It was a really long weekend and I've got some stories to tell, though I may or may not ever get around to it. But there are always photos, and those are supposed to say like a thousand words each anyway. (So basically I've given you a 52,000-word essay there. But not really.)

2008 Rally In The 100 Acre Wood

“Anonymous” protests Church of Scientology

Update, March 15: Post and photos regarding the March 15 demonstration.

---

In other news, I've got a lot of pictures I need to upload and a few stories to tell. Let's get to that now.

In January, Anonymous, a large, decentralized online community, declared war on the Church of Scientology over actions the Church has taken against current and former members. The campaign was dubbed Project Chanology, a portmanteau of "chan" (i.e., "chan" sites, such as 4chan, which harbor most of the Anonymous community) and "Scientology".

Why should you care about this? You probably shouldn't, but most of you know that I enjoy watching chaos unfold firsthand.

Protests were scheduled for February 10, the birthday of Lisa McPherson, who died in 1995 after the Church of Scientology kept her in isolation for 17 days; she was also apparently taken off important medication because of Scientology's stance on psychiatry and medicine.

To be fair, I did not go to St. Louis this weekend for this. In fact, I'd completely forgotten about it. But on Saturday, I had a conversation about how I hadn't been to The Loop in a while... which brought up the fact that yes, we did remember reading something about a protest on Sunday, and yes, the regional Church of Scientology was in fact on Delmar, just on the edge of The Loop. As a frequenter of the Anonymous community, I thought it might be fun. But the clincher came when I read the official Project Chanology pages and noticed that the estimated St. Louis-area headcount was over 100. Suddenly, I was interested.

I normally wouldn't put much stock in an online-born movement against Scientology, as the Church's detractors have always been many. (Though February's Wired has an awesome feature on online and IRL griefing, which shows that such online attacks can cause real personal losses.) However, the Project Chanology site was remarkably well organized, and by the time I checked on Saturday night, there were also reports of successes in Australia, where it was already midday on Sunday.

I woke up at 8am on Sunday and I went to watch.

I saw the crowd as I drove by and as I walked up to the area -- maybe 40 or so people. I heard a guy drive by and yell, "fucking hippies!" as he passed.

This was going to be good.


Protesters were warned to wear masks or otherwise obscure their faces with sunglasses and scarves. You know, to prevent them from later being identified by the Church. (The scarves helped -- it was extremely gusty and the temperature sat below 30 degrees throughout the morning.) Some people reportedly looked into the local regulations and found that masks were legal; however, the police standing watch across the street politely asked the protesters to take them off. So most wore them on top of their heads. (Yes, those are Guy Fawkes masks, invoking a bit of that V For Vendetta feeling.)


Folks came from around the region and set up shop across the street from the Scientology building. The mood was quite upbeat and friendly. Conversation often drifted back and forth between the beliefs of Scientology and randomly invoked internet memes. A group sang parts of "Still Alive," from the game Portal. Later, others sang the classic Rickroll, "Never Gonna Give You Up."

"Scientology is under 9000!" said one person.

It was a morning of intellectual discussion regarding the beliefs and methods of Scientology (and the Church) mixed in with fun-loving internet subculture.


Fliers were handed out and most cars honked as they passed. Many in the crowd pointed out that the protest wasn't against the religion or belief system, but rather the actions of the church.

It was fun, save for the paranoia that came with noticing that a video camera facing the crowd had been placed in the doorway of the Scientology building. It seems that organizers may have been right in warning the protesters that the Church could log the protest and identify individuals in the crowd. But this didn't really stop anybody.

One organizer gathered a group together and remarked that "this is only phase two of this 'war'."

He continued by saying that if people really want to get involved with dismantling the secrecy and unjust actions of the Church, they should write letters and make calls to legislators and government officials to truly investigate the Church and (because of the high dollar amounts paid per member) remove its tax-exempt status as a religious organization.

I left when I realized I was probably getting frostbite in my hands from holding my camera too long. (Left my gloves in CoMo, heh.) But I think it was worth the trouble.

---

There are 38 photos in the full photoset, if you want to see 'em. You know you want to.

themaneater.com Launch

Update: I'm getting a lot of traffic to this page, thanks to Simon Willison linking to me. (Which, in turn, promoted this post on the Django community RSS feed.)

If you were linked to this page and are interested in reading a bit more of the history of this project and a few technical notes about the new site, you should probably start here.

---

We launched Friday morning, with an e-mail to our MizzouIT DNS contact to switch our themaneater.com domain over to the new site.

Of course, we just couldn't have a flawless launch. The site was slow, the site would break (503 Service Unavailable), and it sucked.

So I rewrote some views. I rewrote some caches. I reconfigured the cache. I disabled the cache. I re-enabled the cache. I reconfigured the URLconf...

...After spending the better part of Friday afternoon trying to optimize the site to no avail, I learned that sometimes the best solution to the most complex problem is the simplest.

Jacob Kaplan-Moss' Django performance tips said "Turn KeepAlive off."

I don’t totally understand how KeepAlive works, but turning it off on our Django servers increased performance by something like 50%.

Launching the site before turning KeepAlive off was like getting on the highway and realizing that the handbrake was on. By 50%, he must have meant 99%.

The site is now live and public.

Lesson learned.

And a HUGE sigh of relief: I've been involved with this new site on and off for nearly a year and a half now. I can't even express how happy I am to finally be somewhere with it.

Maneater Open Beta (Lessons Part 4)

Over a year and a half since I became involved with the online side of the Maneater, I finally feel that this new site project has fought its way out of the jaws of vaporwaredom. Following one false start after another and over a year of high hopes and dashed dreams, I think we have finally accomplished something.

Without further ado, here is the current URL at which you can test the new Maneater site:
http://www.themaneater.com/

Things of note

The site is currently one issue late on content, because updating the currently live site is already a time-consuming process.

---
Things I'd change

The way I programmed the site is (in hindsight) pretty sloppy. I set up multiple applications that all rely heavily on one another, which is a pretty big sin against the modular philosophy of Django. I should have lumped most of the staff, content, and publishing information into one overall Newspaper application.

The stylesheet is still under consideration because I haven't yet had the time to tinker with it.

Some of the metadata (like "published date") is extremely redundant. Perhaps I'm just being picky, but I don't think you need to set the date of an article if you're already attaching that article to an issue (which has the date set).
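A minimal sketch of that alternative, using plain Python dataclasses rather than the site's actual Django models (the `Issue` and `Article` classes and their fields are hypothetical): the article's published date is derived from its issue instead of being stored a second time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Issue:
    # Each print issue carries the one authoritative date.
    number: int
    date: date


@dataclass
class Article:
    headline: str
    issue: Issue

    @property
    def published_date(self) -> date:
        # Derived, not stored: nothing to keep in sync if the issue date changes.
        return self.issue.date


issue = Issue(number=12, date=date(2008, 3, 4))
story = Article(headline="Budget vote delayed", issue=issue)
print(story.published_date)  # 2008-03-04
```

In Django terms, the same idea would be a property on the article model that follows its foreign key to the issue.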

Ah, and I've got a long list of things that I'll be adding to this in the coming days. I'll port over parts of my todo list and notes when I get around to it.

---

Notes on searching

The complaint about Django I've noticed most is the lack of good search functionality. At the time of this writing, even the Columbia Missourian seems to have lost its search functionality.

Out of the box, Django offers only limited options: the contains lookup (e.g. field__contains), which wraps whatever you enter in a wildcard LIKE statement, and the search lookup (field__search), which uses MySQL's fulltext indexing but isn't very customizable. Fulltext search (for those who don't do database work) uses real search algorithms instead of just looking for the exact string entered. It also returns a relevance score, so you can sort the best matches to the top -- you don't get this with a plain ol' LIKE statement.
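To make the LIKE limitation concrete, here's a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (the table and headlines are made up). A `%term%` wildcard match finds the term anywhere, even inside other words, and hands back rows with no notion of which match is best.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO article (headline) VALUES (?)",
    [
        ("Art museum reopens downtown",),
        ("Students party after the game",),
        ("Faculty debate new art requirement",),
    ],
)


def contains_search(term):
    # Roughly what a LIKE-based 'contains' lookup does: substring match, no ranking.
    # (SQLite's LIKE is case-insensitive for ASCII by default.)
    rows = conn.execute(
        "SELECT headline FROM article WHERE headline LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [headline for (headline,) in rows]


print(contains_search("art"))
```

All three headlines come back for "art", including the one that only matches because of "party", and nothing says the other two are the more relevant hits.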

My personal gripe with the search lookup is that it uses MySQL's boolean mode, which isn't necessarily what a search box should use by default. I feel that the boolean capabilities are an advanced thing that only the diehard would really be using -- you know, "putting things in quotes" to search for an exact phrase ... putting + or - in front of words to find articles that have one word but not another. Not to mention that for plain searches (no quotes, no + or -), boolean searches seem to be less accurate, since boolean mode ranks results quite differently from the normal "natural language" fulltext search.
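For readers who haven't used boolean-mode search, here's a rough sketch of what those operators mean. It's a plain-Python approximation for illustration only, not MySQL's actual parser, and it uses simple substring checks: `+word` is required, `-word` is excluded, and a quoted phrase has to appear verbatim.

```python
import re

# Optional +/- prefix, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_boolean_query(query):
    """Split a boolean-style query into (required, excluded, optional) terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, query):
    required, excluded, optional = parse_boolean_query(query)
    text = text.lower()
    if any(term not in text for term in required):
        return False
    if any(term in text for term in excluded):
        return False
    # With no required terms, at least one optional term has to show up.
    return bool(required) or any(term in text for term in optional)


print(parse_boolean_query('+tuition -athletics "board of curators" fees'))
print(matches("Tuition hike will fund athletics", "+tuition -athletics"))  # False
```

Typing plain words with no operators falls through to the "optional" bucket, which is exactly the case where a natural-language search would do a better job of ranking.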

So I resorted to a custom solution based on this whitepaper, revolving around a custom manager to handle searching however I wanted.
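The whitepaper's specifics aren't reproduced here, so this is only a sketch of the general idea under stated assumptions: a helper that builds a natural-language MySQL fulltext query (natural-language mode is what MySQL uses when MATCH ... AGAINST has no modifier), selects the match score as a relevance column, and sorts on it. The table and column names are hypothetical, and a FULLTEXT index over exactly those columns is assumed to exist. In a Django project, a method like this could live on a custom manager and pass the SQL and parameters to a database cursor.

```python
def build_fulltext_query(terms, table="article", columns=("headline", "body"), limit=20):
    """Return (sql, params) for a natural-language MySQL fulltext search.

    `table` and `columns` are hypothetical placeholders and are formatted
    straight into the SQL, so they must be trusted constants, never user
    input. The user's search terms only travel as parameters.
    """
    match = "MATCH({}) AGAINST (%s)".format(", ".join(columns))
    sql = (
        "SELECT id, {match} AS relevance FROM {table} "
        "WHERE {match} ORDER BY relevance DESC LIMIT %s"
    ).format(match=match, table=table)
    # The terms appear twice (SELECT and WHERE), so they're passed twice.
    return sql, [terms, terms, limit]


sql, params = build_fulltext_query("tuition increase")
print(sql)
print(params)
```

Selecting the score as its own column is what makes "sort by relevance" (or by anything else) a simple ORDER BY choice instead of something baked into the lookup.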

Creating a view around searching was also a bigger challenge than I originally thought. In just one day of staffer beta testing, I already got a handful of requests to be able to search for staffers or photos, and to sort by date instead of accuracy -- it's not convenient to get a lot of articles from the 1990s when you're looking for something you vaguely remember reading last week. Then there was the issue of paginating large sets of search results.
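The sorting and pagination requests boil down to something like this sketch. It works on already-fetched results for simplicity, and the `SearchHit` fields and page size are assumptions; a real view would push the ordering and LIMIT/OFFSET down to the database, and Django's own `django.core.paginator.Paginator` can handle the page arithmetic.

```python
from dataclasses import dataclass
from datetime import date
from math import ceil


@dataclass
class SearchHit:
    headline: str
    published: date
    relevance: float


def sort_and_paginate(hits, sort="relevance", page=1, per_page=10):
    """Order hits by relevance or by date (newest first), then return one page."""
    if sort == "date":
        ordered = sorted(hits, key=lambda h: h.published, reverse=True)
    else:
        ordered = sorted(hits, key=lambda h: h.relevance, reverse=True)
    num_pages = max(1, ceil(len(ordered) / per_page))
    page = min(max(1, page), num_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return ordered[start:start + per_page], num_pages


hits = [
    SearchHit("Old but relevant", date(1998, 9, 1), 9.5),
    SearchHit("Last week", date(2008, 2, 20), 4.0),
    SearchHit("Last month", date(2008, 1, 15), 6.0),
]
page, total = sort_and_paginate(hits, sort="date", per_page=2)
print([h.headline for h in page], total)
```

Sorting by date pushes last week's story to the top even though the 1998 article scores higher on relevance.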

I think we have a search engine robust enough that even demanding journalists should be able to find what they need. But then, that's compared with the uncustomizable, date-sorted search we have on the old site. And the lack of search on the Missourian site. (If anyone working on that site is reading this, I'd love to talk sometime.)

Maneater/Django development lessons, Part 3

Through a series of posts, I’m counting down to a public test of the new Maneater web site by the end of the weekend. We’re hoping to launch Tuesday.

Last post was about my first impressions of Django as the project started off. This is part three.

This is mostly a technical writeup. For those that aren't programmers, the final post (Part 4) is a general "layman's terms" overview of new features on the site.

---

I'm really going to gloss over the templates and views more than I'd like. Honestly, writing the views was more of a Python learning experience than a Django learning experience.

Learning from the past

The trouble we had in prior launch attempts was mostly due to overloading the Apache/Python instance on the server and overloading the database -- some of the more complex views involved lots and lots of database queries that would pull all columns from a set. Indexing helped a little bit, but lacking an easier method in Django, a few of the views I created relied heavily on subqueries. The logic around some of these views would bring the server down to a crawl. We have over 27,000 articles in our database, and some views would iterate over every single article several times before printing to a template.
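As an illustration of the kind of fix that helps here (plain Python, with hypothetical records standing in for database rows): instead of rescanning every article once per staffer, walk the list once and group as you go. That turns roughly staffers × articles steps into a single pass over the articles.

```python
from collections import defaultdict

# Hypothetical rows, e.g. what one query returning (staffer_id, headline) might give back.
articles = [
    {"staffer_id": 1, "headline": "Council passes budget"},
    {"staffer_id": 2, "headline": "Tigers win opener"},
    {"staffer_id": 1, "headline": "Rent rises near campus"},
]
staffer_ids = [1, 2, 3]


def by_staffer_slow(articles, staffer_ids):
    # One full scan of the article list for every staffer: O(staffers * articles).
    return {
        sid: [a["headline"] for a in articles if a["staffer_id"] == sid]
        for sid in staffer_ids
    }


def by_staffer_fast(articles, staffer_ids):
    # One scan total: O(articles + staffers).
    grouped = defaultdict(list)
    for a in articles:
        grouped[a["staffer_id"]].append(a["headline"])
    # Include staffers with no articles so both versions return the same shape.
    return {sid: grouped.get(sid, []) for sid in staffer_ids}


print(by_staffer_fast(articles, staffer_ids))
```

With three rows the difference is invisible; with 27,000 articles and hundreds of staffers, it's the difference between one pass and millions of comparisons.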

Another limitation of the previous project was that we couldn't update Django to use the latest code and features. The backwards-incompatible changes I mentioned before sort of set us back in this regard, since caching wasn't nearly as extensible or quick in the early version of Django that was used. (Luckily, the API is now pretty stable, so the chances for backwards-incompatible changes at this point are low.)

Basically: everything about the old Django project was slow, and this was partially due to the old version of Django we were using and it was partially due to the code itself being poorly optimized.

---

Caching, as a lazy performance improvement

In an environment such as a newspaper website -- where, after a certain point, your content doesn't continue to update -- most pages and data don't change for a very, very long time (if ever). New stuff just gets added and old stuff is assumed to be static and archived.

In Django, you can cache just about anything (anything that can be pickled, anyway). Caching more or less means you're putting your data somewhere that's quicker to access than where you normally get it. For example, leaving some documents on your desk so you can look at them later, as opposed to filing them back in a drawer. With Django, you can cache data (obviously) and the objects you get out of your database. And since you're normally caching in memory, the result ends up being extremely fast.
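To show the mechanics behind the desk analogy, here's a toy in-memory cache with expiring entries. It's a sketch, not Django's cache backend: it only mimics the get/set-with-timeout idea, with a 300-second default to match the five-minute default mentioned below. The `TTLCache` name and the injectable clock are assumptions made so expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Toy in-memory cache: values expire `timeout` seconds after being set."""

    def __init__(self, default_timeout=300, clock=time.monotonic):
        self.default_timeout = default_timeout
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def set(self, key, value, timeout=None):
        if timeout is None:
            timeout = self.default_timeout
        self._store[key] = (value, self.clock() + timeout)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: drop it and report a miss
            return default
        return value


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("front_page", "<html>...</html>")
now[0] = 299.0
print(cache.get("front_page"))  # still cached
now[0] = 300.0
print(cache.get("front_page"))  # None: five minutes are up
```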

This is extremely beneficial in those situations above, where queries are server intensive and slow. Say you have a page that does this:

```python
# Article and Staffer are the site's Django models;
# this import path is hypothetical.
from myapp.models import Article, Staffer

def some_function(self):
    articles = Article.objects.all()
    staffers = Staffer.objects.all()
    #...
    #do stuff that takes a lot of server power
    #...
    #in this example, I'm theoretically
    #narrowing down articles & staffers
    #...
    data = {
        'articles': articles,
        'staffers': staffers,
    }
    #...
    #do stuff to data
    #...
    return data
```

Which is fine, but if you use some_function a lot, your server starts to slow down quite a bit.

Let's say that it's safe to assume that the data doesn't change all the time. You can cache the "data" object by doing the following:

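```python
from django.core.cache import cache
# Hypothetical import path for the site's models.
from myapp.models import Article, Staffer

def some_function(self):
    cachename = 'some_function_asdfasdfasdf'
    data = cache.get(cachename)  # returns None on a cache miss
    if data is None:
        # First call, or the cached copy expired: do the expensive work.
        articles = Article.objects.all()
        staffers = Staffer.objects.all()
        #...
        #do stuff ...
        #...
        data = {
            'articles': articles,
            'staffers': staffers,
        }
        cache.set(cachename, data)
    #...
    #do stuff to data
    #...
    return data
```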

If the function hasn't run before (or the last cached copy expired), it does the work you'd normally do to build "data" and stores the result. If the data is already in the cache, it skips the database-intensive work, which makes it fast and keeps your server happy.

The default cache time is 5 minutes, and is modifiable in the settings. If your data doesn't change for an unusually long time, you can also set a specific time for a cache:

```python
cache.set(cachename, data, 60*60*24)
```

This caches your data for 24 hours (the timeout is given in seconds).

Caches work by unique cache name, so by making "cachename" dynamic you can even cache very specific pages if you deem it necessary. (For example, I cache staffer profile pages on a per-person basis, because it uses lists of every article, photo, graphic, page design, and podcast credited to a given person.)
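A minimal sketch of the per-person idea, assuming a hypothetical `build_profile` function that does the expensive work and a plain dict standing in for the cache: the key embeds the staffer's ID, so each profile gets its own cache entry. Keeping keys short and free of spaces is a good habit, since memcached rejects keys that contain whitespace or run longer than 250 characters.

```python
calls = []  # records which profiles actually had to be built


def build_profile(staffer_id):
    # Hypothetical stand-in for the expensive part: gathering every article,
    # photo, graphic, page design, and podcast credited to this person.
    calls.append(staffer_id)
    return {"staffer_id": staffer_id, "credits": []}


cache = {}  # stand-in for Django's cache


def staffer_profile(staffer_id):
    key = "staffer_profile_%d" % staffer_id  # one cache entry per person
    data = cache.get(key)
    if data is None:
        data = build_profile(staffer_id)
        cache[key] = data
    return data


staffer_profile(7)
staffer_profile(7)  # served from the cache; build_profile isn't called again
staffer_profile(8)
print(calls)  # [7, 8]
```

With Django's cache, the shape is the same: `cache.get(key)` and `cache.set(key, data, timeout)` in place of the dict.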

As a programmer, I considered my stopgap object-caching a very lazy way to optimize, but due to the nature of some of the data models and due to the sheer volume of information we have to go through (27000+ articles, thousands of photos, hundreds of staffers), there really isn't much I can do at this point on a deadline. And sometimes, because of the limitations of a language or framework, there may not be an easier way to optimize anyway.

---

I'm not going to touch templating that much here (though I may come back to this subject later to talk about "abstracting" templates and creating custom templates for RSS feeds and Javascript). I will say that as long as you can write basic HTML/CSS and know that "hey, this variable goes inside this tag" (e.g. <h1>{{title}}</h1>), you can more or less create a template for Django. Since most of the logic lives at the view level, you don't really need to know Python so much as understand programming basics like variables and for loops. (And of course you'll need to know which variables are coming at you from the view level, and you'll need to know the data models your site uses.)

It's my understanding that Ruby on Rails (another framework) uses embedded Ruby as its template language. I love Django's very plain, text-based approach.

---

Next: I'll link you to the new site, walk you through the new features, and solicit your comments. I'll also outline a few new things I'm hoping to accomplish on the site after ironing out any post-launch issues. Stay tuned.