Compartmentalization

Double the fun! I’ve gone into the archives and published “Signal to Noise,” a previously unpublished entry from August of last year. You should probably read that first, as it goes hand-in-hand with this one.


My attempts at compartmentalization have failed. There is only one inbox.

On the down side (that was the up side), there is no “off the clock.” There is no “not on company time.” There is no “not speaking on behalf of…” Disclaimers to the contrary are commonplace, well-rehearsed, and futile. Technologies that “help” us to link our disparate personas will inevitably intertwine them with our impersonas too. There are no “strictly personal venues.” And when nothing can be said without being misconstrued, there is nothing left to be said.

My attempts at compartmentalization have failed. There is only one outbox.

Mark Pilgrim, One

It’s almost a relief to witness someone as well-known as Mark Pilgrim running headfirst into this very issue.[1] This is the one demon that keeps me from posting draft after draft of blog prose. There is a crippling fear behind the question: “what if my personal thoughts and my professional persona are irreconcilable?”

I once made the mistake of mixing the much-too-personal with my blog, years ago — and, judging from the volume of entries I’ve published since, I’ve practically been sitting on my hands.

But why compartmentalize? Why build those walls to divvy up our lives?

There is a fine line that separates “transparency” from “way too personal,” and it’s a line I regret crossing before. But I think I’d rather be judged as Mike Tigas — mistakes, missteps, misadventures, and all — than project a “manufactured” identity under my own name.

As a self-employed freelancer — whose brand is his name — I’m not sure I see the utility of having separate “professional” and “personal” lives. And even in general: work and home are very different places, but throughout the day isn’t it still the same life you’re living? (In some professions there will be exceptions to this, I’m sure.)

Online, sacrificing your identity for the sake of image is folly — your pseudo-identity just becomes a pretense, like you’re just a marketing gimmick for the product or brand you represent. And if that brand is you — is the dog walking the master at that point? (At what point does your brand stop representing you, but rather you represent what you wish it could stand for?)

I’m not saying you should talk “inside baseball” in the open. I’ve been under NDAs, and I’ve been in situations without ’em where openly discussing my work could be disastrous. But I suppose my point is about censorship of personality: who you are in either environment shouldn’t differ all that much. You’re you. Everyone makes mistakes. If someone really wants to find something incriminating on you, they probably will, despite your best efforts. If you aren’t comfortable being yourself, then who are you?


…I’m working on that answer. I’ve been working on it for as long as I can remember, actually — winging it, floating between hobbies and work that I enjoy, looking for “a fit.” I graduate in six weeks. That will only be the beginning, I’m sure.


[1] In fact, a couple bloggers I follow and idolize share that common theme. (I don’t really know what that says about me.) Pilgrim lost his job over a post regarding alcoholism and addiction. Heather Armstrong’s work rants also got her fired.

Even now — nearly ten years since both lost jobs over blogging — the way they write is still intriguing and very human to me. Doesn’t hurt that they both ooze wit and charm through their writing. Compared to other blogs I follow out of topical interest, I follow them (and some others) just for the prose and writing style.

…And it probably helps their case that Pilgrim now works for Google and that Armstrong is possibly the most widely read female blogger today. Minor details.

On reality and authenticity

Mark Lamster, returning from a trip to Las Vegas:

Drinks at Prime Meats, in Brooklyn, with my wife. Realistically, this place is as much an artifice as anything on the Strip, a re-imagining of a 19th-century saloon, complete with polished bar, antique typography, Edison bulbs. Why, then, does it feel so much more honest? Because its aesthetic is filtered through a contemporary sensibility? Because it seems a natural part of a vibrant neighborhood? Is this all bullshit I invent to make myself feel more comfortable?

Mark Lamster, What Am I Doing Here? Tall Buildings and High Anxiety in Las Vegas


Carnegie Mellon professor Jesse Schell, on the psychology of games: Video here. It’s good in its entirety, but the relevant parts start at about 10:25. The segment quoted below starts at about 12:15:

Go look at TV — the people on TV, their heads are spinning! Everything is about reality TV. Go to the grocery store: it’s not just groceries anymore! Organic groceries — they’re more genuine, they’re more real groceries. You go to McDonald’s, and a Big Mac — well, you could get a Big Mac, or you could get the real burger, the Angus Burger, made with real this and that and whatever. Everything’s suddenly about reality.

[…] Gilmore and Pine put forth this interesting concept: that the most valuable thing in products today is are they real, are they authentic. Which is a bold hypothesis. And then they go further and they say, “Well, now why is it? Why now? It didn’t always used to be this way. Certainly it’s not what sold stuff in the ’80s. Right? […] What is it now that people are demanding reality, demanding authenticity?”

And they’re arguing that all this virtual stuff that’s been creeping up on us over the last twenty years has really cut us off from nature. We’re cut off from nature, we’re cut off from self-sufficiency.

[…] We live in a bubble of fake bullshit and we have this hunger to get to anything that’s real. Even if the best we can do is a Starbucks mocha with real Swiss chocolate — we’ll take it! Oh, that’s real! Look how real that seems to me, relative to what I’m used to!

Jesse Schell, Design Outside the Box Presentation

In that segment, Schell frequently references Authenticity, by Gilmore and Pine, so you might also want to check that out.


This is something I often wonder about, as the Internet grows by leaps and bounds. For example, my recurring love-hate relationship with the Great Internet Timesuck and my tendency to quit Facebook and invoke Vonnegut just about every year. As I said before, I feel as if there’s some sort of cultural pushback on the horizon — maybe this “thirst for reality” is already here, just in some other form?

NationBrowse

NOTE: If you tried to go to nationbrowse.com and ended up here, that site is now defunct. The years-old code was partially broken, and the GIS bits were quite a strain on the tiny VPS it ran on. Spiritual successors that I’ve had the pleasure of working on include the Spokesman-Review Census Center and census.ire.org (which is open source). Definitely check those out for your Census data-browsing needs.

Original post (February 22, 2010) follows:


I haven’t quite graduated yet, but I did take my “capstone” class last semester. The objective was vaguely, “do something innovative,” so I pitched (what I thought was) the data app of my dreams.

This is how it all went down. This is essentially a brain dump of all the little notes I’ve collected while working on this project. Boy, do I collect a lot of notes.

The end result

Quick note: The server running the demo is ill-equipped for the massive dataset size — I’ll talk more about this below. …If you click around and you get a timeout error, wait a minute to let the server catch up (or cache up…) and try again.

NationBrowse screenshot

In its current state, nationbrowse.com is a mess, but showing it off is the easiest starting point to work from:

Warning: A lot of technical talk, from here on out.

Background bits

Heavily inspired by: The Apps for America contests [1,2], ThisWeKnow, DataMasher, this Mapping L.A. Neighborhoods project from the Los Angeles Times, and EveryBlock. (We actually hadn’t heard of ThisWeKnow and DataMasher until partway through the semester — it was really great to see more reference projects show up along the way.)

The team: Graham Greenfield, Jeremy Howard, Nick Roma, and myself. While all had programming experience, none of the others had used Python, developed GIS software, or worked on a Web app with real-world data. (It went extremely well. They picked up quickly. Python is awesome.)

Source code: Here, on github.

The basics: Python, Django, and PostgreSQL. GeoDjango via PostGIS.

Server: Served over Apache+mod_wsgi, on an internal port. nginx sits at port 80 and proxies requests over to the Apache instance.

Caching: Memcached. Using python-memcached instead of (the now unmaintained) cmemcache. Using the cache middleware along with custom caching all over the place. (There are a few notes in the next section, regarding nginx+memcached.)

Mapping: OpenLayers, for client-side shape rendering.

Graphs: Google Chart API.

Data: U.S. Census TIGER/Line for shapefiles. U.S. Census 2000 & American Community Survey 2008 for most statistics. FBI Uniform Crime Reports for other numbers.
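To make the stack a little more concrete, here’s a rough sketch of the kind of GeoDjango model that sits underneath all of this — the field names are illustrative, not the exact NationBrowse schema:

```python
# A sketch of a GeoDjango model for county-level data; field names here
# are hypothetical, not the actual NationBrowse schema.
from django.contrib.gis.db import models

class County(models.Model):
    name = models.CharField(max_length=100)
    state = models.CharField(max_length=2)        # USPS abbreviation
    population = models.IntegerField(null=True)   # Census 2000 / ACS 2008
    geom = models.MultiPolygonField(srid=4326)    # from TIGER/Line shapefiles

    objects = models.GeoManager()  # enables spatial lookups (intersects, etc.)
```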

Issues & things we cut

A lot of our initial ambitions were fiercely struck down by performance considerations. Last I checked, a bzip2-compressed database dump sat at over one gigabyte due to the sheer number of states, counties, and ZIP codes stored and the precision of the shapefiles and statistics. On a VPS with 256MB of RAM, pitting PostgreSQL against a dataset this size proved to be a royal pain in the ass.

Wanted to use TileCache/Mapnik, the “EveryBlock stack,” to generate maps server-side: performance was awful given the hardware/dataset circumstances. (Not to mention the configuration complexity of having a whole Apache mod_python instance running alongside the site’s Django wsgi instance.) Instead: we found a way to render shapes in OpenLayers, in the user’s Web browser, by sending along raw WKT geo data in the Javascript for a given map. The (sometimes huge) increase in page size was a far better trade than the (dangerously high) server load.
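The client-side approach boils down to something like the sketch below — with hypothetical model, field, and template names, not the exact NationBrowse view:

```python
# Ship raw WKT to the browser instead of rendering map tiles server-side.
# County, geom, and "map.html" are hypothetical names for illustration.
from django.shortcuts import render_to_response
from myapp.models import County  # hypothetical import path

def county_map(request, state_abbr):
    counties = County.objects.filter(state=state_abbr)
    # GeoDjango geometries expose .wkt; simplify() trims the (sometimes
    # huge) payload before it lands in the page's Javascript.
    shapes = [
        {"name": c.name, "wkt": c.geom.simplify(0.01).wkt}
        for c in counties
    ]
    # The template hands each WKT string to OpenLayers.Format.WKT client-side.
    return render_to_response("map.html", {"shapes": shapes})
```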

Wanted to use MatPlotLib to generate server-side graphs: again, performance was killing the site. This was actually completely implemented [1, 2], but not performant enough for us to demo with. Instead: we built wrappers around the Google Chart API, which offloads the rendering work to some magical Google server.
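Those wrappers amount to building querystrings. A minimal sketch of the idea (the parameter choices here are mine, not the repo’s exact code):

```python
# Build a Google Chart API URL; Google's servers render the PNG, so our
# box does no image work. (Illustrative, not the repo's exact wrapper.)
from urllib import urlencode  # Python 2; urllib.parse.urlencode on Python 3

def pie_chart_url(labels, values, size="400x200"):
    params = {
        "cht": "p",                                      # chart type: pie
        "chs": size,                                     # dimensions, WxH
        "chd": "t:" + ",".join(str(v) for v in values),  # text-encoded data
        "chl": "|".join(labels),                         # slice labels
    }
    return "http://chart.apis.google.com/chart?" + urlencode(params)
```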

nginx is being used as a reverse proxy, and we’d hoped it could serve cached results directly out of memcached. There are still some issues with corrupted/misencoded data being returned to the browser. (The classic “gibberish loads in browser” effect.) Not sure if this is due to the large size of the things being stored or some encoding misconfiguration — if anyone has any ideas, I’d love to hear ’em. (I’m using this serve-from-cache method on this blog, and it’s working just fine with a near-exact configuration.)

Similar to DataMasher, we wanted to develop a way to let users automatically create comparative (and inferential) statistics. Unlike DataMasher, we sought to build something statistically sound — we were talked out of this by some folks at the Social Science Statistics Center, who noted that blindly comparing Census data would create junk statistics in nearly every case. At that point, we just threw our all into descriptive statistics — hence a focus on maps, charts, and tables.

Pieces of note

The cacheutil library is a little “swiss army knife” that includes a few useful functions: the safe_get_cache/safe_set_cache/safecache methods and template tag, which sanitize and hash cache keys; some decorators for caching methods, class methods, and class properties; and a middleware for those wanting nginx to serve directly from the cache [1,2].
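The gist of the key sanitizing, if you don’t want to dig through the repo — a simplified sketch in the spirit of safe_get_cache/safe_set_cache, not the verbatim cacheutil code:

```python
# Memcached keys can't contain whitespace/control characters and are capped
# at 250 bytes, so questionable keys get hashed first. (A simplified sketch,
# not the verbatim cacheutil implementation.)
import hashlib
from django.core.cache import cache

def _safe_key(key):
    key = str(key)
    if len(key) > 250 or any(ord(ch) < 33 for ch in key):
        key = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return key

def safe_set_cache(key, value, timeout=300):
    cache.set(_safe_key(key), value, timeout)

def safe_get_cache(key):
    return cache.get(_safe_key(key))
```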

A threading shortcut function that allows you to call some function in the background, while the rest of your view moves on and gets returned to the user’s browser. (Useful for loading views or calling functions in advance, to pre-cache ’em before a user actually goes there.)
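In spirit, that shortcut is a thin wrapper around the threading module — something like this (illustrative, not the exact helper from the repo):

```python
# Fire-and-forget: run fn in a daemon thread so the view can return to the
# user immediately. (Illustrative, not the exact shortcut from the repo.)
import threading

def call_in_background(fn, *args, **kwargs):
    t = threading.Thread(target=fn, args=args, kwargs=kwargs)
    t.daemon = True  # don't keep the process alive on shutdown
    t.start()
    return t

# e.g., pre-warm the cache for a page the user will probably hit next
# (build_county_page_cache is a hypothetical function):
#   call_in_background(build_county_page_cache, "MO", "Boone")
```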

Some pluggable utilities for generating Google Chart API URLs.

A ton of Javascript magic, using jQuery and OpenLayers. Between the template and the static helper functions, you get that nice map with toggle-able shapes (to change which variable the map is shaded by) and the nice hover effect on the shapes — as seen on the homepage.

If you are interested in using MatPlotLib and Django, you can split your chart-generation functions from the bits that actually grab the data & generate a PNG response. While this project couldn’t use it in the end, there’s a lot of potential for dynamic awesomeness there.
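A rough sketch of that split — with hypothetical names, and the Agg backend so no display is needed on the server:

```python
# Separate pure chart generation from the Django response layer.
# (A sketch with hypothetical names, not this project's actual renderers.)
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering; must be set before pyplot import
import matplotlib.pyplot as plt

from django.http import HttpResponse

def render_bar_chart(labels, values):
    """Knows nothing about Django: takes data, returns PNG bytes."""
    fig, ax = plt.subplots()
    ax.bar(range(len(values)), values)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()

def chart_view(request):
    """Thin view: grab the data, delegate rendering, return the image."""
    png = render_bar_chart(["A", "B", "C"], [3, 1, 2])
    return HttpResponse(png, content_type="image/png")
```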

Credits due

Ted came up with the name a long time ago, when I first threw around the idea of a data project like this.

My team was awesome for going along with something so ridiculously ambitious. For a one-semester undergraduate capstone project, in which 75% of the team hadn’t even used the language, it really worked out. Graham and Jeremy were troopers and put a lot of work into the MatPlotLib renderers [1, 2] that weren’t fully implemented in the end product. Nick, without any prior Javascript or jQuery experience, built a GUI “query builder” (which, unfortunately, is not functional in the live demo).

After repeatedly shooting down Flash-based maps and discovering that server-side map tiles were out of the question, I built the dynamic elements of the map largely by staring at the source of this Los Angeles Times mapping project. (And by weeding my way through the OpenLayers documentation and mailing lists.) It’s not the prettiest, but there’s a lot of dynamic flexibility to it that I haven’t yet seen in other OpenLayers implementations.

Last complaints

Setting up a PostGIS database is a pain. Importing the entire State, County, and ZipCode sets is even worse. I did it here — note that I had to manually import the Puerto Rican municipios (the county equivalent) by tweaking the INSERT statements, unescaping some of the characters with diacritics, and forcing PostgreSQL to run the import as UTF-8. Hopefully that’ll save you some pain if you try this someday.

Census data is a mess. Know how to get to the raw data from the homepage? Yeah. (Try the Download Center over here.) The data was pipe-delimited (and therefore PostgreSQL could import it directly), but turning the many, many arbitrary columns into model fields was a pain.
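For what it’s worth, the direct import is nearly a one-liner thanks to PostgreSQL’s COPY — a sketch via psycopg2, with hypothetical database, file, and table names:

```python
# Bulk-load a pipe-delimited Census extract with COPY. The dbname, filename,
# and table name here are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=nationbrowse")
cur = conn.cursor()
with open("census_extract.txt") as f:
    cur.copy_expert("COPY census_raw FROM STDIN WITH DELIMITER '|' NULL ''", f)
conn.commit()
conn.close()
```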

Oh, and mixing data from disparate sources? (Say, the FBI Uniform Crime Reports, whose data is entirely distributed in Excel spreadsheets.) Good luck.

I would really love to see a more open method to access a lot of this data. After working on this project, I have to say that there are still significant barriers to doing useful things with open government data. ThisWeKnow uses RDF/SPARQL and is — judging from their goals and execution — an excellent start.

Epilogue

I don’t believe NationBrowse is “complete.” It’s a nice technology demo and a nice experiment in how a large data app can be built with very few resources. But it’s a data ghetto: a standalone site, with very little context and very little use of the massive underlying dataset.

If I could have another go at this, I’d have emphasized data export functionality or some other way to get “joined” data from disparate sets and sources. Possibly create an API around the underlying data. And even then, the data still needs to go in, somehow.

But hey, if four guys in college can find a way to make something of that data, for (near-)free, maybe there’s hope.

I implore you to dig around in the repository and especially check out the notable bits.


This is a quote I love to come back to, time and again.

Even as a Web developer — a person who gets paid to go out and build up the great expanses of the Internet — I love this quote. And, to a great extent, I believe in it.

Electronic communities build nothing. You wind up with nothing. We are dancing animals. How beautiful it is to get up and go out and do something. We are here on Earth to fart around. Don’t let anybody tell you any different.

— Kurt Vonnegut, in A Man Without a Country


Google Buzz was released earlier this week. Facebook redesigned its main page. A lot of people paid a whole lot of attention to these things.

I had a good conversation with Carolina a couple nights ago about the substitution of social networks for real social interaction. (Her friend Amanda expressed dismay at the whole thing, which is what got us on the subject.) And while I concede there are plenty of uses for these communities — reconnecting with distant folks, planning events, having non-live conversations in comment streams — I can’t help but notice:

An increasing number of the people I speak to believe we’re placing far too much collective importance on these things. Me? I fear for the people too young to remember dial-up Internet and earlier. And seriously, think about it: I’m sure there are some kids who communicate through these networking sites more than through any other medium — text, phone, or in-person. This is all they’ll have ever known. (In practice, I’m sure the reality lies somewhere between texting and the Internet.)

In my wildest dreams, I imagine we’ll get to a point where this dawns on everyone and we see a large cultural pushback. Maybe, like the whole/organic food fad, it’ll only be a minority. But sometimes I feel like the undercurrents are there.

Does anybody even remember Google Wave? Friendster? Xanga?

The iPad & Game Consoles

A quick thought or two on the iPad hubbub and the “casual computing vs. tinkering” conversation that’s been happening as of late. But first:

ThinkPad, anyone?

I concede that “iPad” is a terrible name simply because of its similarity to Apple’s existing “iPod,” but I really don’t understand the fascination with “pad” jokes. A “-Pad” name has been pretty successful — without the toilet humor — for about 18 years now.

There are other examples of mocked names in recent history — take the Nintendo Wii. As with the Wii, I’m pretty sure we’ll move on from picking on the nomenclature once we start using the damn thing.

Which sort of leads into my main point

One of the general arguments against the iPad succeeding is that it’s more expensive than a netbook, it’s not as full-featured, it doesn’t even multitask, and so on.

Who cares? Between the two of us, my brother and I own several high-end computers that, by default, are closed systems. They don’t multitask. You can’t easily make your own content for them. You can’t really mess around with a lot of the performance-oriented settings.

They are: a Playstation 3, an Xbox 360, a Wii, and a few other systems.

For the most part, direct comparisons between these devices and “general computers” tend to be “apples to oranges” comparisons. (The classic “console vs. PC gaming” argument is probably the best example.)

They’re purpose-built machines in a different league, and that’s that.

There are lots of folks who own Macs or older PCs and want a way to play the latest games — and many of them own game consoles because that’s the easiest out-of-the-box experience, compared to buying and maintaining a PC gaming rig. It’s certainly easier than trying to play Crysis on a PC whose hardware is four or five years old, or on one with a sub-$500 price tag.

My point is, there is a place for the iPad and people will buy it even if it is (several orders of magnitude) less versatile and far more expensive than a netbook. It doesn’t have to be a netbook to succeed. As long as the iPad gives the user enough of what they want (presumably: Web content, books, and apps) and wraps that up in an enjoyable experience, then Apple has a legitimate competitor in the netbook market.

Another point of reference: Some folks will go out and buy an Xbox 360 because of the platform-exclusive titles, like Halo. I could try to talk about how technologically superior the PlayStation 3 is to the Xbox 360, but I can’t specifically dissuade someone who loves Halo. Some folks will go the iPhone/iPad route specifically for the exclusive apps and features, too.

On hackers and tinkerers

On the other hand, there is the “tinkering argument” — that the spread and adoption of these “closed systems” will bring an end to the days of tinkerers.

Video game consoles also provide a great analogue to the iPad’s “closedness” in this regard: they come “closed,” of course. But my Xbox 360 is modified to play burned games, and doing the same to the Wii is, supposedly, a piece of cake. You don’t have to look far to find people willing to do the same with Apple’s closed systems.

(My brother and I do live on the far end of the tinkering range — in both PCs and game consoles — so my experiences are obviously a tiny bit skewed.)

Interestingly enough, I do notice that a great percentage of the PC gamers I know tinker with their settings, update their drivers, upgrade their parts, etc., on a regular basis — or, at the very least, know how to perform those tasks. And while I know of primarily-console folks who’ve modified or hacked their systems, they are a much rarer breed. This is exactly the fear: tinkering falling by the wayside because closed-off systems inherently have fewer things to tinker with.[1]

While I have no reservations about the “closed” nature of the iPad specifically, I am one of the people who will be concerned if this truly is the “future of computing.”

At best, some console hacks are merely inconvenient[2]; at worst, they are outright illegal. I strongly believe that those who want to do more with their computing devices will inevitably find a way to do it. I just think it will play out better for everyone if we encourage and facilitate curiosity and innovation rather than criminalize them.


[1] Alex Payne & Jim Stogdill both have excellent points on this, which inspired me to write a bit about it.
[2] Older PS3 models do allow you to install Linux on an unmodified console. And as far as I know, there are no hacks for the PS3 that allow you to play burned games.
