Thursday, January 12, 2006

Too much information

Cosma on the horrible information overload of real biology:
One reason papers like this gladden my heart is my basic intellectual cowardice: the sheer endless proliferating detail of biology overwhelms me, especially when something drives home the fact that we keep finding utterly new stuff everywhere we look. Here we are, looking at our own guts, and coming up with stuff like this: "Three sequences from two subjects ... appear to represent a novel lineage, deeply branching from the Cyanobacteria phylum and chloroplast sequences." See? There are organisms whose closest relatives are the stuff that turns ponds and leaves green living inside us, and until now we had no idea. And when we eventually look inside them, they're going to turn out to be weirdly complicated and uniquely strange, exactly like everything else. And of course the damn things will have histories, again exactly like everything else. Biology just doesn't stop, and at some point the details and special cases make me wish my head would explode.
Read the whole thing; it's funny. But I feel the same way: there's just too damn much detail. If there is a spectrum of detail-oriented vs. abstractifying in cognitive style, I am way, way to the right, on the abstractifying end. I can't remember stuff, which is why I am a software guy, and why I am emotionally tied to elegant languages and systems like Lisp (the more powerful the abstraction, the less you have to remember) and also have a deep interest in user interfaces (because a good UI will remove cognitive burdens via affordances).

This is what bioinformatics is all about: figuring out some way to abstract from all the detail into some form of useful knowledge. Statistics pulls a lot of weight, but since my mathematical skills have atrophied, my efforts are more in the line of making environments that let scientists link various types of knowledge together, perform computations over them, and visualize the results.

My real interest is broader; it's in figuring out just what it means to know something in a condition where you have a limited brain and senses surrounded by effectively infinite amounts of data. This is not a problem unique to bioinformatics. Basically it's everybody's problem, the problem of the googlectual. Google is a great tool but causes as many new problems as it solves.

Bruce Sterling's most recent book promotes a vision of a world full of SPIME, which roughly means objects that have a trackable identity (think RFID) and whose entire history is monitored, recorded, and available. This is going to save our environmental asses, somehow. Let's say he's right: how are we going to make use of all this data? I mean, it would be nice to be able to do a Google-local-physob search for that screwdriver I misplaced the other day, or have a dashboard widget that tracks my laundry status and raises a red flag when clean sock levels are dangerously low. But presumably there are more interesting forms of knowledge to be extracted from a complete record of the interacting histories of objects.

