Sunday, July 29, 2007

A Crowd-Powered Search Engine

Jimmy Wales, founder of Wikipedia, has gone and done it: he's launching a wiki search engine, which, according to the story, "will combine computer-driven algorithms and human-assisted editing.... Human editors would help untangle terms with multiple meanings, such as palm, which can refer to location like Palm Beach, or generic topics like trees or handheld computers."

You can bet I'll be watching this one closely.

Saturday, July 28, 2007

"Satisficing" Is Not a Dirty Word

Another one of my pet peeves is people who complain about students and other information-seekers “satisficing”—looking for information that is just good enough, rather than for the best information. A related pet peeve recently came up on PUBLIB (don't ask why I lurk on PUBLIB; there was a reason at the time I signed up for it, but now it's mostly for amusement value), when someone called easily searchable digital information systems a “prop” that hinders the development of thinking and reasoning skills.

While these complaints have a certain degree of merit to them, they ignore a couple of important economic principles. (Humor me here; my undergrad background is in the social sciences.) We all have a limited amount of time, money, energy, etc., to get through our days, and we have to make rational decisions about how to “economize” those things—how to use them most efficiently to achieve the most we can based on our constraints. This means we can't have it all—things that are time-consuming might not be expensive monetarily, but they're “expensive” in terms of another scarce quantity: time. Home-cooked “slow food” meals might not cost more than take-out, but an hour spent preparing a slow food meal is an hour that you can't spend, say, mowing the grass or sleeping or doing other things that you need to accomplish. Information is no different: an hour spent digging through a pile of poorly organized information trying to find the piece that is needed is an hour that a student can't spend writing the paper he needs to write, or doing homework for his other classes, or having a life outside of school. Yes, sometimes it's important for students to take the time and effort to really dig in and learn the structure of the literature in an area, to see who the big names are and what they're arguing, to learn the contours of the discourse . . . and sometimes they just need to find a piece of information quickly and get on with the rest of their lives. I suspect that this is doubly true of public library patrons, who generally don't feel the need to engage with a broad swathe of human knowledge the way students should. So give the people what they want already and don't make them feel guilty for having other things in their lives that are more important to them than conducting the best information search possible! 
Unless you live up to every other field's standards of perfection: if you eat only home-cooked healthy meals, exercise for the recommended 30 minutes per day, sleep for the recommended 8 hours per night, maintain your home in a state of Martha Stewart-like perfection, check the air pressure in your tires every time you gas up your car....

Thursday, July 26, 2007

More about Eyeballs and Errors

Sorry for the disappearance; I'm finishing up my last semester of actual classroom classes for the MLIS right now, so things have been a bit crazy of late. (I still have one more semester before the degree, but I'll just be interning in an academic library in the fall semester—no actual classes.) I hope to be back with a real, meaty post soon.

In the meantime, take a gander at this list of errors in the Encyclopedia Britannica that have been corrected in Wikipedia.

Monday, July 9, 2007

A Brain-Flash about Finding Libraries

I had one of my brain-flashes this morning, brought about by this Stephen Bell post at ACRLog and my initial response to it, which is posted in the comments over there. (Go read them. I'll wait. The rest of this post won't make much sense if you don't.)

The brain-flash was: this sounds like a project for the hive-mind! The hardest part of creating a "Find Your Library" tool is gathering all of the information about all of the however many thousands of libraries there are in the U.S. (and Canada, if we want to be inclusive). If you have to pay people to gather all of that information, it gets expensive. But create a site where people can contribute information about the libraries that they work at, patronize, or just know about, and sooner or later you'll get all of your data for free.

And you can get really rich data and do fun stuff with it when you're letting the public contribute. Let people rate libraries and leave comments about them! Let people tag the libraries and allow tag-based searches! Let people create structured folksonomies to organize the libraries into hierarchical categories to allow for even more powerful searches! This would be a really fun test-bed for some of the ideas I've been kicking around about people-powered ontologies....
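
For the technically inclined, here's a minimal sketch of what the tagging-and-rating core of such a site might look like. Everything in it is hypothetical: the `LibraryDirectory` class, its method names, and the sample library names are my own illustrative assumptions, not an existing tool, and a real site would need user accounts, persistent storage, and some way to deal with vandalism.

```python
from collections import defaultdict


class LibraryDirectory:
    """Hypothetical in-memory store for crowd-contributed tags and ratings."""

    def __init__(self):
        self._libraries_by_tag = defaultdict(set)   # tag -> library names
        self._ratings = defaultdict(list)           # library name -> star ratings

    def tag(self, library, tag):
        """Attach a tag to a library (tags are case-insensitive)."""
        self._libraries_by_tag[tag.strip().lower()].add(library)

    def rate(self, library, stars):
        """Record a rating from 1 to 5 stars."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._ratings[library].append(stars)

    def search(self, *tags):
        """Return the libraries that carry every one of the given tags."""
        if not tags:
            return set()
        matches = [self._libraries_by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*matches)

    def average_rating(self, library):
        """Return the mean rating, or None if nobody has rated the library."""
        scores = self._ratings.get(library)
        return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    directory = LibraryDirectory()
    directory.tag("Example Public Library", "free wifi")
    directory.tag("Example Public Library", "story time")
    directory.tag("Sample University Library", "free wifi")
    directory.rate("Example Public Library", 5)
    directory.rate("Example Public Library", 4)
    print(directory.search("free wifi", "story time"))
    print(directory.average_rating("Example Public Library"))
```

Requiring every tag to match keeps multi-tag searches narrow; a friendlier version might rank partial matches instead, and the hierarchical folksonomy idea would need a layer on top of this flat tag index.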

So, what do you say? Is anybody interested in helping me launch this site?

Sunday, July 8, 2007

Given Enough Eyeballs, All Errors Are Obvious

One of my biggest pet peeves in life is people who proclaim that Wikipedia can't be trusted because anyone can edit it, while at the same time placing their full trust in "professionally-created" encyclopedias—preferably ones printed on paper—put out by the major publishers.

This pet peeve has been on my mind more than usual of late because I've spent a whole lot of the past month up to my eyeballs in reference works about the history of Eastern Europe, and I've found errors in quite a few of them. Two errors stick out for me, because I didn't immediately recognize them as errors and they sent me off on wild-goose chases. Error #1: A book on the history of Poland put out by one of the major library-focused publishers had some of the vital dates for Poland's most famous medieval queen off by about 50 years. (I've returned the book already and I don't remember if it was her birth date or death date or the date she assumed the throne or what, but it was something important like that, and it was WAY off.) Error #2: An entry in one of the major, reputable online encyclopedias listed one of Czechoslovakia's prime ministers as an ethnic Slovak when he was actually an ethnic Czech.

Had these entries been in Wikipedia, I probably would have taken the time to fix them. Had they been put out by a company I freelance for, I would have e-mailed one of my contacts there and asked them to get it fixed. (Well, the online one anyway; there's really no fixing a book that's already been published.) But since neither of those things was true, those errors are going to persist and mislead more people, some of whom will never know that they've been misinformed.

Some errors are unavoidable. People are human and they make mistakes. But the odds of someone recognizing and fixing a mistake go up dramatically as the number of people who have the opportunity to notice and fix the mistake increases. Having worked in reference publishing for 6 years now, I'd estimate that around 4 people really seriously evaluate most things before they're published in reference books. How many people read and edit the average Wikipedia article? I don't know off the top of my head, but I'm guessing that it's a lot more than 4.
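
To put rough numbers on that intuition, here's a back-of-the-envelope sketch. It assumes that each reader independently has the same chance `p_catch` of noticing a given error, so the chance that at least one of `reviewers` readers catches it is 1 - (1 - p_catch)^reviewers. The independence assumption, the 10% figure, and the reader counts other than 4 are all mine, picked purely for illustration; none of them are measured data.

```python
def chance_error_is_caught(reviewers: int, p_catch: float) -> float:
    """Probability that at least one of `reviewers` independent readers
    notices an error, if each notices it with probability `p_catch`."""
    if reviewers < 0:
        raise ValueError("reviewers must be non-negative")
    if not 0.0 <= p_catch <= 1.0:
        raise ValueError("p_catch must be between 0 and 1")
    # The error slips through only if every single reader misses it.
    return 1.0 - (1.0 - p_catch) ** reviewers


if __name__ == "__main__":
    # 4 is my estimate for a reference book; 40 and 400 are made-up
    # stand-ins for "a lot more than 4".
    for readers in (4, 40, 400):
        chance = chance_error_is_caught(readers, p_catch=0.10)
        print(f"{readers:>3} readers: {chance:.1%} chance the error is caught")
```

Real readers aren't independent, and some errors are much harder to spot than others, so treat this as an illustration of the shape of the curve rather than a prediction.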

That doesn't mean you should uncritically accept anything you see on Wikipedia. Of course there are plenty of errors in Wikipedia as well, both of the "innocent mistakes" variety and the "malicious vandalism" variety. And of course professionally-produced encyclopedias don't have to worry about malicious vandalism in the same way, and that's an important factor to consider when discussing the reliability of Wikipedia. All I'm saying is that the Linux folk who proclaim that "given enough eyeballs, all bugs are shallow" are on to something, and not just in software development.

If you're interested in a scholarly, data-heavy examination of fact-checking in Wikipedia, check out this article.