Saturday, September 29, 2007

Trust and Folksonomies

In a job interview two weeks ago, I was asked a question about the potential for malicious tagging if libraries opened up their catalogs and allowed users to tag books as they pleased. At the time, I answered by saying that most tagging systems handled malicious tags by ignoring them: if only one person tags an item with a malicious tag in a system like LibraryThing that has thousands of taggers, then that tag will sink to the bottom and the more common (and therefore more useful) tags will rise to the top. But after chewing on the question for a while, I think I want to take that answer back.

Oh, it's a perfectly accurate answer, don't get me wrong, but I want to contest the very premise of the question. Why don't we as librarians trust our users to act responsibly in the library catalog? Consider all of the trust that we already put in our patrons. We let them wander around largely unsupervised in buildings containing millions of dollars' worth of books that people have spent hundreds of hours painstakingly shelving in the proper order. We let them check out hundreds of dollars' worth of books and DVDs at a time and take those items out into the world, where we have no control whatsoever over what they do with them. And, really, how often do our users do anything truly catastrophic when given this trust? Yes, there are accidents sometimes, and coffee gets spilled on books or dogs chew them up, but I'm talking about really large-scale, intentional attempts to be destructive or malicious. Have you ever had a group of users, say, decide to re-shelve a section of books by color? (If I were a bored freshman looking for some sort of mildly amusing prank to pull, that's the kind of thing I'd consider. Think how pretty a rainbow of books would be!) How often do bored students pull the keys off the keyboards on the library computers and put them back on in alphabetical order? (Personally, I don't understand the amusement value of that one, but it was popular with kids in my computer programming classes in high school.) Often enough that you would call it a problem and consider taking some sort of step to cut down on it? If not, then why do you think your users would be any more likely to vandalize the catalog than they are to vandalize your physical facilities?
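For the curious, here is roughly what my original "the bad tags sink to the bottom" answer looks like in code. This is only a minimal sketch, not how LibraryThing or any real catalog actually does it: the function name rank_tags, the min_users cutoff, and the sample data are all made up for illustration. The idea is simply to count how many different users applied each tag and to hide tags that too few people agree on.

```python
from collections import Counter

def rank_tags(taggings, min_users=2):
    """Return (tag, user_count) pairs for one book, most widely used first.

    taggings: iterable of (user, tag) pairs.
    Tags applied by fewer than min_users distinct users are dropped.
    (Hypothetical helper; the cutoff value is an assumption.)
    """
    # Normalize case and count each user at most once per tag,
    # so one person repeating a tag cannot inflate its rank.
    distinct = {(user, tag.strip().lower()) for user, tag in taggings}
    counts = Counter(tag for _, tag in distinct)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_users]

# Invented sample data: three ordinary taggers and one prankster.
taggings = [
    ("ann", "fantasy"), ("bob", "fantasy"), ("cat", "Fantasy"),
    ("ann", "dragons"), ("bob", "dragons"),
    ("troll", "boring garbage"), ("troll", "boring garbage"),
]

print(rank_tags(taggings))
```

With this sample data the prankster's tag never shows up, because only one user applied it, no matter how many times.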

Friday, September 28, 2007

A New Use for Wikis

I haven't decided yet if this is sheer brilliance or absolute insanity. On the one hand, bringing more transparency to the process of drafting laws, and increasing the ability of non-lobbyists to have an influence on that process, are unquestionably good things. Heck, forget for a minute about the ability of common people to suggest edits to a draft law: can you just imagine all of the voters being able to look at the "edit history" of a law and see what parts were inserted by which people when?

On the other hand, a big part of what makes Wikipedia work is that it has a dedicated core of editors who are interested in truth and balance over any partisan viewpoint, and those editors out-number and out-clout the partisans. This situation is much less likely to hold in a sphere that is partisan by definition, such as drafting laws. As Michael Mussa (a former economist with the International Monetary Fund at a level where even the economists have to be politicians) once commented, "In Washington, truth is just another special interest, and one that is not particularly well financed."

New and Notable Digital Collections

Three new initiatives to digitize information and put it online have been in the news lately.

1) The Boston Library Consortium, which includes most of the major New England colleges and universities (Brandeis, Brown, MIT, Tufts, and quite a few more) is working with the Open Content Alliance to digitize public domain materials in their libraries.

2) The papers of former Supreme Court Justice Harry Blackmun are being digitized and put online. (Hat tip: Volokh Conspiracy.)

3) Robert Heinlein's papers are being digitized and put online (although, unfortunately, you have to pay to get access to them). (Hat tip: Slashdot.)

Sunday, September 23, 2007

Google Strikes Again!

I can only assume that they're planning to launch a U.S. version of this sometime before the campaigning for 2008 really gears up.

Friday, September 21, 2007

Another Player in Semantic Search

A startup called Powerset debuted a new system of semantic natural-language Web searching on September 17. It's currently in a closed alpha, so I haven't been able to poke at it, but it sounds both ambitious and wonderful. Plus, it's built by PARC, the same folks who invented computer mice and GUIs. Here is the page for PARC's natural language processing research project, which is the technology used by Powerset.

Saturday, September 15, 2007

One more reason to be skeptical about information published in journals

I really need to be preparing for my job interview on Monday (everyone wish me good luck and pray that Northwest Airlines manages not to make a hash of my flights!), but this story is so fascinating that I had to blog it right away.

Dr. John Ioannidis, an epidemiologist, has compiled evidence indicating that the results of the majority of published studies in the sciences are incorrect. He has also analyzed some of the reasons for this, stemming from the incentive structures of publishing and academia. Ioannidis doesn't seem to use the term confirmation bias (one of my hobbyhorses, er, areas of research interest), but I think he'd agree that confirmation bias probably plays a pretty big role here too.

Tuesday, September 4, 2007

More about e-book sales

Looks like I may have spoken too soon about ebooks not catching on in the U.S. According to the International Digital Publishing Forum, U.S. trade ebook sales broke $8 million in the second quarter of 2007, up from under $2 million as recently as the fourth quarter of 2002. And that "does not include library, educational or professional electronic sales." It's still peanuts compared to print publishing revenues, but it's a sharper upward trend than I thought.

This is why one should always check the numbers and not just believe whatever the conventional wisdom is on a subject.

What was that about e-books being doomed?

In Japan, e-books designed for cellphones outsold print books in the first six months of 2007. (Hat tip: LISNews.)

I wish this sort of thing would catch on in the U.S. In general I prefer reading on a screen to reading on paper: I appreciate having the ability to search the full text for words and to control the font, the type size, etc. And, since it's not particularly uncommon for me to spend 12 hours a day at the computer, I've invested in a nice setup (a big high-quality LCD screen, ergonomic keyboard and trackball, and a good desk chair), so I'm actually more comfortable sitting at the computer than I am sitting on the sofa or in bed or all of those other places that people say they prefer to read. But I very rarely read true e-books (of the type carried by NetLibrary or ebrary) because the interfaces on them are so awful. Actually, for the longest time I couldn't use ebrary books even if I wanted to, because their proprietary reader didn't work on Linux and I wasn't about to boot into the other side of my dual-boot setup to access their books. (I'm a messy-desktop person: I usually have several dozen Firefox tabs, a dozen or so Thunderbird windows, and half a dozen word processing documents open at once. Closing them all, booting up into Windows, and then going back to Linux and trying to remember what I had open and why and re-opening them all is a pain that I'm only willing to go through for a very small number of things.) I've tried NetLibrary, but I got frustrated at my session timing out and losing my place. I was trying to use a NetLibrary book to write a paper, so I wanted to be able to refer to the book, refer to other stuff, write for a while, pace around for a while, and then go back and refer to the book again. No dice: every time I went back I had timed out and I had to start from the beginning to find the book and my page again.

Also, at this point the majority of the time when I'm reading a book I'm doing so with an eye towards using an excerpt from the book in one or another anthology that I'm editing for Greenhaven, which means I need to be able to print a copy of the chapter I want to use. Except (at least the last time I tried this) NetLibrary is understandably not so keen about people printing out entire chapters because of the potential for copyright infringement.

But my point is, these aren't problems with e-books per se; they're problems with the current e-book interfaces. Unfortunately, I don't know what it's going to take to convince the e-book vendors to either improve their proprietary readers or to serve books in plain old let-me-do-what-I-want-with-it HTML. *sigh* Maybe I should just move to Japan.

Sunday, September 2, 2007

More Holiday Weekend Reading

I've had the June 2007 issue of Webology open in a Firefox tab since, oh, June or so, because it had a couple of articles on folksonomies and ontologies that I intended to read and blog about. Today I finally got around to reading them . . . and I have nothing to say about them except, read them. There's nothing too surprising in either of them, but they're still worth a look.

Saturday, September 1, 2007

Speaking of Economists and Libraries....

This article on how patients would get more accurate diagnoses if doctors used computer algorithms rather than their own “clinical judgment” has been getting some attention on econ blogs.

What does this have to do with libraries? Read the debate that's been raging on the NGC4LIB listserv about whether a well-coded algorithm using Bayesian inference could do authority control work as well as, if not better than, human catalogers, and ponder how much the catalogers who decry this possibility might have in common with the doctors discussed in the article.
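To make "Bayesian inference for authority control" a little less abstract, here is a toy sketch of the basic move: start with a prior belief that two name headings refer to the same person, then update it as each piece of evidence comes in. Everything here is an assumption made up for illustration (the function name posterior_same, the probabilities, and the simplifying "naive" assumption that the pieces of evidence are independent). It is not taken from the listserv debate or from any real cataloging system.

```python
def posterior_same(prior, evidence):
    """Probability that two headings refer to the same person.

    prior: probability they match before looking at any evidence.
    evidence: list of (p_if_same, p_if_different) pairs, one per
        observed clue, treated as independent (a naive Bayes assumption).
    """
    # Work in odds form: posterior odds = prior odds * likelihood ratios.
    odds = prior / (1 - prior)
    for p_if_same, p_if_different in evidence:
        odds *= p_if_same / p_if_different
    return odds / (1 + odds)

# Invented numbers: two headings with matching birth years and
# overlapping subject areas, starting from a 10% prior.
clues = [
    (0.9, 0.05),  # birth years agree
    (0.6, 0.2),   # works share a subject area
]
print(round(posterior_same(0.1, clues), 3))
```

A real system would have to estimate those probabilities from existing authority records, which is where most of the hard work (and most of the argument) would be.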

Link Roundup

An interview with the man behind Google Scholar. (Hat tip:

How many books should you start?, by Tyler Cowen, who is on the economics faculty at George Mason University. More LIS folks should read econ blogs like Marginal Revolution. One of the big movements in economics right now is behavioral economics—drawing on both psychology and economics to understand why people do what they do with the resources they control, including their time. I think that if more LIS folks understood the incentives that people respond to, they'd have an easier time designing services that get used and selling their services to their funding authorities.