Sunday, February 17, 2008

Reuters Gives the Semantic Web a Boost

Reuters recently announced that they're opening up their Calais service to the world.

Why should you care? Because Open Calais takes unstructured text documents, analyzes them, and automatically attaches semantically rich metadata. Fast. For free. It's an open API—any Web applications developer with half a clue can start using it right now.

I'm already plotting ways to use this for one of my freelance gigs. I get paid to read great gobs of news every day (like, the entire output of a couple of wire services type of great gobs), looking for news about business and political leaders that's noteworthy enough to justify updating their biographies in a certain biographical database. So, basically, I'm looking for stories about people getting hired/promoted/fired/elected/un-elected, retiring, or getting involved in legal cases. Which is, oh, maybe 5 or 10 percent of the stories on the wires; most of the stories on the financial side, for example, are about companies releasing their quarterly results, mergers and acquisitions, and other stuff that's about companies rather than people.

So if Open Calais is good enough to reliably separate the stories about people from the stories about companies, I should theoretically be able to set up an RSS feed that contains only stories about people, and cut the number of headlines I have to skim by 90 percent. I suspect I might even be able to use the Open Calais-generated metadata to write a script that runs the names mentioned in each story against this biographical database automatically, so I won't have to check by hand whether the people are already included. Yes, I'm definitely seeing a whole lot of ways that this makes my life easier.
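Here's a rough sketch, in Python, of how that filter-and-check script might look once the tagging has been done. It assumes each story already carries a list of (type, name) pairs such as ("Person", ...) or ("Company", ...). That structure, the `Story` class, and all the function names are hypothetical illustrations, not the actual Open Calais response format, and the set of known names is just a stand-in for the biographical database.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    headline: str
    # Hypothetical: (entity_type, name) pairs from some semantic tagger.
    # This is NOT the real Open Calais response format.
    entities: list = field(default_factory=list)


def people_in(story):
    """Return the names tagged as people in a story."""
    return [name for etype, name in story.entities if etype == "Person"]


def filter_people_stories(stories):
    """Keep only stories that mention at least one person.

    Crude stand-in for "stories about people": a company story that
    quotes a CEO would still pass this filter.
    """
    return [s for s in stories if people_in(s)]


def check_against_database(stories, known_names):
    """Split mentioned people into those already in the database and the rest."""
    found, missing = set(), set()
    for story in stories:
        for name in people_in(story):
            if name in known_names:
                found.add(name)
            else:
                missing.add(name)
    return found, missing


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    wire = [
        Story("Acme names new CEO",
              [("Company", "Acme Corp"), ("Person", "Jane Doe")]),
        Story("Acme posts fourth-quarter results",
              [("Company", "Acme Corp")]),
    ]
    biographical_db = {"John Smith"}  # stand-in for the real database

    people_stories = filter_people_stories(wire)
    found, missing = check_against_database(people_stories, biographical_db)
    for s in people_stories:
        print(s.headline)
    print("Already in database:", sorted(found))
    print("Need to check:", sorted(missing))
```

The exact-string name match is the weakest link here: wire stories and databases rarely spell names identically ("Bill" vs. "William", middle initials), so a real version would need some name normalization before comparing.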

And that's not even getting into the potential library applications of this. Automated or semi-automated subject analysis of electronic documents, without having to shell out thousands of dollars for the commercial indexing software that's generally used for that task in special libraries, anyone?
