Initial thoughts on Libraries and the Semantic Web

Ok, these aren’t quite my initial thoughts. I’ve been thinking about this since January 2008, when I was on the BOBCATSSS panel on Web 2.0/3.0 and Libraries. These are my thoughts after seeing Wolfram Alpha and reading about Google Squared.

First, here’s a great explanation of the semantic web. Basically, it means that a search will present data rather than documents. Here’s an example. Right now, if you want to know something about Mary Cassatt and do a Google search, the first result will be her Wikipedia page. You, a person, have to read the Wikipedia entry and extract the data. If you do the same search in Wolfram Alpha, you get this page of data (and, of course, a link to Wikipedia). There, the data is already extracted and presented in a tabular format.

(Incidentally, if you do that search in Credo Reference, you get a link to the biography of Mary Cassatt in France and the Americas: Culture, Politics, and History.)

The Wolfram Alpha results might not look like much, but that’s because biographical or artistic information requires some intellectual work beyond merely presenting data. Occupation, birth date, and death date are basic data that can be computed by, well, a computer. Try a math, science, or statistical question on it, and you’ll see where it really shines.
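
To make the documents-versus-data difference a bit more concrete, here is a minimal Python sketch. It is only an illustration: the triple layout, the predicate names, and the `lookup` and `years_lived` helpers are all hypothetical, and nothing here reflects how Wolfram Alpha or Google actually store or compute anything.

```python
# A "document" result: a person has to read it to pull out the facts.
document = (
    "Mary Cassatt (1844-1926) was an American painter and printmaker "
    "who spent much of her adult life in France."
)

# A "data" result: the same facts as subject-predicate-object triples.
# The predicate names are made up for illustration, not a real vocabulary.
triples = [
    ("Mary Cassatt", "occupation", "painter"),
    ("Mary Cassatt", "occupation", "printmaker"),
    ("Mary Cassatt", "birth_year", 1844),
    ("Mary Cassatt", "death_year", 1926),
]


def lookup(subject, predicate):
    """Return every value stored for a (subject, predicate) pair."""
    return [obj for subj, pred, obj in triples
            if subj == subject and pred == predicate]


def years_lived(subject):
    """Difference between death year and birth year (year-level only)."""
    birth = lookup(subject, "birth_year")[0]
    death = lookup(subject, "death_year")[0]
    return death - birth


if __name__ == "__main__":
    print(lookup("Mary Cassatt", "occupation"))
    print(years_lived("Mary Cassatt"))
```

Once the facts are stored this way, a question like "what was her occupation?" becomes a lookup rather than a reading exercise, and simple arithmetic on dates comes for free. What the sketch can't do is say anything about why her work matters, which is the intellectual part.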

For right now, then, the semantic web is doing what computers do better than people: extracting and tabulating data. In a search engine that’s a big deal already. But there’s still a long way to go before the Intelligent Web, which many believe is the next step. In the Intelligent Web, the search engine will not only be able to extract data, but also to apply critical analysis and subjectivity to the data. That is a long, long way off, but the semantic web can already help people with processing massive amounts of data. For instance, there is more scientific data published than any one person can read and understand. Even within specialized fields, it’s hard to stay on top of new developments. So pulling out the relevant data and presenting it can make it easier to spot patterns between studies. I’m sure this would have the same problems as current published meta-analyses, which can be unreliable when they don’t reflect differences in research quality and methods.

Librarians already pull data out of documents and present it or interpret it for people. They also have the advantage, in many cases, of critical thinking and intelligence. So they do need to be concerned about semantic web technologies taking away the need for their expertise, or rather, they need to use their expertise to inform the creation of semantic web technologies. Friends, this means you must finally learn XML (just as a start). Librarians also need to be concerned not only about the democracy of data but also about its integrity. That means a website that is the most popular only thanks to hot-shot SEO work shouldn’t be the provider of data if another, less popular site has better data. Evaluating sources is something librarians already do, but when it’s not immediately apparent what document the data is being pulled from, it’s more important that the back end be honest. Plus it will take a long time before all the books of the world are scanned and searchable as part of the semantic web, so we need to stay on top of that.

What else? Leave your thoughts below.

P.S. I’m currently not able to leave comments. Please email me if you are having a similar problem and I will try to figure out what’s going on.