
The Digital Public Library of America: What Does a New Platform Mean for Academic Research?

Robert Darnton asked in the New York Review of Books blog nearly two years ago: “Can we create a National Digital Library?” [1] Anyone who recalls reference homework exercises checking bibliographic information for United States imprints versus British or French ones will remember that the United States does not have a national library in the sense of a library that collects all the works of that country and creates a national bibliography. [2] Certain libraries, such as the Library of Congress, have certain prerogatives for collection and for the dissemination of standards [3], but no single library creates a national bibliography. That was true for print, and it remains even more true for digital materials. So when Darnton asks that question, as he goes on to illuminate in his article, he is asking a much larger question about libraries in the United States. European and Asian countries have created national digital libraries as part of, or in addition to, their national print libraries. The question is: if others can do it, why can’t we? Furthermore, why can’t we join those libraries with our national digital library? The DPLA has announced a collaboration with Europeana, which has already had notable success in digitizing content and making it and its metadata freely available. This suggests that we could eventually create a useful worldwide digital library, or at least a North American/European one. The dream of Paul Otlet’s universal bibliography seems once again to be just out of reach.

In this post, I want to examine what the Digital Public Library of America claims to do and what approaches it is taking. The project is still too new, and too many questions remain unanswered, to give any sort of final answer to whether this will actually be the national digital library. Nonetheless, there seems to be enough traction and, perhaps more importantly, enough funding that we should pay close attention to what is delivered in April 2013.

Can we reach a common vision about the nature of the DPLA?

The planning for the DPLA started in the fall of 2010, when Harvard’s Berkman Center received a grant from the Sloan Foundation to begin planning the project in earnest. The initial idea was to digitize all the materials that could legally be digitized and to create a platform accessible to everyone in the United States. Google had already proved that this was possible, so it seemed conceivable that many libraries working together could repeat Google’s successes, but with solely non-commercial motives [4].

The initial stages of planning brought out many different ideas and perspectives about the philosophical and practical components of the DPLA, many of which remain unresolved. The themes of debate that have emerged are whether the DPLA would be a true “public” library, and what in fact ought to be in such a library. David Rothman argues that the DPLA as described by Darnton would be a wonderful tool for making humanities research easy and viable for more people, but would not solve the problems of making popular e-books accessible through libraries or getting students up-to-date textbooks. The latter two aims are much more challenging than providing access to public domain or academic materials because a lot more money is at stake [5].

One of the projects for the Audience and Content workstream is to figure out how average Americans might actually use a digital public library of America. One of the potential use cases is a student who can use the DPLA alone to write a whole paper on the Iroquois Nations. Teachers and librarians posted questions about this in the comments, including whether it is appropriate to tell students to use one portal for all research. We generally counsel students to check multiple sources, and getting students used to searching one place that happens to suit one topic may not work if the DPLA has nothing available on, say, the latest computer technology.

Digital content and the DPLA

What content the DPLA will provide will surely become clearer over the coming months. It has appointed Emily Gore as Director of Content and continues to hold working group meetings on content and audience. The DPLA website promises a remarkable vision for content:

The DPLA will incorporate all media types and formats including the written record—books, pamphlets, periodicals, manuscripts, and digital texts—and expanding into visual and audiovisual materials in concert with existing repositories. In order to lay a solid foundation for its collections, the DPLA will begin with works in the public domain that have already been digitized and are accessible through other initiatives. Further material will be added incrementally to this basic foundation, starting with orphan works and materials that are in copyright but out-of-print. The DPLA will also explore models for digital lending of in-copyright materials. The content that is contributed to or funded by the DPLA will be made available, including through bulk download, with no new restrictions, via a service available to libraries, museums, and archives in the United States, with use and reuse governed only by public law. [6]

All of these models already exist in one way or another, however, so how is this something new?

The major purveyors of out-of-copyright digital book content are Google Books and HathiTrust. The potential problems with Google Books are obvious from the name alone: Google is a publicly traded company with aspirations to be the hub of all the world’s information. Privacy and availability, not to mention legality, are a few of the concerns. HathiTrust is a collective of research universities digitizing their collections, many in concert with Google Books, but the full text of these books in a convenient format is generally available only to members of HathiTrust. HathiTrust faced a lawsuit from the Authors Guild over its digitization of orphan works, an issue the DPLA is also planning to address.

Other projects are trying to make in-copyright digital books more accessible, of which Unglue.it is probably the best known. That model requires a critical mass of people willing to pay to release a book into the public domain, and so may not serve the scholar with a unique research project. Some future plans for the DPLA include obtaining funds to pay authors for use, but this may or may not include releasing books into the public domain.

The DPLA is not meant to include books alone; planning so far simply suggests that books make a logical jumping-off point. The “Concept Note” points out that “if it takes the sky as its limit, it will never get off the ground.” Despite this caution, the DPLA would ideally become a portal to all types of materials already made available by cultural institutions, including datasets and government information.

Do we need another platform?

The first element of the DPLA is code: it will use open source technologies in developing a platform, and will release all code (and the tools and services this code builds) as open source software. The so-called “Beta Sprint” that took place last year invited people to “grapple, technically and creatively, with what has already been accomplished and what still need to be developed…” [7]. The winning “betas” deal largely with issues of interoperability and linked data. Certainly if a platform could be developed that solved these problems, it would be a huge boon to the library world.

Getting involved with the DPLA and looking to the future

While the governance structure is becoming more formal, there are plenty of opportunities to become involved with the DPLA. Six working groups (called workstreams) were formed to discuss content, audience, legal issues, business models, governance, and technical issues. Becoming involved is as easy as signing up for an account on the wiki and noting your name and comments on the page of the working group in which you are interested. You can also sign up for mailing lists to stay involved in the project. Like many such projects, the work is done by the people who show up and speak up. If you read this and have an opinion on the direction the DPLA should take, it is not difficult to make sure your opinion gets heard by the right people.

As in most writing about the DPLA since planning began, a thought experiment seems the next logical rhetorical step. Let’s say the DPLA succeeds to the point where all public domain books in the United States are digitized and available in multiple formats to any person in the country, and a significant number of in-copyright works are also available. What does this mean for libraries as a whole? Does it make public libraries research libraries? How does it change the nature of research libraries? And lastly, will all this information create a new desire for knowledge among the American people?

References
  1. Darnton, Robert. “A Library Without Walls.” NYRblog, October 4, 2010. http://www.nybooks.com/blogs/nyrblog/2010/oct/04/library-without-walls/.
  2. McGowan, Ian. “National Libraries.” In Encyclopedia of Library and Information Sciences, Third Edition, 3850–3863.
  3. “Frequently Asked Questions – About the Library.” Library of Congress, n.d. http://www.loc.gov/about/faqs.html#every_book.
  4. Dillon, Cy. “Planning the Digital Public Library of America.” College & Undergraduate Libraries 19, no. 1 (March 2012): 101–107.
  5. Rothman, David H. “It’s Time for a National Digital-Library System.” The Chronicle of Higher Education, February 24, 2011, sec. The Chronicle Review. http://chronicle.com/article/Its-Time-for-a-National/126489/.
  6. “Elements of the DPLA.” Digital Public Library of America, n.d. http://dp.la/about/elements-of-the-dpla/.
  7. “Digital Public Library of America Steering Committee Announces ‘Beta Sprint.’” May 20, 2011. http://cyber.law.harvard.edu/newsroom/Digital_Public_Library_America_Beta_Sprint.

Real World Semantic Web?: Facebook’s Open Graph Protocol (ACRL Tech Connect Post)

Originally posted at ACRL Tech Connect on May 10, 2012.

[Image: Open Graph protocol diagram. Original image available at https://developers.facebook.com/docs/opengraph/]

Librarians need to understand what the semantic web is and how to use it, but this can be challenging. The promise of the semantic web has existed for over a decade, yet to the uninitiated there may not seem to be many implementations accessible to the average person.

One implementation that most people use daily is Facebook’s Open Graph Protocol, the company’s version of the semantic web. It is a useful example for illustrating the ideas behind the semantic web and linked data. Because libraries and other cultural institutions want and need to make their data open, while Facebook’s openness is highly questionable, it also illustrates some of the potential problems with linked data that is not open. Much great work is being done in the library world with the semantic web and linked data, which will be addressed in more detail in future posts.

The Semantic Web and Linked Data

The “semantic web” describes a web where data is understood by computers in some of the same ways humans understand it. Tim Berners-Lee illustrates this wonderfully in his 2001 Scientific American article, imagining a future in which the diagnosis of a family member with cancer is made easier by a smart device that can find the most appropriate specialist in a convenient location at a convenient time, with very little work on the part of the searcher. This is only possible, however, when data is semantically meaningful. Open hours for a doctor (or a library) written on a website mean something to a human but very little to a computer. Once those hours are structured in a way that can be made meaningful, the computer can tell you whether the doctor’s office is open, and, if it has access to your calendar, what you would have to cancel to go there.

Linking data takes this a step further and makes it possible to connect data, to avoid, as the W3C puts it, “a sheer collection of datasets.” Berners-Lee outlines the steps for making linked data in a 2006 post: use uniform resource identifiers (URIs) as names, use HTTP URIs so those names can be looked up, present useful information in a standard format such as RDF, and link to additional URIs with related information. A 2010 follow-up points out that to be linked open data, the data must be presented with a license that allows free, unimpeded use, such as the Creative Commons CC-BY license. Such data doesn’t have to be structured in any particular way as long as it’s open. He says that “…you get one (big!) star if the information has been made public at all, even if it is a photo of a scan of a fax of a table — if it has an open licence.” But “five-star” linked open data meets all of the above requirements as well.
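
To make those rules concrete, here is a minimal sketch in Python using the rdflib library, expressing a library’s open hours as RDF triples. The URIs and the schema.org vocabulary are illustrative assumptions for this example, not any actual library’s published data.

```python
# A minimal sketch of "five-star" linked data using Python's rdflib.
# The URIs and property choices below are hypothetical examples.
from rdflib import Graph, Literal, Namespace, URIRef

SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

# 1. Use a URI as a name for the thing being described.
library = URIRef("http://example.org/libraries/rebecca-crown")

# 2-3. Publish useful, machine-readable statements about that URI
#      in a standard format (RDF).
g.add((library, SCHEMA.name, Literal("Rebecca Crown Library")))
g.add((library, SCHEMA.openingHours, Literal("Mo-Fr 08:00-22:00")))

# 4. Link to other URIs so agents can discover related data.
g.add((library, SCHEMA.parentOrganization,
       URIRef("http://example.org/universities/dominican")))

# Serialize as Turtle; published under an open license, this would
# qualify as linked *open* data.
print(g.serialize(format="turtle"))
```

A calendar application reading these triples could answer “is the library open now?” without any human interpretation, which is exactly the gap between hours written for people and hours structured for machines.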

Facebook’s Open Graph Protocol

Moving into a different world, let’s consider what the semantic web and linked data look like at Facebook. First, it is interesting to consider what Facebook was before it was semantic. When Facebook started in 2004, you could make a list of things you “liked.” You might have said you “liked” the movie Clueless and “liked” running, but these were just lists that let others in your college classes know a few facts about you the next time you saw them in class or at a party. In theory you could use these lists to find others who shared your interests, but that required a person to understand which interests matched.

But starting in 2010 these “likes” took on real semantic meaning. Suddenly “liking” the movie Clueless meant, among other things, that the owners of the “Clueless” identity on Facebook could send you marketing announcements directly. In addition, you could “like” content entirely outside of Facebook, as long as the website used the correct markup to speak to Facebook, thus linking content with people. Unlike Facebook’s earlier Beacon scheme, it was easier to understand how you were exposing yourself to advertisers and to control privacy and sharing, though this still left people troubled.

In late 2011 and early 2012 Facebook opened this system up even more to third-party developers, alongside the new Facebook Timeline. Now any person could perform any verb with any application. So “Margaret read a book on Goodreads” or “Margaret listened to a song on Spotify” (real-world actions) turn into semantically meaningful statements on my Facebook Timeline. As long as the user authorizes the application, it can access the necessary information to pull details about the object from the webpage and show the user’s interaction with it.

Developing for the Open Graph

The Open Graph protocol grew out of the idea of the “social graph,” which represents the connections between people and the types of relationships they have with each other. In the Facebook universe, this includes the relationships people have with other types of entities, such as media, products, and companies. Facebook developed the protocol as a quick and easy way for websites to include semantically meaningful data. It is based on RDFa, a standard for embedding RDF linked data in web pages, and includes basic and optional metadata as well as different types of structured data about objects, of which music and videos are the most well defined.
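
As a rough illustration, the sketch below shows the kind of og: markup a movie page might carry and how a consumer could read it with Python’s standard library. The HTML snippet is invented for this example, though the four properties shown (og:title, og:type, og:url, og:image) are the basic ones the protocol defines.

```python
# A sketch of reading Open Graph markup from a page's <head>.
# The HTML snippet is an invented example of the og: properties
# a movie page might publish.
from html.parser import HTMLParser

PAGE = """
<head>
  <meta property="og:title" content="Clueless" />
  <meta property="og:type" content="video.movie" />
  <meta property="og:url" content="http://example.com/clueless" />
  <meta property="og:image" content="http://example.com/clueless.jpg" />
</head>
"""

class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:"):
            self.properties[prop] = attrs.get("content")

parser = OpenGraphParser()
parser.feed(PAGE)
print(parser.properties)
# {'og:title': 'Clueless', 'og:type': 'video.movie', ...}
```

Once a page carries this markup, a “like” of that page becomes a typed statement about a movie rather than a string of text, which is what lets Facebook link the object into its graph.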

To see the Open Graph in action, simply replace “www” with “graph” at the beginning of any Facebook page URL. For instance, take a look at my own library’s information at http://graph.facebook.com/rebeccacrownlibrary. You can see that this page describes a library, and get our phone number, physical location, and open hours. Most important, a computer viewing this page can understand that information. For complete details, see the Graph API documentation; even for non-developers it is interesting reading. For instance, you can find out how to get the URL of your current profile picture to embed in other sites. To access this information programmatically, you can use various methods, including the Facebook Query Language.
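
Here is a minimal sketch of what that looks like programmatically, assuming the circa-2012 behavior in which public page fields could be fetched without an access token. Today’s Graph API requires authentication and versioned endpoints, and the field names below follow the description above rather than current documentation.

```python
# A sketch of querying the (circa-2012) Graph API for a public page.
# Public fields could then be fetched without an access token;
# the modern API requires authentication, so this exact request
# no longer works as written.
import json
from urllib.request import urlopen

url = "http://graph.facebook.com/rebeccacrownlibrary"
with urlopen(url) as response:
    page = json.load(response)

# The JSON is semantically labeled data a program can act on,
# not just text for a human to read.
print(page.get("name"))
print(page.get("phone"))
print(page.get("hours"))
```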

Of course, you only get access to information that the page has explicitly made public. For anything beyond that, applications must use authentication. Linking information from outside of Facebook works one way only: you can’t pull very much at all out of Facebook into the open web. Note, for instance, that Google searches will pull up only basic information from a Facebook page rather than any content that page has posted.

Outside of Facebook–How “Open” is the Open Graph?

It is precisely this closed effect that has many people worried about Facebook’s implementation of the semantic web. In 2007 Brad Fitzpatrick described the problems inherent in existing implementations of the “social graph” on the web: standards were quirky, non-interoperable, and usually completely walled off. The solution would be a Social Graph API that would create a social graph outside of any one company and belonging to all. This would allow people to find friends and connections without signing up for additional services or relying on Facebook or any other company. Fitzpatrick did later create a Social Graph API, which Google recently pulled out of its products. Some of the problems of an open social graph are familiar to librarians: people are hesitant to share too much information with just anyone about whom they associate with, what they like, and what they think (Prodromou). The great boon for advertisers in social networking services is that inside walled gardens with reasonable privacy controls, people are willing to share much more information. The walled garden of Facebook, inaccessible to Google, thus keeps that valuable social data out of reach. It is perhaps not coincidental that around the same time Google stopped supporting the open Social Graph API, it released an API for its own social networking service, Google Plus.

Concerns remain that the Open Graph is not actually open, and in particular that it uses the open standard of RDF to ingest content but not to share it (Turenhout). The Open Graph Protocol website states that a variety of big websites publish pages with Open Graph markup and that the markup is consumed by Facebook (of course), Google, and mixi. It remains unclear how widely this particular standard will be adopted outside of Facebook.

Conclusion

Whether or not you think you have any idea what linked data is, any time you click a “like” button on a website or sign up for a social sharing app in Facebook, you are participating in the semantic web. But every time that data link goes behind a Facebook wall, it fails to be open linked data. Just as librarians have always worked to keep the world’s knowledge available to all, we must continue to ensure that potentially important linked data is kept open as well, with no commercial motive. The LODLAM Summit has outlined, and continues to work on, what linked open data looks like for libraries, archives, and museums. The W3C Library Linked Data Incubator Group released its final report in fall 2011, which provides a thorough overview of the roles and responsibilities of libraries in the world of linked open data. There is a lot of possibility in this area, and the future openness of the world wide web may very well depend on actions taken right now.

In a future post, we will examine some specific examples of work being done in the library world around the semantic web and linked data.

Works Cited

Axon, Samuel. “Facebook’s Open Graph Personalizes the Web.” Mashable, April 21, 2010. http://mashable.com/2010/04/21/facebook-open-graph/.
Berners-Lee, Tim, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American 284, no. 5 (May 2001): 34. doi: 10.1038/scientificamerican0501-34
Berners-Lee, Tim. “Linked Data.” Design Issues, July 27, 2006. http://www.w3.org/DesignIssues/LinkedData.html.
Fitzpatrick, Brad, and David Recordon. “Thoughts on the Social Graph.” Bradfitz.com, August 17, 2007. http://bradfitz.com/social-graph-problem/.
Geron, Tomio. “Facebook Expands Open Graph To 60 New Apps, Many More Coming.” Forbes.com (January 18, 2011): 20.
Giles, Jim. “If Facebook Likes the Semantic Web, You’ll Love It.” New Scientist, July 31, 2010.
Iskold, Alex. “Social Graph: Concepts and Issues.” Read Write Web, September 11, 2007. http://www.readwriteweb.com/archives/social_graph_concepts_and_issues.php.
Mitchell, Jon. “Google Plus Releases APIs for Search, +1s and Comments.” Read Write Web, October 4, 2011. http://www.readwriteweb.com/archives/google_plus_releases_apis_for_search_1s_and_commen.php.
Prodromou, Evan. “On the Social Graph API.” Evan Prodromou: His Life and Times, February 21, 2012. http://evanprodromou.name/2012/02/21/on-the-social-graph-api/.
Turenhout, Ryanne. “Harry Halpin on the Hidden History of the ‘Like’ Button.” Institute of Network Cultures, March 10, 2012. http://networkcultures.org/wpmu/unlikeus/2012/03/10/harry-halpin-on-the-hidden-history-of-the-like-button/.

Personal Data Monitoring: Gamifying Yourself (New ACRL Tech Connect post)

This post originally appeared on ACRL Tech Connect on April 9, 2012.

The academic world has been talking about gamification of learning for some time now; the 2012 Horizon Report says it will become mainstream in two to three years. Gamification taps into the innate human love of narrative and of displaying accomplishments. Anyone working through Code Year is personally familiar with the lure of the green bar that tells you how close you are to your next badge. In this post I want to address a related but slightly different topic: personal data capture and analytics.

Where does the library fit into this? One of the roles of the academic library is to help educate and facilitate the work of researchers. Effective research requires collecting a wide variety of relevant sources, reading them, and saving the relevant information for the future. The 2010 book Too Much to Know by Ann Blair describes the note-taking and indexing habits taught to scholars in early modern Europe. Keeping a list of topics and sources was a major focus of scholarship, and the resulting notes and indexes were published in their own right. Nowadays maintaining a list of sources is easier than ever with the many tools for collecting and storing references, but challenges remain due to the abundance of sources and the pressure to publish, among other factors.

New Approaches and Tools in Personal Data Monitoring

Tracking one’s daily habits, reading lists, and other personal information is a very old human practice. Understanding what you are currently doing is the first step in creating better habits, and technology makes it easier to collect this data. Stephen Wolfram has been using technology to collect data about himself for nearly 25 years, and he posted some visual examples a few weeks ago, including items such as how many emails he has sent and received, keystrokes made, and file types created. The Feltron Report, produced by Nicholas Felton, is a gorgeously designed book of personal data about himself and his family. But you don’t have to be a data or design whiz to collect and display personal information. To display your data in a visually compelling way, for instance, you can use a service such as Daytum to create a personal data dashboard.

[Image: Hours of activity recorded by Fitbit]

In the realm of fitness and health, many products will help capture, store, and analyze personal data. Devices like the Fitbit clip or strap to your body and count steps taken, floors climbed, and hours slept. Pedometers and GPS-enabled sport watches help those trying to get in shape, but the new fields of personal genetic monitoring and behavior analytics promise to make it possible to know very specific information about your health and to understand potential future choices. 23andMe will map your personal genome and provide a portal for analyzing and understanding your genetic profile, allowing an unprecedented ability to understand your health (though there is doubt about whether this can accurately predict disease). For the behavioral and lifestyle aspects of health, a new service called Ginger.io will help collect daily data for health professionals.

[Image: Number of readers recorded by Mendeley]

Visual cues such as graphs of accomplishments and green progress bars can help in keeping up with research and monitoring one’s personal research habits just as much as they help in learning to code or training for a marathon. One such feature is the personal reading challenge on Goodreads, which lets you set a goal of how many books to read in the year, tracks what you’ve read, and tells you how far behind or ahead you are at your current reading pace. Each book in progress has a progress bar indicating how far along in the book you are, a simple but effective visual cue. Another popular tool, Mendeley, provides a convenient way to store PDFs and track references of all kinds. Built into this is a small green icon indicating that a reference is unread; you can sort references by read/unread, and marking a reference as “read” updates the article’s status in the Mendeley research database. Academia.edu provides another way for scholars to share research papers and see how many readers they have.
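
The arithmetic behind such a pace indicator is simple. Here is a sketch of how a Goodreads-style challenge might compute ahead/behind status; the formula is an assumption for illustration, not Goodreads’ actual code.

```python
# A sketch of the pace arithmetic behind a reading-challenge
# progress bar; the formula is an assumption, not Goodreads' code.
from datetime import date

def reading_pace(goal: int, books_read: int, today: date) -> str:
    """Report how far ahead of or behind schedule a reader is."""
    start = date(today.year, 1, 1)
    end = date(today.year, 12, 31)
    # Fraction of the year elapsed so far, counting today.
    year_fraction = ((today - start).days + 1) / ((end - start).days + 1)
    expected = goal * year_fraction
    diff = books_read - expected
    if diff >= 0:
        return f"{diff:.1f} books ahead of schedule"
    return f"{-diff:.1f} books behind schedule"

print(reading_pace(goal=50, books_read=20, today=date(2012, 4, 9)))
# -> "6.3 books ahead of schedule"
```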

Libraries and Personal Data

How can libraries facilitate this type of personal data monitoring, making it easy for researchers to keep track of what they have done and helping them set goals for the future? Last November the Academic Book Writing Month (#acbowrimo) Twitter hashtag community spun off of National Novel Writing Month and challenged participants to complete the first draft of an academic book or other lengthy work. Participants tracked daily word counts and research goals and encouraged each other to complete the work. Librarians could work with researchers at their institutions, both faculty and students, on this type of peer encouragement. We already do this type of outreach, but tools like Twitter make it easier to reach a community that might not come to the library often.

The recent furor over the change in Google’s privacy settings prompted many people to delete their Google search histories. Considered another way, though, a search history is a treasure trove of past interests: for a researcher trying to remember a book he or she was searching for some years ago, it may hold information not available anywhere else. Librarians have professional ethics that make collecting and analyzing that type of personal data extremely complex. While we collect all types of data and avidly analyze it, we are careful not to keep track of what individuals read, borrowed, or asked of a librarian. This keeps individual researchers’ privacy safe; the major disadvantage is that it puts the onus on individuals to collect their own data. For people who might read hundreds or thousands of books and articles, it can be a challenge to track all those individual items. Library catalogs are not great at facilitating this type of recordkeeping. Some next-generation catalogs provide better listing and sharing features, but the user has to know how to add each item. Even if we can’t provide users a historical list of all the items they have ever borrowed, we can help educate them on how to create such lists. In fact, unless we help researchers create lists like this, we lose an important piece of the historical record, of the kind preserved in the library borrowing histories of Dissenting Academies Online.

Conclusion

What types of data can we ethically and legally share to help our researchers track personal data? We could share statistics on the average numbers of books checked out by students and faculty, articles downloaded, articles ordered, and other numbers that help people understand where they fall along a continuum of research. All libraries already collect this information; it’s just a matter of sharing it in a way that makes it easy to use. People want to collect and analyze data about what they do in order to reach their goals. Now that this is so easy, we must consider how we can help them.
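
As a sketch of that distinction between aggregate and individual data, the example below reports only group-level averages computed from anonymized counts, never individual borrowing records; the numbers and groupings are invented.

```python
# A sketch of sharing aggregate circulation statistics without
# exposing individual borrowing histories; the data is invented.
from statistics import mean

# Anonymized per-patron checkout counts, grouped by patron status.
checkouts = {
    "undergraduate": [12, 3, 25, 7, 0, 14],
    "graduate": [48, 31, 22, 60],
    "faculty": [75, 20, 41],
}

# Publish only group-level averages, never individual records.
for status, counts in checkouts.items():
    print(f"{status}: average {mean(counts):.1f} books per year")
```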


Works Cited
Blair, Ann. Too Much to Know : Managing Scholarly Information Before the Modern Age. New Haven: Yale University Press, 2010.