
Report from the Digital Public Library of America Midwest

Two years after the initial meeting for the Digital Public Library of America, another major planning and update meeting took place in Chicago at DPLA Midwest. At this meeting the steering committee handed the project over to the inaugural board and everyone who has been working on the project talked about what had happened over the past few years and the ambitious timetable to launch in April 2013.

In August I wrote about the DPLA and had many unanswered questions. Luckily I had the opportunity to attend the meeting and participate heavily in the backchannel (both virtual and physical). This post is a report of what happened at the general meeting (I was not able to attend the workstream meetings the day before). This is a followup to my last post about the Digital Public Library of America. Then I felt like an observer, but the great thing about this project is how easy it is to become a participant.

Looking Back and Ahead

The day started with a welcome from John Palfrey, who reported that through the livestream and mailing lists there were over a thousand active participants in the process. Two years ago the project seemed to him (and still does) “completely ambitious and almost crazy,” but it actually is working out. He emphasized that everything is still “wet clay” and a participatory process, but everything is headed toward a public launch in April 2013, with an initial version of the service and a fair amount of content available. We will come back a bit later to exactly what that content is and from what sources it will come.

In this welcome, Palfrey introduced several themes that the day revolved around–that the project is still moldable despite the structure that seems to be there (the “wet clay”), and that it is still completely participatory even though the project will recruit an Executive Director and has a new board. One of the roles of the board will be to ensure that participation remains broad. The credentials of the board and the steering committee are impressive; but they cannot get the project going without a lot of additional support, both financial and otherwise.

The rest of the day was organized to talk about supporting the DPLA, reporting on several of the “hubs” that will make up the first part of the content available, the inaugural board, and the technical and platform components of the DPLA. The complete day, including tweets and photos, was captured in a live blog. While much of interest took place that day, I want to focus on the content and the technical implementation as described during the day.

Content: What will be in the DPLA?

Emily Gore started in September of this year as the Director of Content, and has been working since then to get the plans in motion for the initial content in the DPLA. She has been working with seven existing state or regional digital libraries as so-called “Service Hubs” and “Content Hubs” to take the steps to begin aggregating metadata that will be harvested for the DPLA and get people to the content. The April 2013 launch will feature exhibits showcasing some of this content; topics include civil rights, prohibition, Native Americans, and a joint presentation with Europeana about immigration.

The idea of these “hubs” is that there are already many large digital libraries with material, staff, and expertise available; as Gore put it, we all have our metadata geeks already who love massaging metadata to make it work together. Dan Cohen (director of the Roy Rosenzweig Center for History and New Media at George Mason University) gave the analogy in his blog of the local institutions having ponds of content, which then are fed into the lake of the service hubs, and then finally into the ocean of the DPLA. The service hubs will offer a full menu of standardized digital services to local institutions, including digitization, metadata consultation, data aggregation, storage services, community outreach, and exhibit building. These collaborations are crucial for several reasons. First, they mean that great content that is already available will finally be widely accessible to the country at large: it’s on the web, but often not findable or portable. Second, regional content hubs will be able to work with their regions more effectively than any central DPLA staff, which simply will not have the capacity to deal with one-to-one relationships with all the potential institutions who have content. The pilot service hubs are Mountain West, Massachusetts, Digital Library of Georgia, Kentucky, Minnesota, Oregon, and South Carolina. The digital hubs project has a two-year timeline and $1 million in funding, but for next April they will prepare metadata and content previews for harvest, harvest existing metadata to make it available for launch, and develop exhibitions. After that, the project will move on to new digitization and metadata, aggregation, new services, new partners, and targeted community engagement.

Representatives from two of the service hubs spoke about the projects and collections, which was the best view into what types of content we can expect to see next April. Mary Molinaro from Kentucky gave a presentation called “Kentucky Digital Library: More than just tobacco, bourbon, and horse racing.” She described their earliest digitization efforts as “very boutique–every pixel was perfect”, but it wasn’t cost effective or scalable. They then moved on to a system of mass digitization through automating everything they could and tweaking workflows for volume. Their developers met with developers from Florida and ended up using DAITSS and Blacklight to manage the repository. They are now at the point where they were able to scan 300,000 pages in the last year, and are reaching out to other libraries and archives around the state to offer them “the on-ramp to the DPLA”. She also highlighted what they are doing with oral history search and transcription with the Oral History Metadata Synchronizer and showed some historical newspapers.

Jim Butler from the Minnesota Digital Library spoke about the content in that collection from an educational and outreach point of view. They do a lot of outreach to local historical societies, libraries, and other cultural organizations to find out what collections they have and digitize them, which is the model that all the service hubs will follow. One of the important projects that he highlighted was an effort to create curricular guides to facilitate educator use of the material. The example he showed was A Foot in Two Worlds: Indian Boarding Schools in Minnesota, which has modules to be used in K-12 education. He showed many other examples of material that would be available through the DPLA, including Native American history and cultural materials and images of small town life in 19th and 20th century Minnesota. Their next steps are to work on state- and region-wide digital library metadata aggregation, major new digitization efforts, and community-sourced digital documentation, particularly the self-documentation of Somali and Hmong communities.

Followup comments during the question portion of these presentations emphasized that the goal of having big pockets of content is to work with those smaller pockets of content. This is a pilot business model test case to see how aggregating all these types of content together actually works. It is important to remember that for now, the DPLA is not ingesting any content, only metadata. All the content will remain in the repositories at each content hub.

An additional component is that all the metadata in the DPLA will be licensed with a CC0 (public domain) license only. This will set the tone that the DPLA is for sharing and reusing metadata and content. It is owned by everyone. This generated some discussion over lunch and via Twitter about what that actually would mean for libraries and if it would cause tension to release material under a public domain license that for-profit entities could repackage and sell back to libraries and schools. Most people that I spoke to felt this was a risk worth taking. Of course, future content in the DPLA will be there under whatever copyright or license terms the rightsholder allows. Presumably most if not all of it will be material in the public domain, but it was suggested, for instance, that authors could bequeath their copyrights to the DPLA or set up a public domain license through a third-party service. Either way, libraries and educators should share all the materials they create around DPLA content, and doing so will mean less duplicated effort.

Technology: How will the DPLA work?

Jeff Licht, a member of the technical development advisory board, spoke about the technical side of the DPLA. The architecture for the system (PDF overview) will have at its core a metadata repository aggregated from the various sources described above. An ingester will bring in the metadata in usable form from the service hubs, which will have already cleaned up the data, and then an API will expose the content and allow access to front ends or apps. There will also be functions to export the metadata for analysis that cannot easily be done through the API. The metadata schema (PDF) types that they collect will be item, collection, contributor, and event.
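
To make that flow concrete, here is a minimal Python sketch of how an ingester, a metadata repository, an API-style search, and a bulk export might fit together. It is only an illustration under my own assumptions: the four record types come from the talk, but every class, field, and method name below (Record, MetadataRepository, ingest, search, export) is hypothetical and is not the DPLA's actual schema or API.

```python
from dataclasses import asdict, dataclass, field

# The four record types named in the talk; everything else here is hypothetical.
RECORD_TYPES = {"item", "collection", "contributor", "event"}


@dataclass(frozen=True)
class Record:
    record_id: str
    record_type: str
    hub: str
    title: str
    source_url: str  # the content stays at the hub; only a pointer is stored


@dataclass
class MetadataRepository:
    records: dict = field(default_factory=dict)

    def ingest(self, hub, raw_records):
        """Load already-cleaned hub metadata and return how many were accepted."""
        accepted = 0
        for raw in raw_records:
            if raw.get("type") not in RECORD_TYPES:
                continue  # skip anything outside the four known types
            record = Record(
                record_id=f"{hub}:{raw['id']}",
                record_type=raw["type"],
                hub=hub,
                title=raw["title"].strip(),
                source_url=raw["url"],
            )
            self.records[record.record_id] = record
            accepted += 1
        return accepted

    def search(self, term, record_type=None):
        """Roughly what an API endpoint might return: title matches, sorted by id."""
        term = term.lower()
        hits = [
            r for r in self.records.values()
            if term in r.title.lower()
            and (record_type is None or r.record_type == record_type)
        ]
        return sorted(hits, key=lambda r: r.record_id)

    def export(self):
        """Bulk export as plain dicts, for analysis that is awkward via the API."""
        return [asdict(r) for r in self.search("")]
```

A front end or app would only ever call something like search, which is why no single front end needs privileged access; researchers wanting the whole dataset would use something like export instead.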

One of the important points that raised a lot of discussion was that while they have contracted with iFactory to have a front end available by April, this front end doesn’t have more priority or access to the API than something developed by someone else. In fact, while someone could go to the DPLA's own site to access content, the planners right now see the DPLA “brand” as sublimated to other points of access such as local public libraries or apps using the content. Again, the CC0 license makes this possible.

The initial front end prototype is due for December, and the new API is due in early November for the Appfest (see below for details). There will be an iterative process between the API and front end between December and March before the April launch, with of course lots of other technical details to sort out. One of the things they need to work on is a good method for sharing contributed modules and code, which hopefully will be done in the next few weeks.

Anyone can participate in this process. You can follow the Dev Portal on the DPLA wiki and the Technical Aspects workstream to participate in decision making. Attending the Appfest hackathon at the Chattanooga Public Library on November 8 and 9 will be a great way to spend time with a group creating an application that will use the metadata available from the hubs (the new API will be completed before the Appfest). This is the time to ask questions and make sure that nothing is being overlooked.

Conclusion: Looking ahead to April 2013

John Palfrey closed the day by reminding everyone that April is just the start, and not to be disappointed with what they see then. If April delivers everything promised during the DPLA Midwest meeting, then it will be a remarkable achievement, but as Doron Weber from the Sloan Foundation pointed out, the DPLA has so far met every one of its milestones on time and on budget.

I found the meeting to be inspirational about the future for libraries to cross boundaries and build exciting new collections. I still have many unanswered questions, but as everyone throughout the day understood, this will be a platform on which we can build and imagine.


The Digital Public Library of America: What Does a New Platform Mean for Academic Research?

Robert Darnton asked in the New York Review of Books blog nearly two years ago: “Can we create a National Digital Library?” 1 Anyone who recalls reference homework exercises checking bibliographic information for United States imprints versus British or French will certainly remember that the United States does not have a national library in the sense of a library that collects all the works of that country and creates a national bibliography. 2 Certain libraries, such as the Library of Congress, have certain prerogatives for collection and dissemination of standards 3, but there is no one library that creates a national bibliography. Such it was for print, and so it remains even more so for digital. So when Darnton asks that, as he goes on to illuminate further in his article, he is asking a much larger question about libraries in the United States. European and Asian countries have created national digital libraries as part of or in addition to their national print libraries. The question is: if others can do it, why can’t we? Furthermore, why can’t we join those libraries with our national digital library? The DPLA has announced collaboration with Europeana, which has already had notable successes with digitizing content and making it and its metadata freely available. This indicates that we could potentially create a useful worldwide digital library, or at least a North American/European one. The dream of Paul Otlet’s universal bibliography seems once again to be just out of reach.

In this post, I want to examine what the Digital Public Library of America claims to do, and what approaches it is taking. It is still new enough, and there are still enough unanswered questions, that it is too early to give any sort of final answer to whether this will actually be the national digital library. Nonetheless, there seems to be enough traction and, perhaps more importantly, funding that we should pay close attention to what is delivered in April 2013.

Can we reach a common vision about the nature of the DPLA?

The planning for the DPLA started in the fall of 2010 when Harvard’s Berkman Center received a grant from the Sloan Foundation to begin planning the project in earnest. The initial idea was to digitize all the materials which it was legal to digitize, and create a platform that would be accessible to all people in the US (or nationally). Google had already proved that it was possible, so it seemed that with many libraries working together it would be conceivable to repeat their successes, but with solely non-commercial motives 4.

The initial stages of planning brought out many different ideas and perspectives about the philosophical and practical components of the DPLA, many of which are still unresolved. The themes of debate that have emerged are whether the DPLA would be a true “public” library, and what in fact ought to be in such a library. David Rothman argues that the DPLA as described by Darnton would be a wonderful tool for making humanities research easy and viable for more people, but would not solve the problems of making popular e-books accessible through libraries or getting students up-to-date textbooks. The latter two aims are much more challenging than getting access to public domain or academic materials because a lot more money is at stake 5.

One of the projects for the Audience and Content workstream is to figure out how average Americans might actually use a digital public library of America. One of the potential use cases is a student who can just use DPLA to write a whole paper on the Iroquois Nations. Teachers and librarians posted some questions about this in the comments, including questioning whether it is appropriate to tell students to use one portal for all research. We generally counsel students to check multiple sources, and getting students used to searching one place that happens to be appropriate for one topic may not work if the DPLA has nothing available on, say, the latest computer technology.

Digital content and the DPLA

What content the DPLA will provide will surely become more clear over the following months. They have appointed Emily Gore as Director of Content, and continue to hold further working groups on content and audience. The DPLA website promises a remarkable vision for content:

The DPLA will incorporate all media types and formats including the written record—books, pamphlets, periodicals, manuscripts, and digital texts—and expanding into visual and audiovisual materials in concert with existing repositories. In order to lay a solid foundation for its collections, the DPLA will begin with works in the public domain that have already been digitized and are accessible through other initiatives. Further material will be added incrementally to this basic foundation, starting with orphan works and materials that are in copyright but out-of-print. The DPLA will also explore models for digital lending of in-copyright materials. The content that is contributed to or funded by the DPLA will be made available, including through bulk download, with no new restrictions, via a service available to libraries, museums, and archives in the United States, with use and reuse governed only by public law.  6

All of these models exist in one way or another already, however, so how is this something new?

The major purveyors of out of copyright digital book content are Google Books and HathiTrust. The potential problems with Google Books are obvious just in the name–Google is a publicly traded company with aspirations to be the hub of all world information. Privacy and availability, not to mention legality, are a few of the concerns. HathiTrust is a collective of research universities digitizing collections, many in concert with Google Books, but the full text of these books in a convenient format is generally only available to members of HathiTrust. HathiTrust faced a lawsuit from the Authors Guild about its digitization of orphan works, which is an issue the DPLA is also planning to address.

Other projects exist trying to make currently in-copyright digital books more accessible. The best-known approach requires a critical mass of people to actively work to pay to release a book into the public domain, and so may not serve the scholar with a unique research project. Some future plans for the DPLA include obtaining funds to pay authors for use, but this may or may not include releasing books into the public domain.

DPLA is not meant to include books alone. Planning so far suggests that books make a logical jumping off point. The “Concept Note” points out that “if it takes the sky as its limit, it will never get off the ground.” Despite this caution, ideally it would eventually be a portal to all types of materials already made available by cultural institutions, including datasets and government information.

Do we need another platform?

The first element of the DPLA is code–it will use open source technologies in developing a platform, and will release all code (and the tools and services this code builds) as open source software.  The so-called “Beta Sprint” that took place last year invited people to “grapple, technically and creatively, with what has already been accomplished and what still need to be developed…” 7. The winning “betas” deal largely with issues of interoperability and linked data. Certainly if a platform could be developed that solved these problems, this would be a huge boon to the library world.

Getting involved with the DPLA and looking to the future

While the governance structure is becoming more formal, there are plenty of opportunities to become involved with the DPLA. Six working groups (called workstreams) were formed to discuss content, audience, legal issues, business models, governance, and technical issues. Becoming involved with the DPLA is as easy as signing up for an account on the wiki and noting your name and comments on the page of the working group in which you are interested. You can also sign up for mailing lists to stay involved in the project. Like many such projects, the work is done by the people who show up and speak up. If you read this and have an opinion on the direction the DPLA should take, it is not difficult to make sure your opinion gets heard by the right people.

Like all writing about the DPLA since the planning began, turning to a thought experiment seems the next logical rhetorical step. Let’s say that the DPLA succeeds to the point where all public domain books in the United States are digitized and available in multiple formats to any person in the country, and a significant number of in copyright works are also available. What does this mean for libraries as a whole? Does it make public libraries research libraries? How does it change the nature of research libraries? And lastly, will all this information create a new desire for knowledge among the American people?

  1. Darnton, Robert. “A Library Without Walls.” NYRblog, October 4, 2010.
  2. McGowan, Ian. “National Libraries.” In Encyclopedia of Library and Information Sciences, Third Edition, 3850–3863.
  3. “Frequently Asked Questions – About the Library (Library of Congress).” Text, n.d.
  4. Dillon, Cy. “Planning the Digital Public Library of America.” College & Undergraduate Libraries 19, no. 1 (March 2012): 101–107.
  5. Rothman, David H. “It’s Time for a National Digital-Library System.” The Chronicle of Higher Education, February 24, 2011, sec. The Chronicle Review.
  6. “Elements of the DPLA.” Digital Public Library of America, n.d.
  7. “Digital Public Library of America Steering Committee Announces ‘Beta Sprint’ ”, May 20, 2011.