Real World Semantic Web?: Facebook’s Open Graph Protocol (ACRL Tech Connect Post)
Originally posted at ACRL Tech Connect on May 10, 2012.
Librarians need to understand what the semantic web is and how to use it, but this can be challenging. While the promise of the semantic web has existed for over a decade, to the uninitiated there may not seem to be many implementations that are accessible to the average person.
One implementation that most people use daily is Facebook's Open Graph Protocol, which is Facebook's version of the semantic web. It is a useful example of the ideas behind the semantic web and linked data. Because libraries and other cultural institutions want and need to make their data open, while Facebook's openness is highly questionable, it also illustrates some of the potential problems with linked data that isn't open. There is much great work being done in the library world with the semantic web and linked data, which future posts will address in more detail.
The Semantic Web and Linked Data
The "semantic web" describes a web where computers understand data in some of the same ways humans do. Tim Berners-Lee, writing with James Hendler and Ora Lassila, illustrates this wonderfully in a 2001 Scientific American article with a future in which the diagnosis of a family member with cancer is made easier by a smart device that can find the most appropriate specialist in a convenient location at a convenient time, with very little work on the part of the searcher. This is only possible, however, when data is semantically meaningful. A doctor's (or a library's) open hours written on a website mean something to a human, but very little to a computer. Once those hours are structured in a way a machine can interpret, the computer can tell you whether the doctor's office is open and, if it has access to your calendar, what you have to cancel to go there.
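To make the open-hours point concrete, here is a minimal Python sketch. It assumes a hypothetical, home-made format (a dictionary mapping weekday numbers to opening and closing times, with made-up hours) rather than any real vocabulary such as schema.org; the point is only that once hours are structured data instead of prose, a program can answer "is it open right now?"

```python
from datetime import datetime, time

# Hypothetical structured hours (not any real library's schedule):
# weekday number (0 = Monday ... 6 = Sunday) mapped to (opens, closes).
# A day with no entry is treated as closed.
OPENING_HOURS = {
    0: (time(8, 0), time(22, 0)),
    1: (time(8, 0), time(22, 0)),
    2: (time(8, 0), time(22, 0)),
    3: (time(8, 0), time(22, 0)),
    4: (time(8, 0), time(17, 0)),
    5: (time(10, 0), time(17, 0)),
}


def is_open(hours, moment):
    """Return True if `moment` falls inside the opening hours for its weekday."""
    span = hours.get(moment.weekday())
    if span is None:
        return False  # no entry for this day, so treat it as closed
    opens, closes = span
    # Closing time is exclusive: at exactly 17:00 on a Friday the door is shut.
    return opens <= moment.time() < closes


print(is_open(OPENING_HOURS, datetime(2012, 5, 10, 9, 30)))  # Thursday morning: True
print(is_open(OPENING_HOURS, datetime(2012, 5, 13, 12, 0)))  # Sunday noon: False
```

A calendar-aware assistant like the one in the Scientific American scenario would need far richer data (holidays, exceptions, time zones), but the principle is the same: the question is answerable only because the hours are machine-readable.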
Linking data takes this a step further and makes it possible to connect data, so as to avoid, as the W3C puts it, "a sheer collection of datasets". In a 2006 note, Berners-Lee outlines the steps for making linked data: use uniform resource identifiers (URIs) as names for things, use HTTP URIs so that people can look those names up, provide useful information in a standard format such as RDF when someone looks up a URI, and include links to other URIs with related information. A 2010 follow-up points out that to be linked open data, the data must be published under a license that allows free, unimpeded use, such as the Creative Commons CC-BY license. Such data doesn't have to be structured in any particular way as long as it's open. He says that "…you get one (big!) star if the information has been made public at all, even if it is a photo of a scan of a fax of a table — if it has an open licence." But "five-star" linked open data also meets all of the requirements above.
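The four steps above can be sketched in a few lines of Python: things are named with HTTP URIs, statements about them are RDF triples written out in the N-Triples syntax, and one triple links out to a URI elsewhere. The `example.org` URIs and the `URI`/`Literal` classes are hypothetical illustrations; the FOAF `name` and RDFS `seeAlso` property URIs are real vocabulary terms. This is a sketch only: a real project would use an RDF library, and the escaping here covers only the characters most likely to break a line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A name for a thing (step 1), here always an HTTP URI (step 2)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string value, such as a human-readable name."""
    value: str


def term_to_nt(term):
    """Render one RDF term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and line breaks, so the literal
    # stays on a single line as N-Triples requires.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples (step 3)."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples
    )


# Hypothetical URIs under example.org; the property URIs are real FOAF/RDFS terms.
LIBRARY = URI("http://example.org/library/rebecca-crown")
FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")
RDFS_SEE_ALSO = URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")

triples = [
    (LIBRARY, FOAF_NAME, Literal("Rebecca Crown Library")),
    # Step 4: link out to another URI with related information.
    (LIBRARY, RDFS_SEE_ALSO, URI("http://example.org/campus/dominican-university")),
]

print(to_ntriples(triples))
```

Each output line is one self-contained statement, which is what lets another dataset refer to `http://example.org/library/rebecca-crown` and have its statements merge cleanly with these.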
Facebook’s Open Graph Protocol
Moving into a different world, let's consider what the semantic web and linked data look like at Facebook. First, it is interesting to consider what Facebook was before it was semantic. When Facebook first started (it launched in 2004), you could make a list of things you "liked". You might have said you "liked" the movie Clueless and "liked" running, but these were just lists that would let others in your college classes know a few facts about you the next time you saw them in class or at a party. In theory you could use these lists to find others who shared your interests, but this required a person to judge which interests matched.
But starting in 2010 these "likes" took on a real semantic meaning. Suddenly "liking" the movie Clueless meant that, among other things, the owners of the "Clueless" identity on Facebook could send marketing announcements directly to you. In addition, you could "like" content completely outside of Facebook, as long as the website used the correct markup on the page to speak to Facebook, thus linking content with people. Unlike Beacon, Facebook's earlier advertising scheme, this made it easier to understand how you were exposing yourself to advertisers and to control privacy and sharing, though many people remained troubled by it.
In late 2011 and early 2012, Facebook opened this system up even more to third-party developers, in step with the new Facebook Timeline. Now applications could let any person perform any verb on any object. So "Margaret read a book on Goodreads" or "Margaret listened to a song on Spotify" (real-world actions) turn into semantically meaningful statements on my Facebook Timeline. As long as the user authorizes the application, it can grab information about the object from the object's web page and show the user's interaction with it.
Developing for the Open Graph
The Open Graph protocol builds on the idea of the "social graph", which represents the connections between people and the types of relationships they have with each other. In the Facebook universe, this includes the relationships people have with other types of entities, such as media, products, and companies. Facebook developed the protocol to give websites a quick and easy way to include semantically meaningful data. It is based on RDF, the standard specification for linked data, and includes basic and optional metadata as well as several types of structured data about objects, of which music and video are the most well defined.
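On the publishing side, the markup is just a set of `<meta>` tags in a page's `<head>`. The Open Graph Protocol site (ogp.me) lists four required properties: `og:title`, `og:type`, `og:image`, and `og:url`. The following Python sketch, using only the standard library, generates those tags and then reads them back with `html.parser`, roughly as a consuming service might. The page values and URLs are hypothetical placeholders, and the parser is deliberately simple: it keeps only the first value of any repeated property.

```python
from html import escape
from html.parser import HTMLParser

# Properties that ogp.me lists as required for every object.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


def og_meta_tags(props):
    """Build Open Graph <meta> tags from a dict of property -> content."""
    missing = [name for name in REQUIRED if name not in props]
    if missing:
        raise ValueError(f"missing required Open Graph properties: {missing}")
    return "\n".join(
        f'<meta property="{escape(name)}" content="{escape(value)}" />'
        for name, value in props.items()
    )


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here, via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep only the first value if a property is repeated.
            self.properties.setdefault(name, content)


# Hypothetical page metadata; the URLs are placeholders.
page_props = {
    "og:title": "Rebecca Crown Library",
    "og:type": "website",
    "og:url": "http://example.org/library/",
    "og:image": "http://example.org/library/logo.png",
}

html_page = "<html><head>\n" + og_meta_tags(page_props) + "\n</head><body></body></html>"

parser = OpenGraphParser()
parser.feed(html_page)
print(parser.properties == page_props)  # True: the tags round-trip
```

Because the vocabulary lives in ordinary HTML attributes, any crawler can read it, which is why the markup itself is open even when what Facebook does with it is not.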
To see the Open Graph in action, simply replace “www” with “graph” at the beginning of any Facebook page. For instance, let’s take a look at my own library’s information at http://graph.facebook.com/rebeccacrownlibrary. You can see that this page describes a library, and get our phone number, physical location, and open hours. Most important, a computer viewing this page can understand this information. For complete details, see the Graph API documentation–even for non-developers this is interesting; for instance, find out how to get the URL for your current profile picture to embed in other sites. To get access to this information, you can use various methods, including the Facebook Query Language.
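The "www" to "graph" rewrite described above is simple enough to automate. This Python sketch only builds the URL; it does not fetch anything. It assumes the 2012-era behavior described in this post; later versions of the Graph API generally require an access token for requests, so treat the resulting URL as illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def graph_url(page_url):
    """Turn a www.facebook.com page address into its graph.facebook.com form."""
    parts = urlsplit(page_url)
    if parts.hostname not in ("www.facebook.com", "facebook.com"):
        raise ValueError(f"not a Facebook page URL: {page_url}")
    # Swap only the host; keep scheme, path, query, and fragment unchanged.
    return urlunsplit(
        (parts.scheme, "graph.facebook.com", parts.path, parts.query, parts.fragment)
    )


print(graph_url("http://www.facebook.com/rebeccacrownlibrary"))
# http://graph.facebook.com/rebeccacrownlibrary
```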
Of course, you only get access to this information if the page explicitly makes it public; for anything more, applications must use authentication. Linking information from outside of Facebook works one way only: you can't pull very much at all out of Facebook into the open web. For instance, Google searches will pull up only basic information from a Facebook page rather than any content that page has posted.
Outside of Facebook–How “Open” is the Open Graph?
It is precisely this closed effect that has a lot of people worried about Facebook's implementation of the semantic web. In 2007 Brad Fitzpatrick described the problems inherent in implementations of the "social graph" on the web: standards were quirky, non-interoperable, and usually completely walled off. His proposed solution was a Social Graph API that would create a social graph outside of any one company and belonging to all. This would allow people to find friends and connections without signing up for additional services or relying on Facebook or any other company. Fitzpatrick did later create a Social Graph API, which Google recently pulled out of its products. Some of the problems of an open social graph are familiar to librarians: people are hesitant to share too much information with just anyone about whom they associate with, what they like, and what they think (Prodromou). The great boon for advertisers in social networking services is that inside walled gardens with reasonable privacy controls, people are willing to share much more information. Thus the walled garden of Facebook means that this valuable social data is inaccessible to Google. It is perhaps no coincidence that Google stopped supporting the open Social Graph API around the same time it released the API for its own social networking service, Google Plus.
Concerns remain that the Open Graph is not actually open, and in particular that it uses the open standard of RDF to ingest content but not to share it (Turenhout). The Open Graph Protocol website states that a variety of big websites publish pages with Open Graph markup, which is ingested by Facebook (of course), Google, and mixi. It remains unclear how widely this particular standard will be adopted outside of Facebook.
Conclusion
Whether or not you think you have any idea what linked data is, any time you click a "like" button on a website or sign up for a social sharing app in Facebook, you are participating in the semantic web. But every time that data link goes behind a Facebook wall, it fails to be open linked data. Just as librarians have always worked to keep the world's knowledge available to all, we must continue to ensure that potentially important linked data is kept open as well, and free of commercial motive. The LODLAM Summit has outlined, and continues to work on, what linked open data looks like for libraries, archives, and museums. The W3C Library Linked Data Incubator Group released its final report in fall 2011, which provides a thorough overview of the roles and responsibilities of libraries in the world of linked open data. There is a lot of possibility in this area right now, and the future openness of the World Wide Web may very well depend on the actions we take today.
In a future post, we will examine some specific examples of work being done in the library world around the semantic web and linked data.
Works Cited
Some obvious points about using Facebook
I am presenting an example of how we used Facebook effectively at Dominican University, though I hope this information is already obvious. This post is partly a reminder to myself of what works, and a way to make myself repeat a successful use of Facebook.
National Library Week was a few weeks ago, and our library's administrative assistant and Photoshop whiz, Sharon Tobin, got the idea to make posters for the week with the theme "Keep Calm and Read On". (She is also my officemate, and we are pretty much a Masterpiece Theatre appreciation society in here.) We always have our own National Library Week theme that has pretty much nothing to do with ALA's theme. Works for us.
Anyway, the posters were a huge hit. (This one is mine.) I pushed them heavily on the blog, on the library website, and, most relevant to this post, on the Facebook page.
Here are the Facebook statistics for that week (this was the most popular post of the week):
Note the huge spike in mid-April, which dropped off immediately after the week ended and we went back to our usual posting routine.
Why? Well, it was a clever idea, timely, and featured interesting sets of images that people wanted to click through and look at. These are all hallmarks of good Facebook material. My advice for good use of Facebook is to look around and see the clever and interesting ideas of the people at your library, and show them on Facebook. It may not revolutionize library service, but it may enlighten, instruct, and/or entertain your patrons, which is certainly an important part of the role of the library.
Filed under: Libraries, Social Networks
Report on “Copyright and Fair Use in the Digital Age”
On Monday I attended a very useful meeting called "Copyright and Fair Use in the Digital Age", put on by various institutions including CARLI, Northwestern University, and the Association of Research Libraries. The first part of the program was a chance to learn the details of the Code of Best Practices in Fair Use for Academic and Research Libraries, which was published in January 2012 as a joint project of the Association of Research Libraries, the Center for Social Media at American University, and the Program on Information Justice and Intellectual Property at American University. The second part of the program was a Creative Commons presentation on "Fair Use and Copyright in the Age of the Internet" (I will probably get those notes done too at some point). Below are the notes I took during the meeting, but you can follow the first presentation at the ARL site, and find a lot more information and the actual Code at arl.org/fairuse.
The meeting was prompted by the new Code of Best Practices in Fair Use for Academic and Research Libraries, which is one of a series of guides addressing fair use in many different communities. Co-facilitators Brandon Butler (ARL) and Peter Jaszi (American University) are going around the country to help introduce the Code. The guide provides tools to address challenges in determining what is fair use.
Fair Use and Research Librarians
Copyright and Fair Use
Peter Jaszi delivered the first part of the talk, which covered the legal background; afterward, Brandon Butler joked that we'd just gotten $7000 worth of law school. The purpose of copyright is often misunderstood or misconstrued. Some people will tell you that copyright has always been about the rights of authors, but that isn't the true story. In fact, the historic and constitutional truth is that copyright isn't meant to reward authors or publishers, but rather is a means to promote the creation of culture; that authors and publishers are rewarded is merely a side effect. (I got a slightly different historical understanding of this from Adrian Johns' Piracy, but that book deals with a non-American context, whereas this talk was of course about American law.) Granting creators a limited monopoly is supposed to stimulate them to invest their time and effort in creating culture. But new culture is also made by using existing culture, so this creates a tension.
This tension could serve to unbalance the whole culture creating enterprise, so there are certain “balancing features”, such as limiting the term of the copyright (which is being eroded all the time). The biggest and most controversial balancing measure is fair use: “legal, unauthorized use of copyrighted material– under some circumstances.” As Jaszi remarked, fair use allows a space for artistic creativity–as well as legal creativity.
Section 107 of the Copyright Act of 1976 lists "Four Factors" that help determine whether a use is fair. To some people these may be very well known, but I certainly need a review every so often. They are:
- Reason for the use
- Kind of work
- Amount used
- Effect on Market
This is fairly loose: while it allows judges flexibility in making their decisions, it is difficult to apply to specific cases, and difficult to use in deciding prospectively what you should do in a given case. The Supreme Court has affirmed the critical importance of fair use for free speech, and fair use has moved into the center, where it is now perceived as a core feature of copyright. There has been a big shift in rulings since 1990: a practical renaissance and sea change in judicial interpretation, with very positive implications for various communities, including libraries. Judges now ask the following questions:
- Is the use transformative? Does it serve a new purpose, context, audience, or insight?
- Is the amount of material used appropriate for the transformative purpose?
If the answer to both is yes, then in the vast majority of cases the judge will rule that the use is fair. There are some misunderstandings about both of these concepts, however. First, a "transformative" use does not have to create an independently copyrightable work, nor does it have to involve a physical transformation. Take an example from the scholarly research realm: you could use text and images in scholarly studies not by modifying them, but by re-contextualizing them. For instance, taking images of Google searches and making them into a poster is arguably fair use.
The Best Practices Approach
Judges pay attention to community practice and community values. A consensus about what constitutes fair use in a community helps to create a better culture of fair use as well as to serve as a defense of such use. American University has been working with a variety of communities to help them create their own fair use principles and figure out the best way of handling the material. Communities with which they have worked include documentary filmmakers, scholars, media literacy educators, online video creators, dance collections, and open courseware.
These codes have had rapid effects on gatekeepers in various areas. One example is documentary filmmakers, who now have an easier time getting insurance for their projects when they can show that what they are doing in their works constitutes fair use. These codes have transformed their fields and brought about new culture that wouldn't have been possible under more timid definitions.
An important thing to realize going into this document is that these are best practices, not guidelines. There is a lot of folklore floating around with specific metrics, but it is not based on law. For instance, it's not true that using two minutes, or 11 bars, or what have you of a piece makes the use fair. Following such guidelines will not serve as legal protection. To that end, you can understand the document as offering:
- Principles, not rules.
- Limitations, not bans. The limitations embodied in the codes are as integral to the document as the principles, but they are not hard and fast rules. Rather, they define a thought process.
- They promote reasoning, not rote.
Libraries and Fair Use
Obviously this guide is aimed at librarians. Its genesis was that libraries have a duty to preserve the past, to answer present needs (they are the first and sometimes the last line of copyright defense on campuses), and to think about the future role of libraries and keeping them relevant. As the term of copyright keeps getting extended, fewer and fewer things in library collections will fall into the public domain. An additional problem is that because people want everything in digital form, libraries have to make lots of copies.
When they did initial field research for this project, they found that librarians shared a common set of issues. First, there is a great deal of insecurity and hesitation: the stakes feel very high and no one wants to get the answer wrong. As Brandon Butler says, "No one ever got sued for saying no." Sometimes fears have led to sunk costs in projects that were important but that people were too concerned to follow through on. There was a general sense that fair use is an important tool, but librarians didn't know how to use it. People revert to "risk aversion" rather than "risk management". There is never zero risk, but in the copyright world there seems to be no tolerance for risk at all. We need to be more sensible about this. They created this Code to help librarians feel more comfortable using fair use as a tool in academic libraries.
Code of Best Practices in Fair Use for Academic and Research Libraries
Brandon and Peter took turns discussing the Code and introducing its principles. It was created by librarians through deep deliberation: 90 librarians from 64 institutions took part in nine four-hour discussions. ARL, the Center for Social Media at American University, and their partners took detailed notes of these conversations, which were then reviewed by a diverse panel of legal experts. The point of the review wasn't to develop a "perfect" code, but to check that it was within reason. The Code puts legal risks into perspective by balancing legal risk against "mission risk": projects that get abandoned out of fear represent real lost value.
They of course recommended reading the Code itself, but outlined its most noteworthy or important features, which I have noted below. Each numbered item states a principle that academic librarians agree is fair use, followed by Peter and Brandon's comments about it, the important limitations to keep in mind, and some enhancements that make it even more likely the use will be perceived as fair.
Fair Use in 8 Common Principles
One: Digital access to teaching materials for students and professors.
- This is not about physical materials. This is not about re-fighting the course pack wars. This document focuses on the digital, such as e-reserves, course websites, streaming, etc.
- There are significant limitations to this, however. Most important: focus on the pedagogical purpose of the content being chosen.
- When you use someone’s content, you must identify the source.
- There was consensus about providing certain enhancements. They were dismissive of the notion of spontaneity: using something at the last minute doesn't make it OK. But do periodically review content to see what remains necessary.
- The Georgia State e-reserves case hasn't been decided yet, and other cases may be coming up. This document may influence the debate.
Two: Exhibits, both physical and digital.
- Lots of libraries have fascinating collections and are building exhibitions and need guidance on building these.
- People who own copyrights really value attribution.
- Take a look at Library of Congress boilerplate on American Memory for ideas.
Three: Digitizing to preserve at-risk items
- If the format isn’t already falling apart or already obsolete, section 108 doesn’t cover it.
- Section 107 provides back-up to section 108.
- Once the digital copies of items have been made, it should be possible to make these items available to the learning community rather than the fragile materials.
- Use reasonable measures to ensure that to the extent possible only accredited users have access to material.
- Librarians felt they have an ethical duty to take measures to limit downstream use of the material. This is an ethical feeling only, with no legal basis: librarians have been socialized into believing that they bear legal liability if a legal copy they make ends up being used illegally.
Four: Digital collections of archives and special collections.
- How do we take rare and unique items and make them available more broadly?
- Librarians came to consensus that this is fair use. Similar rationale to exhibitions.
- If a famous person's special collection includes works that the person made no modifications to, you want a record of those works, but not necessarily a complete version of them.
- Limit access to damaging or sensitive private information.
Five: Access to research and teaching materials for disabled students.
There was so little controversy and such broad consensus on this that they said it wasn't worth going into detail.
Six: Institutional repositories, e.g. dissertations, multimedia research
- IR contracts often have a clause stating that any third-party material has express permission for use, so people were cutting out material willy-nilly before depositing their work.
- Educate researchers about fair use up front, before they start their research.
- ProQuest was “the main bad guy”, but getting better.
- We all can improve on the language used and education.
Seven: Creating digital databases for “non-consumptive uses” (digitizing, indexing for search).
- Data mining: you aren't interested in reading the sources, but in analyzing them algorithmically. This goes on in STEM and the humanities. Is this a legitimate fair use activity?
- Subject to limitations, mostly about restriction, this is fair use.
Eight: Making topically-based collections of Web-based material
- These are fragile and sensitive to being taken down. The purpose for which they will be consulted later is quite different than the original purpose, so this clearly seems fair use.
Practice Makes Practice
Brandon concluded by saying that the American Bar Association obliged him to say, "Fair use is a muscle: if you don't use it, it will atrophy." I am guessing this is hilarious lawyer humor. There were a few last comments and questions. Someone asked if the Code would be updated over time. They said that the last two principles reflect emergent scholarly practices that will be relevant to the next set of practices. Also, any institution that uses the Code can and should create its own localized version. Everyone ought to know that these opportunities are available. Librarians know that the opportunities are fragile, and the easiest way to lose them is to sign license agreements that restrict your fair use. Suggestions for communicating with others on campus were to share the Code with the general counsel, along with backup documentation providing legal information. This is something that ARL and AUSOC can help with. Faculty, on the other hand, probably need this boiled down into something more easily consumed.
I learned a lot from this presentation and feel much more confident in my ability to make decisions and explain concepts to faculty and students, so it was definitely time well spent. I also really enjoyed Sarah Hinchliff Pearson's talk on Creative Commons, but will post about that later.
Filed under: Conferences, Libraries

