Summer Reading Program Form: SO SIMPLE

The last time I posted I was still months away from having a baby; now I have a four-month-old. Time to get back into blogging. I have some things to say specifically about babies and being a working mother and so on, but those will have to wait.

A few weeks ago I asked around on Twitter about creating leaderboards, but couldn’t find a lot out there that worked for exactly what I needed. I was trying to make a simple application for a staff summer reading program that would allow participants to register the pages they’ve read and leave reviews, but without a lot of bureaucracy and overhead. The main tool I found was Leaderboarded, which was interesting, but way too much for the very simple thing I was trying to do. While I could have created a form on our website using Drupal or one of our PHP forms, I figured that the much easier solution would be to use a Google Form, since otherwise it would have been very difficult for the people running the program to monitor participation unless I built an entire web application, which was overkill for a temporary program.

After poking around a bit, I found this super simple solution from David Hay at the Elk Island Public Schools (in Alberta!) Technology blog. Create a form in Google Forms that asks for a name, and then any points to which the student is entitled. Then use pivot tables with the spreadsheet to get totals, and use charts to visualize these. This was very simple, and I also appreciated his script that copies formulas to the last row of the spreadsheet as people add new entries. This of course assumes that each person’s name works as a unique key and is entered the same way every time, which for a small group of people over a short amount of time is probably a reasonable assumption.

Here’s what I ended up with. You won’t be able to complete the form without a Loyola email address (to hopefully prevent spam), but here’s what happens. Most of the time people will just be entering the pages they read most recently, and won’t need to add anything else. They will fill in that information, and then submit the form. But if they’ve finished a book, they can leave a review for more points. Clicking the “Did you finish a book?” radio button will take them to page 2, which is below. They can fill in author and title only, or add a review for 50 points. They also get an additional 10 points for checking the book out of the LUC libraries.



On the back end, this form records to a spreadsheet. The points are added up in an extra column I added to the spreadsheet. It uses conditionals to add the points for the challenges to the pages read. I’d not really used conditionals in a Google spreadsheet before, but they work just as you would expect.

=C2+IF(J2="Yes",10,0)+IF(ISBLANK(I2),0,50)

Here C2 is the pages read, J2 is the LUC libraries answer (10 points if "Yes", otherwise 0), and I2 is the review box (0 points if blank, otherwise 50).
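If it helps to see the same scoring rule outside the spreadsheet, here is a minimal Python sketch of it. The function and parameter names are my own invention for illustration; only the point values (10 for an LUC checkout, 50 for a review) come from the form described above.

```python
def entry_points(pages_read, checked_out_at_luc, review):
    """Score one form submission using the same rule as the spreadsheet column.

    pages_read: number of pages entered on the form
    checked_out_at_luc: the answer to the LUC libraries question ("Yes" or not)
    review: the text of the review box, or an empty string if left blank
    """
    # 10 bonus points only when the LUC libraries answer is exactly "Yes"
    library_bonus = 10 if checked_out_at_luc == "Yes" else 0
    # 50 bonus points when the review box is not empty
    review_bonus = 50 if review else 0
    return pages_read + library_bonus + review_bonus


if __name__ == "__main__":
    # Hypothetical submissions, not real program data
    print(entry_points(30, "Yes", "Loved it"))
    print(entry_points(30, "No", ""))
```

One small difference from the spreadsheet: a Sheets comparison like ="Yes" ignores case, while this sketch does not, which should not matter when the answer comes from a radio button.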

Then I use a series of pivot tables and charts to add up the total points for each person (which we will use internally for entering participants in a raffle and determining the overall winner) and departmental points, which we will use to cheer on departments and try to make them compete against each other. To check for the individuals who are ahead, I use a pivot table with a row grouped by name with the sum of total points, and the same thing for departments. Then I made a bar graph sorted by total to make a departmental leaderboard to post on the website, which you can see here (as of publication it’s blank since the program hasn’t started yet).
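For anyone who would rather see what the pivot tables are doing, this is a rough Python sketch of the same grouping: sum the points per name (or per department) and sort from highest to lowest. The row layout, the `leaderboard` function, and the sample names and departments are all made up for illustration; the real data lives in the form's spreadsheet.

```python
from collections import defaultdict


def leaderboard(rows, key):
    """Sum points per value of `key` and return (value, total) pairs, highest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["points"]
    # Sort by total points, largest first, like the sorted bar graph
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical submissions, one dictionary per spreadsheet row
rows = [
    {"name": "Ana", "department": "Reference", "points": 40},
    {"name": "Ben", "department": "Cataloging", "points": 25},
    {"name": "Ana", "department": "Reference", "points": 72},
    {"name": "Cy", "department": "Cataloging", "points": 30},
]

if __name__ == "__main__":
    print(leaderboard(rows, "name"))
    print(leaderboard(rows, "department"))
```

Like the pivot table, this only works if each person types their name the same way every time, since the name is the grouping key.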

Since everything is stored in an online form, the group that is running this program will be able to make edits to the form and the data if they need to, and copy the reviews out of the spreadsheet to post on the library’s blog. Very simple, and not using a database when a spreadsheet will do!



Suzanne Briet: Antelopes can be documents

In honor of Ada Lovelace Day, #LibTechWomen is blogging about one of the great early library and information science theorists, Suzanne Briet. You can read all our blog posts on Twitter using the hash tag #briet. In a shocking but not surprising turn of events, a recent textbook gave Paul Otlet credit for the one thing that everyone should be able to remember from the first semester of library school: a wild antelope is not a document, but an antelope in a zoo is, since it was collected, cataloged, and provides evidence. Unfortunately textbooks have a way of perpetuating wrong information from generation to generation, and critical thinking and research skills are woefully lacking. So, we are remembering Madame Documentation on a day dedicated to remembering and celebrating women in STEM fields.

My personal forever favorite tribute to Suzanne Briet comes in the form of a critical puppet show put on by the Self Preservation Working Group at the Read/Write Library. Watch and enjoy.

Analyzing CVs for publisher copyrights and self-archiving with OpenRefine

This originally appeared on the ACRL TechConnect blog.

I started working on this project yesterday, but I wanted to write it up as quickly as possible so that I could see how others are approaching this issue. First of all, I should say that this approach was inspired by this article in the Code4Lib Journal: “Using XSLT and Google Scripts to Streamline Populating an Institutional Repository” by Stephen X. Flynn, Catalina Oyler, and Marsha Miles.

The problem I had was a faculty member who sent a CV to a liaison for adding the items to the repository, but whose citations were not showing up in the citation databases–and I now work at an institution with all of the resources I need for this. So I wanted to go the other way, and start with the CV and turn that into something that I could use to query SHERPA/RoMEO.

It occurred to me that the best tool for this might be Google Refine (now OpenRefine, I guess), which I’ve always wanted to play around with. I am sure there are lots of other ways to do this, but I found this pretty easy to get set up. Here’s the approach I’m taking, with a version of my own CV.

  1. Start with a CV, and identify the information you want. You could copy the whole thing or use screen scraping or what have you, but most people’s CVs are about 20 pages long and you only care about the 1 or 2 pages of journal articles.
  2. Copy this into a text editor to remove weird formatting or spacing. You want to have each citation on its own line, so if you had a CV with hanging indents or similar you would have to remove those.
  3. Now (assuming you’ve installed and opened Google Refine), either import this text file or paste in the text. Import it as a line-based text file, and don’t select anything else.
  4. Click on Create Project>>, and it will bring it into Google Refine. Note that each line from the text file has become a row in the set of data, but now you have to turn it into something useful.
  5. My tactic was to separate the author, date, title, journal title, and other bibliographic information into their own columns. The journal title is the only one that really matters for these purposes, but of course you want to hang on to all the information. There are probably any number of ways to accomplish this, but since citations all have a standard structure, it’s really easy to exploit that to make columns. The citations above are in APA style, and since I started with social work faculty to test this out, that’s what I am starting with, but I will adjust for Chicago or MLA in the future. Taking a look at one citation as example we see the following:

    Heller, M. (2011). A Review of “Strategic Planning for Social Media in Libraries”. Journal of Electronic Resources Librarianship, 24 (4), 339-240)

    Note that we always have a space after the name, an opening parenthesis, the date, a closing parenthesis, and a period followed by a space. So I can use this punctuation to split the columns. To do so, select “Split into several columns” from the Edit Column menu.
    Then in the menu, type in the separator you want to use, which in this case is a space followed by an opening parenthesis. Split into two columns, and leave the rest alone. Note that you can also put a regular expression in here if necessary. Since dates are always the same length you could get away with field lengths, but this way works fine.
    After this, you will end up with the following change to your data:
    Now the author is in the first column, and the opening parenthesis is gone.

  6. Following along with the same rationale for each additional field and renaming the columns we end up with (cooking show style) the following:
    Each piece of information is in its own column so we can really start to do something with it.
  7. I am sure there’s a better way to do this, but my next step was to use the journal title as the query term to the SHERPA/RoMEO API call. This was super easy once I watched the data augmentation screencast; there is documentation here as well. Open up the following option from the edit column menu:
    You get a box to fill in the information about your API call. You have all kinds of options, but all you really need to do for this is format your URL in the way required by SHERPA/RoMEO. You should get an API key, and can read all about this in the article I linked to above. There are probably several ways to do this, but I found that what I have below works really well. Note that it will give you a preview to see if the URL is formatted in the way you expect. Give your column a name, and set the Throttle delay. I found 1000 milliseconds worked fine.
    In a copy-and-pastable format, here’s what I have in the box: '[YOUR API KEY HERE]&qtype=starts&jtitle=' + escape(value,'url')
  8. Now it will run the query and send back a column of XML that looks like this:
  9. From this you can extract anything you want, but in my case I want to quickly get the pre- and post-archiving allowances, plus any conditions. This took me a while to figure out, but you can use the Google Refine Expression Language parseHtml function to work on this. Click on Add column based on this column from the Edit Column menu, and you will get a menu to fill in an expression to do something to your data. After a bit of trial and error, I decided the following makes the most sense. This grabs the text inside the <prearchiving> element in the XML and shows it to you. You can see from the preview that you are getting the right information. Do the same thing for post-archiving or any other columns you want to isolate. The code you want is value.parseHtml().select("elementname")[0].htmlText()
    Conditions are a little different, since there are multiple conditions for each journal. In that case, you would use the following syntax (after join you can put whatever separator you want): forEach(value.parseHtml().select("condition"),v,v.htmlText()).join(". ")
  10. Now you have your data neatly structured and know the conditions for archiving, hooray! Again, cooking show style, here’s what you end up with. You can certainly remove the SHERPA/RoMEO column at this point, and export the data as Excel or whatever format you want it in.
  11. BUT WAIT, IT GETS BETTER. So that was a lot of work to do all that moving and renaming. Now you can save this for the future. Click on Undo/Redo and then the Extract option.
    Make sure to uncheck any mistakes you made! I entered the information wrong the first time for the API call, so that added an unnecessary step. Copy and paste the JSON into a text editor and save for later.
  12. From now on when you have your CV data, you can click on the Undo/Redo tab and then choose Apply. It will run through the steps for you and automatically spit out your nicely formatted and researched publications. Well… realistically the first time it will spit out something with multiple errors, and you will see all the typos that are messing up your plan. But since the entire program is built to clean up messy data, you’re all set on that end. Here’s the APA format I described above for you to copy and paste if you like–make sure to fill in your API key: cvomatic
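If you would rather see the logic of steps 5 through 9 as code, here is a rough Python sketch of the same ideas: split a citation on its punctuation, build the journal title part of the query, and pull the archiving information out of the XML. This is not what Refine runs internally. The function names are mine, the base URL and API key are deliberately left out, and the sample XML is a made-up stand-in: only the prearchiving and condition element names come from the steps above, while postarchiving and the wrapper elements are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

CITATION = (
    "Heller, M. (2011). A Review of “Strategic Planning for Social Media "
    "in Libraries”. Journal of Electronic Resources Librarianship, 24 (4), 339-240)"
)

# Hypothetical response, shaped only loosely like a SHERPA/RoMEO record
SAMPLE_XML = """<romeoapi>
  <publisher>
    <prearchiving>can</prearchiving>
    <postarchiving>can</postarchiving>
    <conditions>
      <condition>Must link to publisher version</condition>
      <condition>Publisher copyright must be acknowledged</condition>
    </conditions>
  </publisher>
</romeoapi>"""


def split_citation(citation):
    """Split an APA-style citation into author, date, and everything after the date."""
    # Split once on space + opening parenthesis, as in step 5
    # (raises ValueError if the citation does not contain the separator)
    author, rest = citation.split(" (", 1)
    # Split once on closing parenthesis + period + space to isolate the date
    date, remainder = rest.split("). ", 1)
    return {"author": author, "date": date, "remainder": remainder}


def journal_query(journal_title):
    """Build only the tail of the query string from step 7 (no base URL, no key)."""
    # quote_plus is used here as a rough stand-in for escape(value, 'url')
    return "&qtype=starts&jtitle=" + quote_plus(journal_title)


def archiving_policy(xml_text):
    """Extract the archiving allowances and join the conditions, as in step 9."""
    root = ET.fromstring(xml_text)
    conditions = [(c.text or "").strip() for c in root.iter("condition")]
    return {
        "prearchiving": root.findtext(".//prearchiving"),
        "postarchiving": root.findtext(".//postarchiving"),
        "conditions": ". ".join(conditions),
    }


if __name__ == "__main__":
    print(split_citation(CITATION))
    print(journal_query("Journal of Electronic Resources Librarianship"))
    print(archiving_policy(SAMPLE_XML))
```

The nice thing about doing it in Refine instead is that you get the preview at every step, so typos in the CV show up right away rather than as an exception halfway through a script.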

I hope this is useful to some people. I know even in this protean form this procedure will save me a ton of time and allow library liaisons to concentrate on outreach to faculty rather than look up a ton of things in a database. Let me know if you are doing something similar and the way you are doing it.