Part II: Product

(Guest blog by Rob Hudson)

In Part I of this blog, I began telling you about my experience transforming Carnegie Hall’s performance history data into Linked Open Data. In addition to giving some background on my project and the data I’m working with, I talked about process: modeling the data; how I went about choosing (and ultimately deciding to mint my own) URIs; and finding vocabularies, or predicates, to describe the relationships in the data. I also gave some examples of the links I created to external datasets.

In this installment, I’d like to talk about product: the solutions I examined for serving up my newly-created RDF data, and some useful new tools that help bring the exploration of the web of linked data down out of the realm of developers and into the hands of ordinary users. I think it’s noteworthy that none of the tools I’m going to tell you about existed when I embarked upon my project a little more than two years ago!

As I’ve mentioned, my project is still a prototype, intended to be a proof-of-concept that I could use to convince Carnegie Hall that it would be worth the time to develop and publish its performance history data as Linked Open Data (LOD) — at this point, it exists only on my laptop. I needed to find some way to manage and serve up my RDF files, enough to provide some demonstrations of the possibilities that having our data expressed this way could afford the institution. I began to realize that without access to my own server this would be difficult. Luckily for me, 2014 saw the first full release of a linked data platform called Apache Marmotta by the Apache Software Foundation. Marmotta is a fully-functioning read-write linked data server, which would allow me to import all of my RDF triples, with a SPARQL module for querying the data. Best of all, for me, was the fact that Marmotta could function as a local, stand-alone installation on my laptop — no web server needed; I could act as my own, non-public web server. Marmotta is out-of-the-box, ready-to-go, and easy to install — I had it up and running in a few hours.
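For readers who want to try a similar local setup, here is a minimal Python sketch of how a query could be sent to a SPARQL module like the one described above. It only builds the request URL, following the standard SPARQL 1.1 Protocol, where the query text is passed in a `query` parameter. The endpoint address (`http://localhost:8080/marmotta/sparql/select`) is an assumption based on a default local installation, and the helper name `build_select_url` is hypothetical; neither comes from the project described here.

```python
from urllib.parse import urlencode

# Assumed address of a local, stand-alone Marmotta install; adjust to your setup.
MARMOTTA_SPARQL = "http://localhost:8080/marmotta/sparql/select"


def build_select_url(query: str, endpoint: str = MARMOTTA_SPARQL) -> str:
    """Return a GET URL that submits `query` via the SPARQL 1.1 Protocol."""
    # urlencode percent-encodes the query text and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


# A generic query that lists any ten triples in the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

if __name__ == "__main__":
    print(build_select_url(QUERY))
```

With the server running, the printed URL could be opened in a browser or fetched with `urllib.request.urlopen` to see the results.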


Rob Hudson - Photo by Gino Francesconi

Part I: Process

(Guest blog by Rob Hudson)

My name is Rob Hudson, and I’m the Associate Archivist at Carnegie Hall, where I’ve had the privilege to work since 1997. I’d like to tell you about my experience transforming Carnegie Hall’s performance history data into Linked Open Data, and how within the space of about two years I went from someone with a budding interest in linked data, but no clue how to actually create it, to having an actual working prototype.

First, one thing you should know about me: I’m not a developer or computer scientist. (For any developers and/or computer scientists out there reading this right now: skip to the next paragraph, and try to humor me.) I’m a musician who stumbled into the world of archives by chance, armed with subject knowledge and a love of history. I later went back and got my degree in library science, which was an incredibly valuable experience, and which introduced me to the concept of Linked Open Data (LOD), but up until relatively recently, the only lines of programming code I’d ever written were a “Hello, World!”-type script in Basic, in 1983. I mention this in order to give some hope to others out there like me, who discovered LOD, thought “Wow, this is fantastic! How can I do this?”, and were told “learn Python.” Well, I did, and if I can do it, so can you; it’s not that hard. Much harder than learning Python (and, one might argue, more important) is the much more abstract process of understanding your data, and figuring out how to describe it. Once you’ve dealt with that, the transformation via Python is just process: perhaps not a cakewalk, but nonetheless a methodical, straightforward process that you can learn and tackle, step by step.

Now let me tell you a bit about the data that I worked with for my linked data prototype. The Carnegie Hall Archives maintains a database that attempts to track every event, both musical and nonmusical, that has occurred in the public performance spaces of Carnegie Hall since 1891. (Since the CH Archives was not established until 1986, there are some gaps in these records, which we continue to fill in using sources like digitized newspaper listings and reviews, or missing concert programs we buy on eBay.) This database now covers more than 50,000 events of nearly every conceivable musical genre: classical, folk, jazz, pop, rock, world music, and no doubt some I’m overlooking.  But Carnegie Hall has always been about much more than music; its stages have also featured dance and spoken word performances, as well as meetings, lectures, civic rallies, political conventions — there was even a children’s circus, complete with baby elephants, in 1934. Our database has corresponding records for more than 90,000 artists, 16,000 composers and over 85,000 musical works. Starting in 2013, we began publishing some of these records to our website, where you can now find the records for nearly 18,000 events between 1891 and 1955.  The limited release reflects our ongoing process of data cleanup, and we’re continuing to publish new records each month.  For my linked data prototype, I chose to use this published data set, since I knew it was good, clean data.

Thanks to our cooperation with the John Paul II Catholic University of Lublin, we present, in digital form, the Pilsudski Institute Bulletins and Reports, beginning with the first issue published in 1943. The Bulletins and Reports have been published since the Institute’s inception in 1943; they reflect the events, daily operations, donations, publications and growing collections of the Institute. Most have an English-language section or summary, and in some cases a separate English version was published.

The issues already digitized can be downloaded from our webpage. Bulletins from 1943 to 1978, some of those from 1980 to 1988, and all of those from 2006 to 2014 are available online. Additional volumes will be added over time.

At the recent METRO (Metropolitan New York Library Council) conference, representatives of the group ‘Digital Humanities in New York City’ (NYCDH) gave a presentation. The group has been active since mid-2011 and brings together people interested in Digital Humanities from New York and the surrounding area. It provides a forum for many different organizations and small groups of people working on problems related to digital humanities. The universities where members of the group’s steering committee work (such as NYU, CUNY, Columbia, Pratt and others) provide space for meetings. The group’s calendar is full, often with several events or meetings in a week. The group is open: after registering, any member can add an event they are organizing to the calendar and take part in events already announced.

On the NYCDH website you can find discussion groups on many topics, such as “Digital Pedagogy”, “OMEKA Group”, “Librarians in Digital Humanities”, “Text Analysis Group”, “Digital Experiments Group”, “Antiquities and Digital Techniques” and others. The events planned for the near future and those recently concluded give a good picture of the group’s activity.

The 2015 Annual Conference of the Metropolitan New York Library Council

The slide entitled “What is the Problem” by Jill Cirasella, Associate Librarian for Public Services and Scholarly Communication at The Graduate Center, CUNY, published using the CC-BY licence.

On Thursday, Jan 15, 2015 we took part in the Annual Conference of the Metropolitan New York Library Council, popularly known as METRO. The annual conference is always worth attending, since METRO brings together individuals and organizations at the forefront of modern thinking about libraries, archives and museums. This year was no exception, and we had a chance to learn about new achievements and ideas, including the latest in digital technologies.

The keynote speaker was Professor Siva Vaidhyanathan from the University of Virginia, an excellent speaker and author of many books, including “Copyrights and Copywrongs: The Rise of Intellectual Property and How It Threatens Creativity” and “The Googlization of Everything (And Why We Should Worry)”. He presented a slightly apocalyptic image of the present and his predictions for the next 10 years, in which the “Internet” disappears, replaced by embedded systems, walled gardens and products of visionaries who do not necessarily share our vision. The title of the presentation was “The Operating System of Your Life”, and he touched on topics like copyright, massive surveillance by companies and governments, security meltdowns, network neutrality and others. His message to libraries and librarians was to take over and start framing the debates about the future. Most of us left the room awestruck, if not necessarily completely convinced.

Following the keynote, we split up to cover as many topics as possible in the parallel sessions.