Blog of the archivists and librarians of the Piłsudski Institute


A fragment of a Linked Data graph

Linked Data is a relatively new phenomenon in the World Wide Web, providing access to structured data. What is structured data? The World Wide Web is now a universal vehicle for human-readable information - all websites, articles and apps give us information that we can read and interpret, for example an answer to the question "when is the next bus coming to this bus stop?" Such information is not easy for a computer to read - it does not know what "this stop" means, whether you are waiting for a specific line or for any bus, etc. Computers require information with a structure, which can, for example, take the form of label:value pairs ("bus stop number:4398, bus line:Q11, distance from the stop:2.5 miles", etc.).
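The bus-stop answer above can be sketched as label:value pairs in a few lines of Python (the field names are illustrative, not taken from any real transit system):

```python
# Structured data: the bus-stop answer as explicit label:value pairs
# that a program can read, unlike the human-readable sentence.
bus_arrival = {
    "bus_stop_number": 4398,
    "bus_line": "Q11",
    "distance_from_stop_miles": 2.5,
}

# A computer can now answer "which line?" directly, by label:
print(bus_arrival["bus_line"])  # Q11
```
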

Information is commonly stored in databases, which have evolved to be very efficient at data storage and retrieval, but terrible at information sharing. Each database has many columns, each named differently, and only the local computer system knows how to retrieve the data. This is where the new concept, Linked Data, comes to the rescue. Linked Data is a system that lets computers understand each other by labeling databases with metadata. Its metadata scheme, RDF (Resource Description Framework), requires that data come not in provincial tables, but in universally readable RDF sentences, consisting of a subject, a predicate and an object. Instead of invented column names we use standard names arranged in ontologies, and instead of a textual description of the subject of the RDF sentence we use its identifier, a URI (Uniform Resource Identifier). Thus, instead of information that is trivial for the human reader - the title of this blog post (after all, we can read it above, right?) - we get a structured sentence, or "triple" in RDF lingo: [ - dc:title - "Linked Data part 2: Where Is the Data?"]. The first part is the URI, the unique "address" of this article; the second means "title" in a specific metadata standard (Dublin Core); and the third part is the actual title.
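The triple described above can be sketched in plain Python as a (subject, predicate, object) tuple. This is only a sketch: the subject URI below is a placeholder, since the article's actual address is not given in the text.

```python
# An RDF "triple": subject - predicate - object.
# The subject URI is a hypothetical placeholder, not the article's real address.
triple = (
    "http://example.org/blog/linked-data-part-2",  # subject: the URI of the article
    "dc:title",                                    # predicate: Dublin Core "title"
    "Linked Data part 2: Where Is the Data?",      # object: the actual title
)

subject, predicate, obj = triple
print(f"{subject} has {predicate} = {obj!r}")
```

A real Linked Data application would use a dedicated RDF library and full URIs for the predicates as well, but the three-part structure is exactly this.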

Earthrise. At the bottom, the recovered high-quality image.

NASA recently published newly recovered images from the lunar probes sent in 1966-67 as part of the Lunar Orbiter program. The difference in quality between the old, published images and the new material is striking. The story of saving the material and reconstructing the high-quality images is instructive, and it begins in 1986, when Jet Propulsion Laboratory (JPL) archivist Nancy Evans decided that she could not, in good conscience, simply throw the old material away.

The probes were equipped with high-quality, dual-lens cameras and took large numbers of photographs on 70 mm film. The film was developed on board the probe, and the images were scanned and transmitted to Earth. The modulated signal from the probe was recorded on magnetic tape, along with the operators' comments. Then the entire probe (with the original photographs) was unceremoniously crashed into the surface of the Moon. The magnetic tapes were used to print large images on paper (old churches were rented in order to hang the enormous sheets), which were used to identify potential lunar landing sites. The tapes were then packed into boxes and forgotten.

In 2005, two NASA enthusiasts, Keith Cowing and Dennis Wingo, began work on recovering the tapes, which in the meantime had changed storage locations several times. The tape drive, a very rare Ampex FR-900, was located in Nancy Evans's shed, and the group set about recovering the images. This required rebuilding the drive, recreating parts and electronics that no longer existed, converting the modulated signal to digital form, and then patiently assembling the image fragments into a whole. After the first image ("Earthrise", see above) was recovered, the team, which until then had worked as volunteers, obtained funding from NASA to continue the project. Since 2007, about 2,000 images of the Moon have been recovered, with astonishing detail.

Why bother scanning and digitizing documents and books? What is the justification for such an enormous effort to convert the cultural legacy of humanity into digital form? I often hear such questions - from historians who prefer the "smell and touch" of original documents, or from archivists who claim that microfilms are good enough. Is digital technology just a fashion that will soon pass, or does it have a deeper significance?

If you think that there is something powerful in the digital, read on. We will explore why digital is important - for galleries, libraries, archives and museums (GLAM) and for all producers and consumers of cultural goods. In the next three sections we will discuss the three reasons for switching (or, perhaps, returning) to digital information processing - Preservation, Discoverability and Access - and review the oldest discrete information systems known to us.


Digital is only one of many implementations of discrete information storage and processing. Most of the signals that reach our senses, be it a rainbow, a symphony or the smell of a rose, can be considered analog. Analog means that the signal can take any value, for example a tone in music or a color in the visible spectrum. The range of possibilities is typically limited only by the capabilities of our senses - we cannot see the infrared, nor hear the ultrasound. Once the signal enters our eye (or a digital camera), it does not retain the continuity of the original: in the retina the sensors - rods and cones - act on the 'all or nothing' principle, and in the camera the pixel sensors decompose the light into a small number of levels. In a discrete system only a limited, countable number of states is allowed, with nothing in between. All modern digital computers use the basic information unit of the binary bit, which can have only two states (commonly called 0 and 1). The mathematical theory of information, first proposed by Claude Shannon, also uses the binary bit with two possible states, implying that information is in its nature discrete. In computers, single bits are typically strung together: a group of 8 bits in sequence is called a byte. To keep the discussion general, we will call the smallest unit a character, and a string of characters a word. Digital computers thus operate with a 2-state character and an 8-character word.
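The bit-and-byte arithmetic above can be illustrated in a few lines of Python:

```python
# A byte is 8 bits; each bit is a two-state "character" (0 or 1).
# The letter 'A' stored as one byte (its code is 65):
bits = format(ord("A"), "08b")
print(bits)       # 01000001
print(len(bits))  # 8 - eight bits per byte

# Eight two-state bits allow 2**8 = 256 distinct values per byte:
print(2 ** 8)     # 256
```
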


GLAM is an acronym that stands for: Galleries, Libraries, Archives and Museums. The designation applies to institutions that have something in common - they are repositories of human cultural heritage.

Although there are institutions that bring together museums, archives, libraries, etc., providing them with financial or logistical support -- such as the Institute of Museum and Library Services (IMLS) in Washington, DC, the Museums, Libraries and Archives Council in the UK, and the Norwegian Archive, Library and Museum Authority -- these institutions do not claim to belong to GLAM, nor do they use this acronym.

What, therefore, is GLAM? It is the idea that the institutions whose mission it is to collect cultural treasures will benefit from making these resources widely available. The idea of GLAM can be best illustrated by two initiatives, OpenGLAM and GLAM-Wiki.

Recently (January 13-18) the Electronic Frontier Foundation (EFF) organized Copyright Week to remind us what we can do to keep and exercise our rights under the law, and how to work on improving it. For each of the daily topics, the participating institutions contributed articles, blog posts and other material. It is fascinating reading, well worth the time. Below is a short review, illustrated with excerpts from selected texts.


After the public outcry over ACTA and SOPA, the US Congress is again trying to keep a proposed law secret. This time it is the TPP, which, if enacted, would bury new copyright provisions in secret treaty negotiations. As before, the legislators fear public scrutiny while consulting with the industry.

“The leaked ‘Intellectual Property’ chapter of the Trans-Pacific Partnership agreement confirmed our worst fears: Big Content companies are pushing extreme copyright provisions in a secret trade deal that would put restrictive controls on the Internet. While Hollywood has had easy access to view and comment on draft texts—so it can get the provisions it wants—our own lawmakers have been mostly left out”. - EFF


On Wednesday, January 15, 2014, the Annual Conference of the Metropolitan New York Library Council (METRO) was held in New York. The conference, which took place in the modern Vertical Campus at Baruch College (CUNY), brought together more than two hundred representatives of libraries, archives, colleges and other institutions from New York City and the surrounding areas. Participants had a choice of 25 presentations and lectures showing various aspects of the work, opportunities and achievements of the broadly understood library community. The Pilsudski Institute presented a lecture by Marek Zielinski and Iwona Korga on Digitization of Polish History 1918-1923, describing the digitization project and showing selected archival sources, digitization techniques, special projects, online presentation and statistics.

The conference began with a keynote lecture delivered by the well-known librarian and blogger Jessamyn West. In her presentation Open, Now! she told us about the possibilities of Open Access, with unrestricted, free access to a wide range of sources through the Internet. She talked about Google, the Digital Public Library of America, Open Library and the legal issues associated with such access.

Next, the participants could choose lectures from a wide selection of topics. Here are some notes on those we attended. The program of the conference is available online; the links below lead to the slides from the presentations:

Building Authorities with Crowdsourced and Linked Open Data in ProMusicDB (Kimmy Szeto, Baruch College, CUNY and Christy Crowl, ProMusicDB). Linked Data is a great concept, but where does one find authoritative data? The authors presented their search for sources of information on music performers and their roles, finding data in many diverse places on the Internet. In the talk they presented the information sources and their ways of reconciling the data to obtain a consistent and usable dataset.

Metadata for Oral History: Acting Locally, Thinking Globally (Natalie Milbrodt, Jane Jacobs and Dacia Metes, Queens Library). The representatives of the Queens Public Library presented their latest project on the history of the borough of Queens, including lectures, pictures and memorabilia of the oldest inhabitants of the borough. They emphasized the difficulty of choosing the software and the metadata model, especially with regard to geographical names. Very useful were the pointers on how to describe a personal interview (Who? What? When? Where? Why? How?).

Mug Shots, Rap Sheets, & Oral Histories: Building the New Digital Collections at John Jay College of Criminal Justice (Robin Davis, John Jay College of Criminal Justice). The representative of an academic library outlined the stages of the work and the metadata, and showed some of the most interesting documents from a forthcoming web exhibit on the history of the NYPD (New York Police Department), as well as recordings of interviews with New York City Mayor Ed Koch.

Wikipedia in Cultural Heritage Institutions (Dorothy Howard, Metropolitan New York Library Council). Dorothy Howard is currently the METRO Wikipedian-in-Residence. She presented the latest projects, such as GLAM-Wiki and Wikimedia Commons, and told us about the activities of Wikipedians to raise the level and quality of articles, especially those on medical topics.

Beyond digitization: hacking structured data out of historical documents (Dave Riordan and Paul Beaudoin, NYPL Labs, The New York Public Library). The programmers from the New York Public Library have embarked on an ambitious crowdsourcing project to extract structured metadata from vast library collections. They have built tools to entice volunteers to transcribe documents, help them in the task, verify their work by reconciling the results from different people, and more, in the process improving and repurposing open-source software (Scribe). The topics described in the presentation ranged from restaurant menus to playbills.

Open Access is a Lot of Work!: How I Took a Journal Open Access and Lived to Tell About It (Emily Drabinski, LIU Brooklyn). A very interesting presentation describing the work on the journal "Radical Teacher" and its change from the traditional paper publishing system to Open Access, enabled in collaboration with the University of Pittsburgh. She stressed that this type of transformation requires a huge amount of work.

The  METRO team organized the conference very professionally and took care of every single detail. Thanks and see you next year!

Iwona Korga, January 21, 2014

Codex Sinaiticus, Esther 4:17m - 5:2 - book 9, chapter 5

The answers to the question "What is digitization?" are as diverse as the resources converted into electronic form and the institutions that undertake such a task. Some projects deal with only a single document, others describe in great detail an event or the work of one individual, and still others provide access to virtual archives of history. There are projects that demonstrate innovative technical solutions by combining different techniques and sources of information to provide new ways of viewing and searching. Institutions with rich collections of their own develop exhibitions of selected resources, while other projects are based on the cooperation of many institutions. Here are some examples to illustrate this diversity:

Codex Sinaiticus, created in the mid-fourth century, contains the text of the Bible in Greek, including the oldest complete copy of the New Testament. Until the mid-19th century the manuscript was kept in the Monastery of Saint Catherine, the oldest existing Christian monastery, situated at the foot of Mount Sinai in Egypt. Today, parts of the manuscript are held by four institutions: the British Library in London, Leipzig University Library, the National Library of Russia in St. Petersburg, and the Monastery of Saint Catherine. A website was created as a result of the cooperation of these four institutions. It is very carefully designed and contains all of the fragments of the codex. In addition to scans of the original pages, a transcript of the Greek is provided, which for some pages is also translated into other languages (English, German and/or Russian). Crosslinks allow one to locate transcribed fragments of text after selecting them in the original.