Metadata Standards: EAD


EAD (Encoded Archival Description) is a standard created expressly for encoding archival finding aids. For this reason, it is a hybrid. On the one hand, it tries to reflect the way archivists work when creating finding aids; on the other, it tries to introduce the discipline and accuracy necessary for electronic document processing. The result is a lot of flexibility in the placement of data, which facilitates the work of an archivist but at the same time makes it rather difficult to exchange data. The new version of EAD (EAD3), which has been in preparation for several years, will hopefully reduce much of this arbitrariness.

The rules and principles of creating finding aids are contained in separate documents. In addition to the international standard, ISAD(G), there are rules established in individual countries, such as DACS in the U.S., that are similar but often have subtle differences. EAD is an encoding for descriptions written according to such rules, understandable by humans but also suitable for computer processing. Like most modern metadata standards, it is expressed in XML and consists of a series of nested tags such as <ead>, along with rules of nesting and rules governing their content.

<ead>

The basic document in EAD is a unit typically containing one collection (fonds), in the form of an XML file. The document must have the following structure:

<ead>
[content]
</ead>

In the following we will discuss the items replacing the [content] placeholder above.

The EAD standard is rather simple in structure, although it has a lot of detail. An EAD document must contain two elements (and their contents), <eadheader> and <archdesc>:

<ead>
<eadheader>[content]</eadheader>
<archdesc>[content]</archdesc>
</ead>

As a reminder, XML elements are marked with angle brackets < and >, and the content of an element, e.g. <ead>, is everything between <ead> and </ead>.
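The skeleton above can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name top_level_elements is hypothetical, and the sample assumes a document without an XML namespace, as in the examples in this article.

```python
import xml.etree.ElementTree as ET

# The skeleton from the text; "[content]" is only a placeholder.
SKELETON = """\
<ead>
  <eadheader>[content]</eadheader>
  <archdesc>[content]</archdesc>
</ead>
"""

def top_level_elements(xml_text):
    """Return the root tag and the tags of its direct children, in order."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]

root_tag, children = top_level_elements(SKELETON)
print(root_tag, children)
```

A real finding aid would of course be validated against the EAD DTD or schema rather than with a check like this one.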

<eadheader>

The <eadheader> element is intended to hold information about the finding aid itself, in contrast to the actual archival resource: its title, author and other details, as well as information about the organization that publishes it. It is a bit redundant, as the whole document is the finding aid, but it allows one to record data such as the author of the finding aid as opposed to the creator of the archival collection. This element also has a place for a unique identifier of the finding aid, in the <eadid> element.

The <eadheader> element, as intended by the creators of the standard, should contain sufficient information to generate the title page of a printed volume of the finding aid. If it does not, one can use an additional element called <frontmatter>.
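As an illustration of the "title page" idea, here is a small sketch that pulls a few fields out of a header. Only <eadheader> and <eadid> are named in this article; the other element names (<filedesc>, <titlestmt>, <titleproper>, <author>, <publicationstmt>, <publisher>) follow EAD 2002 as we understand it and should be checked against the tag library. All values and the function name title_page_fields are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical header; element names below <eadheader> are assumed from EAD 2002.
HEADER = """\
<eadheader>
  <eadid>example-001</eadid>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Family Papers</titleproper>
      <author>Finding aid prepared by A. Archivist</author>
    </titlestmt>
    <publicationstmt>
      <publisher>Example Archive</publisher>
    </publicationstmt>
  </filedesc>
</eadheader>
"""

def title_page_fields(header_xml):
    """Collect the pieces one might print on a title page (None if absent)."""
    header = ET.fromstring(header_xml)
    return {
        "id": header.findtext("eadid"),
        "title": header.findtext("filedesc/titlestmt/titleproper"),
        "author": header.findtext("filedesc/titlestmt/author"),
        "publisher": header.findtext("filedesc/publicationstmt/publisher"),
    }

fields = title_page_fields(HEADER)
for name, value in fields.items():
    print(f"{name}: {value}")
```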

The <archdesc> element is dedicated to all the information about the described archival resource. EAD does not put restrictions on the resource; it can be either the whole fonds or a part of it. However, since the EAD structure is hierarchical, it is best to enclose in this element one fonds with its subordinate levels (sub-fonds, series, subseries etc.). The <archdesc> element has an attribute, "level", and for an entire fonds the element looks like this:

<archdesc level="fonds"> [content] </archdesc>

The <archdesc> element is a wrapper which contains information in separate XML elements. Other elements which serve only as wrappers are <did> (descriptive identification), <dsc> (description of subordinate components) and <c> (component). The majority of elements and attributes, however, are used to store information about a resource.

<archdesc>

What is inside <archdesc>? The first element we encounter is <did>, or Descriptive IDentification of the resource. It contains selected metadata describing the resource as a whole, for example the abstract, but not the history of its creator. The main subelements of <did> are:

<unittitle> - the title of the resource
<unitid> - resource identifier
<unitdate> - the date of the resource - a date range.
<langmaterial> - information about the language used in this resource.
<abstract> - abstract or description of the resource.
<physdesc> - detailed information about the physical aspects of the resource

Other, equally important elements of <archdesc> are not placed in the <did> wrapper, but sit loose inside the higher-level wrapper. They are, for example:

<bioghist> - Biography or History of the creator of the material.
<scopecontent> - Scope and Content - detailed description of the content, in the form of a narrative, by definition wider than a summary
<index> - wrapper containing various index entries
<controlaccess> - Controlled Access Headings. A wrapper for selected important controlled vocabulary terms - subject and other headings.
<dao> (digital archival object) - A wrapper for digital objects

Finally we reach the inventory of the contents of the resource. Because the archive is organized in a hierarchical manner, we have sub-elements, their sub-sub-elements and so on. We use two wrappers here:

<dsc> - Description of Subordinate Components - a wrapper containing the descriptions of subordinate components; used in <archdesc> only.
<c> - a wrapper containing the description of one child component (which may contain further child components inside).

For readability (it is not needed for machine processing) these levels can also be numbered, e.g. <c01>, <c02> etc.
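Since the number only repeats the nesting depth, it can be derived automatically. The sketch below renames unnumbered <c> elements to <c01>, <c02> and so on; the function name number_components and the sample titles are hypothetical, and the input is assumed to use no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory: two series, the first holding one file.
DSC = """\
<dsc>
  <c level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c level="file"><did><unittitle>Letters, 1939</unittitle></did></c>
  </c>
  <c level="series">
    <did><unittitle>Photographs</unittitle></did>
  </c>
</dsc>
"""

def number_components(parent, depth=1):
    """Rename nested <c> elements in place to <c01>, <c02>, ... by depth."""
    for child in parent:
        if child.tag == "c":
            child.tag = f"c{depth:02d}"
            # Components one level down get the next number.
            number_components(child, depth + 1)

dsc = ET.fromstring(DSC)
number_components(dsc)

# Collect the component tags in document order.
tags = [el.tag for el in dsc.iter() if el.tag.startswith("c")]
print(tags)
```

The numbered form makes it easier for a human reader to see which closing tag belongs to which level; a program can work with either form.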

Thus, the second element of <ead>, the <archdesc>, will look at a glance like this:

  • <archdesc>
    • <did>
      • <unittitle>...</unittitle>
      • <unitid>...</unitid>
      • <unitdate>...</unitdate>
      • <langmaterial>...</langmaterial>
      • <abstract>...</abstract>
      • <physdesc>...</physdesc>
      • ...
    • </did>
    • <bioghist>...</bioghist>
    • <index>...</index>
    • <controlaccess>...</controlaccess>
    • <dao>...</dao>
    • ...
    • <dsc>...</dsc>
  • </archdesc>

Ellipses are placeholders for content and/or other elements. The content often has its own structure, with its own XML tags.
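The outline above can also be assembled programmatically. The following is a simplified sketch, not validated against the EAD schema: the function build_archdesc and all values are hypothetical, only some of the elements are included, and wrapping the narrative text of <bioghist> in a <p> element is our assumption about conventional EAD usage.

```python
import xml.etree.ElementTree as ET

def build_archdesc(title, unit_id, date_range, abstract, history):
    """Assemble a bare-bones <archdesc> for a whole fonds (not schema-validated)."""
    archdesc = ET.Element("archdesc", level="fonds")

    # Summary metadata goes inside the <did> wrapper.
    did = ET.SubElement(archdesc, "did")
    for tag, text in (
        ("unittitle", title),
        ("unitid", unit_id),
        ("unitdate", date_range),
        ("abstract", abstract),
    ):
        ET.SubElement(did, tag).text = text

    # Narrative elements sit directly inside <archdesc>, next to <did>.
    bioghist = ET.SubElement(archdesc, "bioghist")
    ET.SubElement(bioghist, "p").text = history

    # Empty container for the subordinate components.
    ET.SubElement(archdesc, "dsc")
    return archdesc

# Hypothetical values, for illustration only.
archdesc = build_archdesc(
    title="Example Family Papers",
    unit_id="EX-001",
    date_range="1920-1945",
    abstract="Papers of a hypothetical family.",
    history="A short history of the creator would go here.",
)
xml_text = ET.tostring(archdesc, encoding="unicode")
print(xml_text)
```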

<dsc>

<dsc> is a wrapper which contains the child components. If the fonds is divided into series, the <dsc> element will contain the series, each in its own <c> element. The <dsc> is only used in <archdesc>, the highest-level descriptive element of <ead>. Each child component is placed in its own <c> wrapper; for example, a fonds that has three series will have:

  • <dsc>
    • <c level="series">...</c>
    • <c level="series">...</c>
    • <c level="series">...</c>
  • </dsc>

Optionally, instead of <c> we can use <c01>.
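To see the series of a fonds from a program, it is enough to look at the direct <c> children of <dsc>. A minimal sketch, with hypothetical titles and a hypothetical helper series_titles, assuming unnumbered <c> elements and no XML namespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical fonds with three series.
ARCHDESC = """\
<archdesc level="fonds">
  <did><unittitle>Example Family Papers</unittitle></did>
  <dsc>
    <c level="series"><did><unittitle>Correspondence</unittitle></did></c>
    <c level="series"><did><unittitle>Photographs</unittitle></did></c>
    <c level="series"><did><unittitle>Diaries</unittitle></did></c>
  </dsc>
</archdesc>
"""

def series_titles(archdesc):
    """Return the titles of the series placed directly inside <dsc>."""
    return [
        component.findtext("did/unittitle")
        for component in archdesc.findall("dsc/c[@level='series']")
    ]

titles = series_titles(ET.fromstring(ARCHDESC))
print(titles)
```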

<c>

<c> is a wrapper element that contains the data of a resource subordinate to the one at the higher level of the hierarchy. How do we describe the child? We have here a wonderful example of recursion, a mathematical concept. Inside <c> we can use all the elements we used in the higher-level element; just look at <archdesc> above to find out which elements are available. At a minimum, we should use an element for the title, <unittitle>, but one can put in all the details of this component. The only difference compared to <archdesc> is that we do not use <dsc> to group the subordinate components, but place them ‘loose’ inside <c>.

  • <c level="series">
    • <did>
      • <unittitle>...</unittitle>
      • <unitid>...</unitid>
      • <unitdate>...</unitdate>
      • <langmaterial>...</langmaterial>
      • ...
    • </did>
    • <bioghist>...</bioghist>
    • <index>...</index>
    • <controlaccess>...</controlaccess>
    • <dao>...</dao>
    • <c level="file">...</c>
    • <c level="file">...</c>
    • <c level="file">...</c>
  • </c>

Because a lot of information about a resource is already included in the higher-level description, we need not repeat it here, and can limit ourselves to the information specific to that component.
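The recursive structure maps naturally onto a recursive function. The sketch below prints an indented outline of a fonds; the function name outline and the sample data are hypothetical, and unnumbered <c> elements without an XML namespace are assumed. Note the one special case mentioned in the text: at the top level the children live inside <dsc>, while inside <c> they are direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical fonds: two series, the first with two files.
ARCHDESC = """\
<archdesc level="fonds">
  <did><unittitle>Example Family Papers</unittitle></did>
  <dsc>
    <c level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c level="file"><did><unittitle>Letters, 1939</unittitle></did></c>
      <c level="file"><did><unittitle>Letters, 1940</unittitle></did></c>
    </c>
    <c level="series">
      <did><unittitle>Photographs</unittitle></did>
    </c>
  </dsc>
</archdesc>
"""

def outline(component, depth=0):
    """Return indented 'level: title' lines for a component and its children."""
    title = component.findtext("did/unittitle", default="(untitled)")
    level = component.get("level", "?")
    lines = [f"{'  ' * depth}{level}: {title}"]

    # <archdesc> groups its children in <dsc>; a <c> holds them directly.
    container = component.find("dsc") if component.tag == "archdesc" else component
    if container is not None:
        for child in container.findall("c"):
            lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(ARCHDESC))
print("\n".join(lines))
```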

How to use EAD?

There are two schools of using EAD for archival descriptions. One is to work directly with the XML code. Contrary to first impressions, it is not very complicated, and there are many good XML editors to assist one in adding, copying, or modifying elements and attributes. This method gives one the greatest flexibility and the possibility of creating any (arbitrary) form of the EAD document. It is particularly popular in institutions that began using EAD before good tools were created. The other school uses increasingly sophisticated database tools, such as Archivists’ Toolkit, ArchivesSpace, or ICA-AtoM. Each of them uses a specific variant of EAD, with limitations imposed, but also with much better compatibility.

The EAD standard is very popular, and many important archives use it to store metadata. On the Internet you can find many publications and guides which can and should be consulted to learn more about the standard.


Marek Zieliński, September 28, 2013
