How to write dates part 2: EDTF

Prague astronomical clock By Steve Collis from Melbourne, Australia (Astronomical Clock Uploaded by russavia) [CC BY 2.0], via Wikimedia Commons

In one of my previous blog posts, “How to write dates?”, I discussed the basic universal date and time notation specified in the International Organization for Standardization standard ISO 8601 and its World Wide Web Consortium (W3C) simplification. Since then, the Library of Congress has completed work on an extension of this standard, the Extended Date/Time Format (EDTF) 1.0. This extension deals for the most part with expressing uncertain dates and times. Such limited or imprecise date/time information is a common occurrence when recording historical events in archives, libraries, etc. ISO 8601 does not allow for the expression of such concepts as “approximately the year 1962”, “some year between 1920 and 1935”, or “the event probably occurred in May 1938, but we are not certain”. The EDTF standard allows us to express them in a formalized way, fulfilling a real need in many fields dealing with historical metadata.

Although the standard is relatively new and there are few software tools to help enter or validate uncertain date and time data, I believe it is worth familiarizing oneself with the new notation wherever possible.

Definitions

I would like to begin with some definitions to facilitate the discussion of the new notation. The definitions are accompanied by symbols that will be used in the next section.

Precision

Precision is a measure of a range or interval within which the ‘true’ value exists [1]. Precision is explicit in the date or date/time expression: if an event occurred in the year 1318, the precision is one year (it could have occurred at any time within this year). If we specify 1945-09-15, the precision is one day, etc. [2] In EDTF we can extend this definition to specify decade or century precision using the x symbol (see the discussion of masked precision below).

Approximate (~)

An estimate that is assumed to be possibly correct, or close to correct, where “closeness” may depend on the specific application.

Uncertain (?)

We are not sure of the value of the variable (in our case a date or time). Uncertainty is independent of precision. The source of the information may itself not be reliable, or we may face several values and not have enough information to choose between them. For example, we may be uncertain as to the year, month, or day of an event.

Unspecified (u)

The value is not stated. The point in time may be unspecified because it has not occurred yet, because it is classified or unknown, or for any other reason.

Features of the EDTF

The EDTF extends ISO 8601 in three levels: 0, 1 and 2. Level 0 is compliant with ISO 8601 and the W3C limits. Level 0 is described in more detail in my previous blog post (except for the duration, which is not mentioned in EDTF). Levels 1 and 2 extend the standard to include additional features, allowing us to specify precision and approximate, uncertain and unspecified dates and times in different combinations. Level 2 provides greater power of expression than Level 1, and I see no reason for programmers not to implement both levels at once.

Season

One can replace the month in a year-month string with the codes 21 (Spring), 22 (Summer), 23 (Fall) and 24 (Winter) to indicate a season.

  • 2014-21 (Spring of 2014)

In Level 2 one can additionally qualify the season using the ^ symbol, as in the following example: 2014-21^southernHemisphere. However, the dictionary of qualifiers is not specified in the standard.
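As an illustration only, the season codes can be decoded with a few lines of Python. This is a minimal sketch, not part of any EDTF library: the function name describe_year_season is made up for this example, and it handles only the plain Level 1 form without the ^ qualifier.

```python
# Season codes as listed in the text above.
SEASONS = {"21": "Spring", "22": "Summer", "23": "Fall", "24": "Winter"}


def describe_year_season(value: str) -> str:
    """Return e.g. 'Spring 2014' for '2014-21'; raise ValueError otherwise."""
    year, sep, code = value.partition("-")
    # Require a four-digit year, a hyphen, and one of the four season codes.
    if not sep or len(year) != 4 or not year.isdigit() or code not in SEASONS:
        raise ValueError(f"not a year-season string: {value!r}")
    return f"{SEASONS[code]} {year}"


print(describe_year_season("2014-21"))
```

An ordinary year-month string such as 2014-05 is rejected by this sketch, because 05 is a month and not a season code.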

Uncertain and Approximate Date

Symbol ? is used to indicate an uncertain date and ~ to indicate the approximate date. They can be used singly or in combination (e.g. “the date is approximate, and even that is uncertain”).

In Level 1 the symbol(s) can be placed only at the end of the date string, and apply to the whole date:

  • 1945? (year uncertain)
  • 1945-03-12~ (approximate date)
  • 1945-03?~ (year-month approximate and uncertain)
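Because the Level 1 symbols always sit at the very end of the string, separating them from the date is straightforward. The following Python sketch shows one possible way to do it; the names QualifiedDate and split_level1_qualifier are hypothetical, and the date part itself is not validated.

```python
from typing import NamedTuple


class QualifiedDate(NamedTuple):
    date: str          # the date string with the trailing symbols removed
    uncertain: bool    # True when '?' was present
    approximate: bool  # True when '~' was present


def split_level1_qualifier(value: str) -> QualifiedDate:
    """Split a trailing '?', '~' or '?~' from a Level 1 date string."""
    # Check the two-character combination first so it is not mistaken for '~'.
    for suffix, uncertain, approximate in (
        ("?~", True, True),
        ("?", True, False),
        ("~", False, True),
    ):
        if value.endswith(suffix):
            return QualifiedDate(value[: -len(suffix)], uncertain, approximate)
    return QualifiedDate(value, False, False)


for example in ("1945?", "1945-03-12~", "1945-03?~"):
    print(example, "->", split_level1_qualifier(example))
```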

In Level 2 any part of the date (but only a whole year, month or day of the month) can be marked as uncertain or approximate, in any combination. The symbol applies to the elements on its left, and one can use parentheses to isolate a year, month or day and apply the symbol to it alone:

  • 1816?-05-25 (day and month known, year uncertain)
  • 1816-05~-25 (day known, year and month approximate)
  • 1816-(05)?-25 (only month uncertain, year and day known)
  • 1816-(05-25)? (year known, month and day uncertain)
  • 1816?-05-25~ (month known, year uncertain, day approximate)
  • (1816-(06)~)? or 1816?-(06)?~ (year uncertain, month both uncertain and approximate)
  • 1816-22~ (approximate season “around Summer 1816”)

Unspecified date

The letter u can be substituted for any digit in a date that is unspecified.

In Level 1 only the rightmost digits can be replaced. Another limitation is that only the last one or two digits of the year can be replaced, while in a month or day both digits must be replaced together.

  • 191u (unspecified year in the 1910s)
  • 19uu (some year in the 1900s)
  • 1915-uu (some month in 1915)
  • 1915-03-uu (some day in March 1915)
  • 1915-uu-uu (some unspecified day in 1915)
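One practical use of such strings is to find the earliest and latest calendar day they can stand for, for example when sorting or searching records. Here is a minimal Python sketch under stated assumptions: the function name unspecified_bounds is invented for this post, only the Level 1 forms listed above are handled, and a missing month or day segment is treated like uu. That last simplification blurs the distinction between precision and unspecified values, but it does not change the bounds.

```python
import calendar
from datetime import date
from typing import Tuple


def unspecified_bounds(value: str) -> Tuple[date, date]:
    """Earliest and latest day covered by a Level 1 string containing 'u'."""
    parts = value.split("-")
    year = parts[0]
    month = parts[1] if len(parts) > 1 else "uu"
    day = parts[2] if len(parts) > 2 else "uu"

    # Replace unspecified year digits with 0 for the lower bound, 9 for the upper.
    first_year = int(year.replace("u", "0"))
    last_year = int(year.replace("u", "9"))

    if month == "uu":
        first_month, last_month = 1, 12
    else:
        first_month = last_month = int(month)

    if day == "uu":
        first_day = 1
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(last_year, last_month)[1]
    else:
        first_day = last_day = int(day)

    return (
        date(first_year, first_month, first_day),
        date(last_year, last_month, last_day),
    )


print(unspecified_bounds("191u"))
print(unspecified_bounds("1915-03-uu"))
```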

In Level 2 the limitations are removed, and u can be substituted for any digit in the date. As always, we use the number of segments (year, year-month, year-month-day) to indicate precision.

  • 13uu-01-15 (January 15 of some year in the 1300s)
  • 13uu-01-uu (some day in January of some year in the 1300s)
  • 13uu-01  (January of some year in the 1300s)

Extended interval

In an interval, unknown can be used instead of the starting or ending date, and open can replace the ending date of the interval. Additionally, the uncertain, approximate and unspecified (the last one only in Level 2) modifiers can be used.

In Level 1, one can use unknown and open; the ~ and ? symbols can be used only at the end of each of the two dates.

  • 1945-02-11/unknown (interval beginning February 11, 1945, end unknown)
  • 2015-01-20/open (interval beginning January 20, 2015, no end date)
  • 1825~/1918-05 (interval beginning approximately in 1825 and ending in May 1918)
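To show how the two keywords might be handled in software, here is a small Python sketch that splits a Level 1 interval at the slash. The function name parse_level1_interval and the shape of its return value are assumptions made for this example; the two dates are returned as strings and are not validated further.

```python
from typing import Optional, Tuple


def parse_level1_interval(value: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (start, end, is_open); start or end is None when not given as a date."""
    start, sep, end = value.partition("/")
    if not sep or not start or not end or "/" in end:
        raise ValueError(f"not an interval: {value!r}")
    if start == "open":
        # As described above, 'open' may replace only the ending date.
        raise ValueError("'open' can replace only the ending date")
    is_open = end == "open"
    return (
        None if start == "unknown" else start,
        None if end in ("unknown", "open") else end,
        is_open,
    )


print(parse_level1_interval("2015-01-20/open"))
print(parse_level1_interval("1825~/1918-05"))
```

Keeping is_open as a separate flag preserves the difference between an interval whose end is unknown and one that simply has no end yet.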

In Level 2, one can use the ~, ? and u symbols anywhere in the date.

  • 2012-(06)?-01/2015-06-uu (an interval beginning on the first of June 2012, where the month is uncertain, and ending on an unspecified day in June 2015)

Year exceeding four digits

If the year exceeds four digits, and only then, the letter y should be placed before the year (such dates cannot be more precise than a year). In Level 2 an additional exponential representation is allowed, with optional precision marked with p followed by the number of significant digits, e.g. p3.

Level 1

  • y-1700015 (the year -1700015)

Level 2

  • y17e5 (Year 1700000)
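The arithmetic behind the exponential form is simply the number before e multiplied by ten to the power of the number after it. A minimal Python sketch follows, assuming the forms described above; the name parse_long_year is hypothetical, and the sketch does not check that the year really has more than four digits.

```python
import re
from typing import Optional, Tuple

# 'y', an optionally negative number, an optional exponent, an optional precision.
_LONG_YEAR = re.compile(r"^y(-?\d+)(?:e(\d+))?(?:p(\d+))?$")


def parse_long_year(value: str) -> Tuple[int, Optional[int]]:
    """Return (year, significant_digits); the second item is None if no 'p' part."""
    match = _LONG_YEAR.match(value)
    if match is None:
        raise ValueError(f"not a long-year string: {value!r}")
    mantissa, exponent, precision = match.groups()
    year = int(mantissa) * 10 ** int(exponent or 0)
    return year, (int(precision) if precision else None)


print(parse_long_year("y-1700015"))
print(parse_long_year("y17e5"))
```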

-----------

In Level 2 only, there are three additional features: masked precision and two kinds of date lists, all of them very useful.

Masked precision

In Level 0 we can express the precision of a date by specifying year-month-day (day precision), year-month (month precision) or year (year precision). In Level 2 one can replace the last one or two digits of the year with the letter x to indicate a still lower precision covering 10 or 100 years:

  • 198x (the 1980’s)
  • 19xx (the 1900’s)

Note that “the 1900’s” is similar, but not identical, to “the 20th century”: the 20th century starts in 1901 and ends in 2000, while the 1900’s start in 1900 and end in 1999. Also, 19xx covers a period of 100 years, while 19uu denotes one year only (unknown, but within the 1900s).
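The span of years covered by a masked year can be worked out by replacing x with 0 for the first year and with 9 for the last one. This is only an illustrative Python sketch; the function name masked_year_span is made up for this example.

```python
import re
from typing import Tuple


def masked_year_span(value: str) -> Tuple[int, int]:
    """Return (first_year, last_year) covered by a masked year such as '198x'."""
    # Two fixed digits, then either one digit and 'x' or 'xx'.
    if re.fullmatch(r"\d{2}(\dx|xx)", value) is None:
        raise ValueError(f"not a masked year: {value!r}")
    return int(value.replace("x", "0")), int(value.replace("x", "9"))


print(masked_year_span("198x"))
print(masked_year_span("19xx"))
```

For 19xx the result is the pair 1900 and 1999, which is one year off at both ends from the 20th century (1901 to 2000).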

One of a set

Square brackets [ and ] wrap a single-choice list of dates: exactly one of the listed dates is meant, not several of them and not an interval. Within the brackets, dates are separated by commas, or by double dots, which indicate all dates between the two dates they separate, inclusive. Different list elements may have different precision.

  • [1821,1822,1830..1832]  (one of the years 1821, 1822, 1830, 1831, 1832)
  • [..1935-11-15] (November 15, 1935 or some earlier date)
  • [1510-12..] (December 1510 or some later month)
  • [1725,1726-12] (either the year 1725 or December 1726)
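To make the meaning of the double dots concrete, the following Python sketch lists every candidate year in a bracketed list. It is deliberately limited: the function name expand_year_choices is hypothetical, and only whole years and closed year ranges are accepted, so open-ended lists such as [..1935-11-15] and members with month or day precision are rejected rather than guessed at.

```python
from typing import List


def expand_year_choices(value: str) -> List[int]:
    """Expand a one-of-a-set list of years, e.g. '[1821,1822,1830..1832]'."""
    if not (value.startswith("[") and value.endswith("]")):
        raise ValueError(f"not a bracketed list: {value!r}")
    years: List[int] = []
    for member in value[1:-1].split(","):
        first, dots, last = member.strip().partition("..")
        if not first.isdigit() or (dots and not last.isdigit()):
            raise ValueError(f"only years and closed year ranges are supported: {member!r}")
        # A range 'a..b' contributes every year from a to b inclusive.
        years.extend(range(int(first), int(last if dots else first) + 1))
    return years


print(expand_year_choices("[1821,1822,1830..1832]"))
```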

Multiple dates

Curly braces { and } wrap an inclusive list (all members are included). For consecutive dates this means a discrete set, not an interval. (For example, we may be describing an event that recurred on several different dates.)

  • {1970-12..1972-12,1973-11} (an event was repeated in every month from December 1970 through December 1972, and again in November 1973)
  • {1950,1951-05} (the year of 1950 and May 1951)

How to use EDTF

There are not many implementations of EDTF as yet. The Library of Congress has a web page describing some software tools applicable to EDTF. A validating service is very useful in testing your experiments with EDTF: you can paste the date/time string in the window provided and check compliance with the standard.

Entering an EDTF-formatted date in web forms may not always be possible if the form enforces a ‘standard date’ or provides a pop-up calendar. The EDTF is, however, definitely much better than a textual representation of uncertain or extended dates, because free-form text can rarely be decoded by a computer. Perhaps it is worth asking developers for such entry fields, following the example of those institutions that have already implemented them. (See Read more below.) At the very least, EDTF should be used instead of free-text descriptions of dates.

Marek Zieliński, July 1, 2015

Footnotes

[1] Accuracy and precision - Wikipedia article
[2] Precision is different from accuracy, which describes how sure we are that the event did, for example, occur in a specific year.

Read more
