The clock in the Santiago de Compostela Cathedral.

Nothing illustrates the difficulties of computer processing of natural language better than the notation of units of time. The calendar we use in the US was introduced in 46 BC by Julius Caesar and amended in 1582 by Pope Gregory XIII. It is the most popular calendar but not the only one; other widely used calendars include the Hijri, Hindu, Chinese and Hebrew calendars. In the Gregorian calendar we specify day, month and year, and it seems straightforward that computers should always understand such 'basic data'. Unfortunately, this simplicity is only apparent.

The first difficulty is the notation of the date. In Polish we traditionally write 3 IV 1923 or perhaps 3.4.23. In English we would write 4/3/23, which could mean 3 April 1923 or 4 March 2023, depending on the country and the default century. Until recently, computers could not deal with even such simple conversions. Today some Artificial Intelligence (AI) software attempts to guess the language, and sometimes (but not always) manages to guess the correct date.
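The ambiguity described above can be made concrete with a minimal Python sketch. The function name `candidate_dates` and the two notations it tries are assumptions chosen for illustration; real data would likely need more formats.

```python
from datetime import datetime


def candidate_dates(text):
    """Return every date that `text` could mean under a few notations."""
    # Hypothetical list of notations for this sketch only.
    formats = {
        "US (month/day/year)": "%m/%d/%y",
        "European (day/month/year)": "%d/%m/%y",
    }
    results = {}
    for label, fmt in formats.items():
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # The text is not a valid date under this notation.
            pass
    return results


readings = candidate_dates("4/3/23")
for label, value in readings.items():
    print(label, value.isoformat())
```

Both readings are valid, so the program cannot choose between them without outside knowledge. Note also that the two-digit year is silently expanded to 2023 here; nothing in the text itself rules out 1923.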

Notation of time is also problematic. In the US the day is divided into morning hours marked AM and afternoon hours marked PM. Yet no one wants to make an appointment on Monday at 12 AM or Tuesday at 12 PM, because those notations are ambiguous [1]. The 24-hour clock removes at least this ambiguity (24:00 on Monday and 00:00 on Tuesday are the same moment in time, but assigned to different days).
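Software usually settles the 12 AM / 12 PM question by convention rather than by logic. The sketch below, with a hypothetical helper `to_24_hour`, assumes the widespread convention that 12 AM is midnight and 12 PM is noon; a human reader may well assume otherwise.

```python
def to_24_hour(hour, period):
    """Convert a 12-hour clock reading to an hour on the 24-hour clock.

    Assumes the common convention: 12 AM is 00 (midnight), 12 PM is 12 (noon).
    """
    period = period.upper()
    if not 1 <= hour <= 12 or period not in ("AM", "PM"):
        raise ValueError(f"not a 12-hour clock reading: {hour} {period}")
    base = hour % 12  # 12 becomes 0, every other hour is unchanged
    return base + 12 if period == "PM" else base


for reading in [(12, "AM"), (11, "AM"), (12, "PM"), (1, "PM")]:
    print(reading, "->", to_24_hour(*reading))
```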

Another problem involves recording dates with a specified precision. Anyone who has worked with spreadsheets (even the latest) knows that one can write the day (with or without specifying the time), but not the time without specifying the day. We cannot record the year without specifying the month and day, and so on. We can of course write it out in words, which is fine for humans but completely throws off the computer.

Many developers have struggled with such difficulties, creating their own rules for writing dates. There are many such conventions, mostly incompatible. It was not until the introduction of the ISO 8601 standard and its adoption by the W3C that a uniform date and time notation was created. Below are some basic elements of this notation, based on the W3C Schema standard, which slightly narrows the range of possibilities written into ISO 8601.

**Date**

- The date is written as YYYY-MM-DD, for example **1867-12-05**.
- We can skip the day, e.g. **1867-12**, or the day and month, for example **1867**.
- We can specify the year and the week in the year: **1945W06** (this notation has a one-week precision).
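The date forms listed above can be recognised with a short Python sketch that also records the precision of each value. The function `parse_date` and its return shape are assumptions made for this illustration, not part of any standard library; the week form is resolved to the Monday that starts the week.

```python
import re
from datetime import date

# Year, optionally followed by month, optionally followed by day.
_CALENDAR = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")
# Year and week number, with or without the hyphen.
_WEEK = re.compile(r"^(\d{4})-?W(\d{2})$")


def parse_date(text):
    """Return (precision, value) for a date written with reduced precision."""
    week = _WEEK.match(text)
    if week:
        year, number = int(week.group(1)), int(week.group(2))
        # Represent the week by its first day (Monday).
        return "week", date.fromisocalendar(year, number, 1)
    match = _CALENDAR.match(text)
    if not match:
        raise ValueError(f"unrecognised date: {text!r}")
    year, month, day = match.groups()
    if day is not None:
        return "day", date(int(year), int(month), int(day))
    if month is not None:
        date(int(year), int(month), 1)  # raises ValueError for month 13 etc.
        return "month", (int(year), int(month))
    return "year", int(year)


for example in ["1867-12-05", "1867-12", "1867", "1945W06"]:
    print(example, "->", parse_date(example))
```

Keeping the precision next to the value is the point of the sketch: a plain `date` object cannot say whether the day was actually known.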

**Time of day**

- Time of day is written uniformly as hh:mm:ss, for example **13:25:17**.
- You can leave out seconds, e.g. **13:25**, or minutes and seconds, e.g. **13**.
- We can add a fraction of a second with any precision, e.g. **13:25:17.50255**.
- If we specify day and time, we separate them with a capital T, for example **1867-12-05T18:45**.
- We can also specify the time zone (without it the time is local or has no defined zone). A time such as **13:35:17Z** is UTC time (formerly Greenwich), while **13:25:17-05:00** or **2013-06-05-05:00** represent the time or day in the standard time zone of New York City (five hours behind UTC).
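Python's standard `datetime.fromisoformat` reads most of the combined date and time forms above. The wrapper `parse_timestamp` below is a hypothetical helper; it rewrites a trailing `Z` as `+00:00` because older Python versions (before 3.11) do not accept the `Z` suffix directly.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse a date-time string, with or without a time zone."""
    if text.endswith("Z"):
        # Z means UTC; spell it as an explicit zero offset.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


local = parse_timestamp("1867-12-05T18:45")
print(local.isoformat(), "zone:", local.tzinfo)  # no zone defined

new_york = parse_timestamp("2013-06-05T13:25:17-05:00")
as_utc = new_york.astimezone(timezone.utc)
print(as_utc.isoformat())

utc = parse_timestamp("2013-06-05T13:35:17Z")
print(utc.utcoffset())
```

A value without a zone stays "naive" in Python, which mirrors the distinction drawn above between local time and time with a defined zone.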

**Repeating Time Periods**

The W3C Schema defines the following recurring periods of time: day, day-month and month.

- The day repeats each month (and lasts one day), for example **---15** is the fifteenth of each month.
- Day-month repeats only once a year, for example **--12-24** is 24 December.
- Month repeats once each year (and lasts the entire month), for example **--08** is August each year.
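The three recurring forms differ only in the number of leading hyphens and fields, so they can be told apart with simple patterns. The following is a minimal sketch; the names `parse_recurring` and `occurs_on` are assumptions introduced here for illustration.

```python
import re
from datetime import date

_PATTERNS = [
    ("day", re.compile(r"^---(\d{2})$")),            # e.g. ---15
    ("month-day", re.compile(r"^--(\d{2})-(\d{2})$")),  # e.g. --12-24
    ("month", re.compile(r"^--(\d{2})$")),           # e.g. --08
]


def parse_recurring(text):
    """Return (kind, numbers) for a recurring day, month-day or month."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, tuple(int(group) for group in match.groups())
    raise ValueError(f"unrecognised recurring period: {text!r}")


def occurs_on(text, day):
    """Tell whether the calendar date `day` falls within the recurring period."""
    kind, numbers = parse_recurring(text)
    if kind == "day":
        return day.day == numbers[0]
    if kind == "month-day":
        return (day.month, day.day) == numbers
    return day.month == numbers[0]


print(parse_recurring("---15"))
print(occurs_on("--12-24", date(2013, 12, 24)))
print(occurs_on("--08", date(2013, 8, 3)))
```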

**Time Period**

A time period can have a specific start and end time, or be defined only by its duration. For a time period, we write the start and end date or date-time, separated by a slash (**/**), for example

**2000-01-01/2000-12-31** is a period covering the whole of the year 2000 (as opposed to “2000”, which may refer to an event that occurred some time in 2000).
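A start/end period of this kind can be split on the slash and each half parsed as an ordinary date. The sketch below handles only full dates (not date-times), and the helper names `parse_period` and `days_in_period` are hypothetical.

```python
from datetime import date


def parse_period(text):
    """Split 'start/end' into two dates and check their order."""
    start_text, end_text = text.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("period ends before it starts")
    return start, end


def days_in_period(text):
    """Count the days in the period, including both the first and last day."""
    start, end = parse_period(text)
    return (end - start).days + 1


print(days_in_period("2000-01-01/2000-12-31"))  # 2000 was a leap year
```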

**Duration**

The duration of an event can be specified with any precision. We start with a capital P and then add the duration in years (Y), months (M), days (D), hours (H), minutes (M) and seconds (S). The symbol T separates the date part from the time part and is required whenever hours, minutes or seconds are given. Examples:

- **P12Y5M** (12 years and 5 months)
- **P36M** (36 months)
- **P20DT15H30M** (20 days, 15 hours and 30 minutes)
- **PT1356S** (1356 seconds)
- **PT99.486S** (99.486 seconds)
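Duration strings are regular enough to be read with a single pattern. The sketch below uses a hypothetical function `parse_duration` that returns the components as a dictionary; it assumes that the T separator must precede any hours, minutes or seconds, which is also how it tells months (M before T) from minutes (M after T).

```python
import re

_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)
_FIELDS = ("years", "months", "days", "hours", "minutes", "seconds")


def parse_duration(text):
    """Return the components of a duration such as 'P20DT15H30M'."""
    match = _DURATION.match(text)
    if not match or all(group is None for group in match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    result = {}
    for name, group in zip(_FIELDS, match.groups()):
        if name == "seconds":
            result[name] = float(group) if group else 0.0
        else:
            result[name] = int(group) if group else 0
    return result


for example in ["P12Y5M", "P36M", "P20DT15H30M", "PT1356S", "PT99.486S"]:
    print(example, "->", parse_duration(example))
```

The result is deliberately left as separate components: a year or a month has no fixed length in days, so converting such a duration to a single number of seconds would require choosing a starting date.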

Using ISO 8601 (preferably in its simplified W3C version) is recommended, especially where data accuracy is important. This standard is, however, only a beginning. Those who work with real data, chiefly historical (in archives, genealogy, etc.), know that there are many cases where the date is uncertain, and that there is a need to distinguish an event that occurred somewhere between 1832 and 1840 from the period 1832 to 1840, to write down a period that has started but has not yet finished, or to express dates like Spring of 1920 or the second half of the nineties. The Library of Congress is working on a new standard meant to allow for the expression of such data, the EDTF (Extended Date/Time Format) standard. In the next part of this blog I will try to review the main ideas of EDTF.

The problems with dates are numerous. There are calendars whose dates cannot be unambiguously translated into dates of other calendars. Another example is the year zero: the Romans did not know zero (they used “Roman” numerals), so there was no year zero. Zero has been known at least since the year 976, when it was described by the Persian scholar Muhammad ibn Ahmad al-Khwarizmi, and after a thousand years we are slowly beginning to use zero in the counting of years. All the standards mentioned above, including ISO 8601 and EDTF, use Astronomical Year Numbering, which includes year 0 and negative values for the years BC/BCE. As you can see, mathematical innovations are introduced into general use rather slowly. Since, however, computerization is changing the world much faster, using the standard notation described above is highly recommended, as it increases the chances that the time data we have so patiently recorded will be correctly interpreted by the still not very bright computers.
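The shift between traditional and astronomical year numbering is a one-line calculation: because astronomical numbering contains a year 0, the year 1 BC becomes 0 and every earlier year moves by one. The sketch below uses hypothetical helper names, and the four-digit signed output is an assumption for illustration; the exact signed form accepted depends on the standard and its version.

```python
def to_astronomical_year(year, era):
    """Convert a year with an era label to astronomical year numbering."""
    if year < 1:
        raise ValueError("traditional year numbers start at 1")
    era = era.upper()
    if era in ("AD", "CE"):
        return year
    if era in ("BC", "BCE"):
        # 1 BC is year 0, 2 BC is year -1, and so on.
        return 1 - year
    raise ValueError(f"unknown era: {era}")


def format_year(astronomical_year):
    """Write the year with at least four digits and a sign when negative."""
    sign = "-" if astronomical_year < 0 else ""
    return f"{sign}{abs(astronomical_year):04d}"


for year, era in [(1, "BC"), (46, "BC"), (1582, "AD")]:
    print(year, era, "->", format_year(to_astronomical_year(year, era)))
```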

[1] “Question: Are noon and midnight referred to as 12 a.m. or 12 p.m.? Answer: This is a tricky question because 12 a.m. and 12 p.m. are ambiguous and should not be used.” *From the website of the National Institute of Standards and Technology.*

#### More about dates

- Standard ISO 8601 - the Wikipedia article
- All about calendars - the “Calendar Zone”
- The 12-hour clock - the Wikipedia article

Marek Zieliński, June 1, 2013, updated June 20, 2015
