Envelopes for digital images

These days we use digital photography more and more often. What was a novelty only a dozen or so years ago has become the norm, and traditional cameras are becoming rare. We can see the image instantly, all the devices we carry (phones, tablets etc.) have photo capability, and memory and cameras are constantly becoming cheaper - all this results in the creation of more and more photos. At the same time, photography has become something very transient. In the past you would paste the photos into albums or collect them in boxes, while now they exist as files on a computer disk, and at the first disk failure we suddenly lose our treasured resources. Personal digital archiving is a broad subject; this time let us focus on packing the images in digital envelopes called files.

An image is not just a photo. Scans of documents in an archive (personal or institutional) are digital records that should faithfully reflect the original documents. How do we choose the best way of preserving images for the next generation, so that our grandchildren can enjoy their grandparents' photo albums and archives can preserve invaluable (because the paper did not survive) archival images? Saved images are stored on a computer disk in a container called a file. We will talk about the format of these envelopes, compression and metadata, as well as translating an image from one format to another (conversion).

A digital camera is an imitation of the retina of the eye. The imitation is not very good, because the eye works differently than the camera, but we can treat it as an approximation. The picture - collected by a lens or scanned on the flatbed scanner - is divided into small sections, usually square (pixels), and the color is stored separately for each square. The data for the three colors (different than in the human eye) are recorded. As a result we obtain a rectangular matrix, each cell containing color data. The image is characterized by dimensions in pixels (height and width) and the third dimension, the depth of the color. The most popular model uses 8 bits for each of the three primary colors (total 24 bits), which provides the ability to store more than 16 million color tones. The saved data files are packed in one of the formats known as raster formats.
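To make the "rectangular matrix" concrete, here is a minimal Python sketch. It assumes the three stored colors are red, green and blue (the usual choice, though the text above does not name them), and the helper names `color_count` and `make_image` are invented for this illustration.

```python
# Sketch: a raster image as a matrix of pixels, each holding three 8-bit values.
BITS_PER_CHANNEL = 8
CHANNELS = 3  # assumed to be red, green, blue


def color_count(bits_per_channel=BITS_PER_CHANNEL, channels=CHANNELS):
    """Number of distinct colors one pixel can represent."""
    return 2 ** (bits_per_channel * channels)


def make_image(width, height, fill=(255, 255, 255)):
    """Build a matrix of `height` rows and `width` columns of (R, G, B) tuples."""
    return [[fill for _ in range(width)] for _ in range(height)]


image = make_image(4, 3)          # hypothetical tiny image, 4 pixels wide, 3 high
image[0][0] = (255, 0, 0)         # top-left pixel set to pure red
print(color_count())              # 16777216
print(len(image), len(image[0]))  # 3 4
```

The first number printed is the "more than 16 million color tones" mentioned above: 2 to the power of 24.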

Format selection criteria

Until recently, we did not need any tools (except for glasses - occasionally) to look at paintings, photos, or to read a book. Today, more and more often we have to use equipment (a computer or device that performs the same function under various names - a phone, tablet, etc.). What's worse, we find a large number of formats, better or worse suited to our requirements. What are these requirements?

  1. The image format should be public, not closed. Some formats, particularly older ones, were created by companies dealing with image processing, and they retain copyright. Usually the format is published and publicly available. Formats defined as international standards (e.g. ISO) are much more likely to remain useful in the future.
  2. The format should be popular (which can sometimes conflict with requirement 1). A standard that has no readily available tools can be used only in theory.
  3. Image processing tools should be easily accessible, and readers should be free or cheap, preferably open source. Giving someone a photo with a comment "you can see it, but first you must buy a program for $500" is in poor taste. The basic treatment of images such as rotate, crop, resize, etc. should be available in popular, low-cost and / or open source tools.
  4. Formats should be able to save metadata - for details see the blog "The reverse side of a digital photo".

Resolution and compression

Those of us who dabbled in film photography remember film grain, related to its speed. The lower the speed, the smaller the silver halide crystals, and the finer the details that could be registered. In a digital camera the crystals are replaced by photosensitive elements - the denser the elements, the finer the details. Sensor resolution is usually given in (mega)pixels. Scanner resolution is typically given in pixels per inch (or centimeter), abbreviated as ppi or dpi.
Image size in computer memory (width x height [in pixels] x 3 bytes) can be significant. To save space, some formats use compression. We will not elaborate on the compression algorithms here, as they are numerous; it suffices to consider the compression-decompression cycle. If it leaves the image unchanged, the compression is considered lossless; if not, lossy. Lossy compression can be much more effective in reducing the file size, but depending on its intensity it can leave traces (artifacts).
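The size formula and the compression-decompression cycle can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the pixel data is synthetic, `zlib` stands in for whatever lossless method a real format uses, and the "lossy" step (throwing away the low four bits of every value) is a toy invented for this example, not the way JPEG or any real format works.

```python
import zlib


def raw_size_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size in memory: width x height x 3 for 24-bit color."""
    return width * height * bytes_per_pixel


print(raw_size_bytes(4000, 3000))  # hypothetical 12-megapixel photo: 36000000

# Synthetic "image": 64 rows of 64 pixels, 3 bytes each, forming a gradient.
pixels = bytes((x + y) % 256 for y in range(64) for x in range(64 * 3))

# Lossless cycle: decompression gives back exactly what went in.
restored = zlib.decompress(zlib.compress(pixels))
print(restored == pixels)  # True


def toy_lossy_compress(data):
    """Toy lossy step: discard the 4 low bits of each value, then compress."""
    return zlib.compress(bytes(b & 0xF0 for b in data))


# Lossy cycle: the result is close to the original, but not identical.
approx = zlib.decompress(toy_lossy_compress(pixels))
print(approx == pixels)                               # False
print(max(abs(a - b) for a, b in zip(pixels, approx)))  # 15
```

The last line shows the price of the lossy cycle in this toy case: every stored value may be off by up to 15 out of 255.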

Formats

GIF

GIF (Graphics Interchange Format) was introduced by CompuServe in 1987. It uses lossless compression, but is limited to 8 bits per pixel for all three colors together (a palette of up to 256 colors or levels of gray). Therefore it is not suited for photography, where we expect a bigger palette of colors. Metadata recording capabilities are very limited. GIF has, however, two very desirable features. We can define a transparent color, allowing us to create graphics (such as logos) that can be pasted over already existing patterns. GIF also has the ability to save multiple images that can be viewed as short movies (animations) - this function alone accounts for the enduring popularity of this format. Most web browsers can display GIF files, including animation, and the format is supported by almost all graphics programs. The files have the extension .gif.

PNG

PNG (Portable Network Graphics) format was developed to overcome the problems with GIF - the limited number of colors and the patented compression method. It was approved for use on the Internet in 1996 and acquired the status of an ISO standard in 2004. PNG allows us to save graphics and photos using 24- or 32-bit color, and it also supports transparency. It uses lossless compression, so it is suitable for archival storage. Metadata recording capabilities are limited: EXIF format (used by cameras) is not supported; it is possible to use XMP metadata, but popular programs cannot read or write the data. PNG format is growing in popularity; it is displayed by web browsers and is supported by most graphics programs. Files have the extension .png.

TIFF

TIFF (Tagged Image File Format) was created by Aldus and put into use in 1986. Although it is more than 25 years old, it is still a very popular format among graphic designers, photographers and the publishing industry. It can save files of up to 4 GiB in full color. TIFF has the ability to record multiple images (so you can save all pages of a document), and it can store images either with lossless compression or uncompressed. The standard is administered by Adobe, which bought Aldus. It has many add-ons and extensions (version 6.0 is relatively universal) as well as several versions registered as ISO standards. It does not have animation or transparency and is not displayed by the most commonly used web browsers. It is popular as a format for storing archival images and scans. You can save EXIF and IPTC metadata in it; using XMP, although theoretically possible, is not a common option. TIFF is very popular and is supported by almost all graphics programs. The files have the extension .tif or .tiff.

JPEG

JPEG (Joint Photographic Experts Group) is a very popular format created for digital photos and other half-tone images. It always uses compression, which is lossy, but provides a significant reduction in size. At the same file size, an image in JPEG format may have 25 or more times as many pixels (5 times the linear dimension) as one in TIFF, for example, which largely compensates for the compression losses. For archival documents it presents two problems: first, compression errors are most evident at the boundaries of contrasting elements (for example at the edges of text characters), and second, each further round of processing generates additional errors because you cannot completely turn off compression. The latter problem can be partially bypassed in the processing of photos if one uses a program (such as Picasa) which saves only the transformations, leaving the original unchanged.
JPEG is a registered ISO standard and is supported by all image processing and display programs, as well as by web browsers - it is the most popular format for the recording and viewing of photos. In a JPEG file, you can also save metadata in EXIF, IPTC and XMP, which significantly increases its versatility. Files most commonly have the extension .jpg or .jpeg, although sometimes .jif, .jfif and others are used.
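The "25 times the pixels, 5 times the linear dimension" relationship is simple arithmetic, sketched below in Python. The 3 MB file budget and the flat 25:1 compression ratio are assumptions made for the example; real JPEG ratios vary with image content and quality settings, and `square_side` is a helper name invented here.

```python
import math

BYTES_PER_PIXEL = 3  # 24-bit color, uncompressed


def square_side(file_bytes, compression_ratio=1):
    """Side (in pixels) of the largest square image that fits in file_bytes.

    compression_ratio is an assumed whole-number average (25 means 25:1).
    """
    pixels = file_bytes * compression_ratio // BYTES_PER_PIXEL
    return math.isqrt(pixels)


budget = 3_000_000                     # hypothetical 3 MB file
plain = square_side(budget)            # uncompressed storage
packed = square_side(budget, 25)       # assumed 25:1 lossy compression
print(plain, packed, packed // plain)  # 1000 5000 5
```

Because the pixel count grows with the square of the side, a 25-fold gain in pixels translates into a 5-fold gain in width and height.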

JPEG 2000

JPEG 2000 format (files use the extension .jp2) is the next-generation format developed by the Joint Photographic Experts Group. It has all the advantages of JPEG compression, a better algorithm, and is an ISO standard. It can also record images without any loss (lossless compression), so it is suitable for the storage of archival materials. The only metadata recording format is XMP. All in all, it is a very good graphic format for the future.
Although it was introduced over 10 years ago, it lacks popularity. Many readers and graphic editors either do not support JPEG 2000 or support it only to a limited extent, using plug-ins - loading the image in this format takes considerably longer. Picasa does not support this format, and metadata recording requires specialized tools. JPEG 2000 is not displayed in web browsers.

Other formats

There are many other formats, and here are a few you might encounter.

RAW is the common name for many formats that record raw data from the camera sensor - they contain the most detailed image data, which can then be further processed. Although many of them use elements of TIFF, the formats are closed and specific to the camera manufacturer, and as such are not suitable for long-term storage or for sharing images.

BMP is a Microsoft raster format, created for Windows. It is very popular, and can be encountered frequently, especially in older applications and the graphics in Windows.

PDF (Portable Document Format) is not a graphic format, but can also incorporate graphics. It is a description of the document, containing all the elements necessary to show or print a single-page or multiple-page document. It was created by Adobe in 1991-93 and popularized by the company's publication of free PDF readers. Since 2008 it has been an ISO standard, and is no longer controlled by Adobe. In 2005 an ISO standard called PDF/A (a subset of PDF) was published, with a focus on long-term archival storage.

PDF, and especially PDF/A, is recommended as a format for long-term storage of documents. It is good for this purpose, providing a versatile, relatively permanent page format, which can also include graphics, both raster and vector. PDF is not a graphic format, however, and for photos and scans it is only an additional envelope that wraps the picture. PDF is not directly displayed by your web browser, nor by programs for image processing. The latest version (PDF/A-2 of 2011) provides JPEG 2000 compression and the use of metadata, both for the entire document and for individual pages. Processing tools for PDF (excluding proprietary and rather expensive Adobe tools), however, are rare, and even simple manipulations such as adding, removing, or rotating the pages of a document require significant effort. When it comes to presentation (rather than long-term storage) of multi-page documents, PDF is simple to use, and competes with another format created for this purpose, DjVu.
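A file extension can be wrong or missing, but each of the envelopes described above also starts with a characteristic sequence of bytes. The sketch below checks the first bytes of a file against the commonly documented signatures; the function names `sniff_format` and `sniff_file` are invented for this example, and the list covers only the formats discussed here. Since many RAW formats borrow elements of TIFF, some of them would probably be reported as TIFF by such a simple check.

```python
# Sketch: recognize an image "envelope" by its leading bytes, not its extension.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),
    (b"BM", "BMP"),
    (b"%PDF-", "PDF"),
]


def sniff_format(header: bytes) -> str:
    """Return the format name whose signature starts `header`, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 16 bytes of the file at `path` and identify its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))


print(sniff_format(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # PNG
print(sniff_format(b"%PDF-1.4"))                             # PDF
print(sniff_format(b"hello"))                                # unknown
```

A check like this is useful before a mass conversion, when files collected over many years may carry misleading extensions.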

Recommendations

What format should we use to store images and document scans at home and in the archive? We can see that in the future we will have a great format for archiving and displaying files, including metadata, and great tools to view our resources on any device. This day has not come yet. We have old formats that are common, and new ones that are better, but the lack of tools disqualifies them for use right now. It is therefore likely that our children and grandchildren will have to make a conversion to a 'proper' format, perhaps in 2050 - to put the photos in new, better envelopes.

What should we do for now? Photographs can be saved in JPEG format at the highest resolution possible. Cameras usually have a variety of options, and you should always choose the best quality. This increases the file size, but memory is cheap and its price is steadily declining. Store the original images and do not modify them; work only on copies. Add the metadata (common viewers like IrfanView or XnView can do it, and so can Picasa: Options / Tags / Store the tags in the photos). Scans, especially of archival materials, should be stored as TIFF files. Later, you can convert them to JPEG 2000 when it becomes more common. Recording metadata is also highly recommended, although archives usually want to add more information: where the documents came from, what their fate was, what is in them, etc. For this I recommend a simple spreadsheet or office document, or a specialized archival program such as DSpace or Archivist's Toolkit. If you want to save documents created electronically, PDF format is very well suited for this purpose.
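Two of these recommendations, "only make copies" and "a simple spreadsheet", can be sketched in Python. This is just one possible arrangement: the column names in `FIELDS`, the helper names and the folder layout are all assumptions made for this example, and a CSV file is used because any spreadsheet program can open it.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical columns: where the document came from, its fate, its content.
FIELDS = ["file", "source", "history", "description"]


def working_copy(original: Path, work_dir: Path) -> Path:
    """Copy the original into work_dir and return the copy; edit only the copy."""
    work_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(original, work_dir / original.name))


def append_record(catalog: Path, record: dict) -> None:
    """Add one row to the CSV catalog, writing the header row on first use."""
    is_new = not catalog.exists()
    with catalog.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(record)
```

For example, `working_copy(Path("originals/scan001.tif"), Path("work"))` would leave the original untouched and return the path of the copy to edit, and `append_record(Path("catalog.csv"), {...})` would add one line describing the scan (both paths are made up).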

Read more

Wikipedia articles on the graphics formats

Marek Zieliński, November 2, 2013
