GUIDELINES FOR STILL DIGITAL IMAGING PROJECTS

The Ohio State University Libraries
-updated January 2005-


      Digital projects require that several important decisions be made at the outset. Project managers make choices that balance the desire for high quality images with the limitations of time, money, and storage space. Below are some guidelines that will help in making decisions about image size, resolution, and fidelity; image file storage; general metadata considerations; and outsourcing. This document is divided into the following sections:

Part 1: Scanning Images
Part 2: Saving Images
Part 3: Questions to Ask if You are Outsourcing
Part 4: Metadata

Links to Resources for More Information
Appendix: Template for Collection Record for Archival Discs



Part 1: Scanning images

      Images should be saved at a resolution appropriate to the image’s intended purpose. High-resolution scans are important in certain contexts; in others, low-resolution scans are sufficient. Certain projects require color images, others need only black and white. An image file’s size will be determined by a combination of pixel dimensions, per-unit resolution, and bit depth; these terms are explained below.

1a. Resolution

      Image resolution should be understood as having two aspects. One is the number of pixels per unit of measure, usually expressed as “dpi” (dots per inch) in commercial programs such as Photoshop. 72 dpi is usually considered sufficient for display on a computer monitor, while 300 dpi is considered a minimum standard for images to be printed. The other aspect is vertical and horizontal image resolution, meaning the number of pixels along each axis of the image, as in “1024x768 pixels.” These two types of resolution work together in determining the total size of the image file. An image saved at 600 dpi will have a significantly larger file size than the same image saved at 72 dpi.

Example: As you increase the resolution, you increase the number of pixels along each axis, thereby increasing the size of the file.

Resolution    Pixel dimensions      File size
72 ppi        714 x 450 pixels      314 KB
300 ppi       2975 x 1875 pixels    5.33 MB
600 ppi       5950 x 3750 pixels    21.3 MB
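The relationship between per-unit resolution, pixel dimensions, and file size can be sketched in a few lines of Python. This is only an illustration: the physical size of the original (about 9.92 x 6.25 inches) and the 8-bit depth are assumptions inferred from the pixel dimensions in the example, the function names are made up for this sketch, and the result ignores file headers and compression, so it will differ slightly from what an image editor reports.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions of a scan of a width_in x height_in inch original."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image data size in bytes, ignoring headers and compression."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Assumed original: about 9.92 x 6.25 inches, scanned at 8 bits per pixel.
WIDTH_IN, HEIGHT_IN = 119 / 12, 6.25

for ppi in (72, 300, 600):
    w, h = pixel_dimensions(WIDTH_IN, HEIGHT_IN, ppi)
    size_mb = uncompressed_bytes(w, h, 8) / 1024 ** 2
    print(f"{ppi} ppi  {w} x {h} pixels  about {size_mb:.2f} MB")
```

Doubling the resolution doubles the pixel count along each axis, so the file grows roughly fourfold.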



1b. Bit depth

Bit depth is defined as the number of bits per pixel. The bit depth chosen when scanning an image will affect the quality of the color. Decisions should be made about whether color is necessary, or black-and-white will suffice. Each of the following pixel depths will result in a successively larger file:
Bitonal
A 1-bit pixel has two possible values, black or white. 1-bit bitonal images are appropriate for some machine-produced documents and books with no illustrations, or with black-and-white line art only. They are not sufficient for documents that include handwriting, photographs, or illustrations with half-tones (greys). Bitonal scanning will produce the smallest possible file.

8-bit Greyscale
An 8-bit pixel has 256 possible values along a spectrum of greys ranging from pure white to pure black. 8-bit greyscale scanning is appropriate for producing images that include grey tones, but no colors—such as black-and-white photographs, and books or documents containing handwriting or half-tone illustrations.
Note: In choosing between bitonal and greyscale, it is a good experiment to scan an image both ways, and compare the results. You may be surprised at what the greyscale scan picks up, and what the bitonal scan leaves out.

8-bit Color
An 8-bit color pixel also has 256 possible color values, meaning a very limited color palette. 8-bit color scanning is appropriate when the image does not require a broad, subtle range of colors, for instance when you are scanning items with flat, sharply demarcated areas of color.

24-bit Color
A 24-bit color pixel has 16.7 million possible color values, so 24-bit images will produce the greatest color fidelity to the original object. They also produce very large files, so this choice should be carefully considered. Color is important in some academic disciplines, such as the study of art and artifacts, or the natural sciences. Consider whether a lower level of color fidelity will in any way impair a user’s understanding of the original object.
Note: Improvements to digital reproduction are always in the works. 48-bit and 64-bit color are also available, but not all viewers support this range of color. As new color standards become viable for academic institutions, their adoption should be considered along the same lines as decisions about 24-bit color.

Example: The following table shows the difference in file size when some 300 ppi images of varying pixel dimensions are scanned at bit depths of 8 bits per pixel and 24 bits per pixel:

Pixel dimensions (at 300 ppi)    8-bit      24-bit
400 x 520                        204 KB     610 KB
1000 x 1300                      1.25 MB    3.72 MB
4000 x 5200                      19.9 MB    59.6 MB
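The effect of bit depth can be checked with a short calculation. The sketch below assumes uncompressed image data with no header overhead, and its function names are illustrative only. It shows how many values a pixel can take at each bit depth, and that a 24-bit scan needs three times the space of an 8-bit scan with the same pixel dimensions.

```python
def possible_values(bits_per_pixel):
    """Number of distinct values one pixel can hold at this bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_kb(width_px, height_px, bits_per_pixel):
    """Raw image data size in KB (1 KB = 1024 bytes), ignoring headers."""
    return width_px * height_px * bits_per_pixel / 8 / 1024


for bits in (1, 8, 24):
    print(f"{bits}-bit: {possible_values(bits):,} possible values per pixel")

for w, h in ((400, 520), (1000, 1300), (4000, 5200)):
    kb_8 = uncompressed_kb(w, h, 8)
    kb_24 = uncompressed_kb(w, h, 24)
    print(f"{w} x {h}: about {kb_8:,.0f} KB at 8-bit, {kb_24:,.0f} KB at 24-bit")
```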

 

Columbia University provides a useful reference chart, which includes a summary of recommendations for scanning and saving images of various types of materials: http://www.columbia.edu/acis/dl/imagespec.html#Quick_Guide

1c. A Word About Color Spaces

This document will not discuss color spaces in any depth, but will touch on two of the most common ones, RGB and CMYK, very briefly.

RGB/sRGB is the color space recommended for images intended to be viewed on the Web. It is designed to work with standard computer monitors in rendering colors. When you save a digital image file, if you are not given the option of choosing a color space, it will most likely be automatically saved as RGB or sRGB.

CMYK is geared toward the color system used in commercial printing; it is based on traditional camera color separations. If you are producing images on request for a publisher, they may request CMYK. It is best to ask them specifically about this, as well as about the image resolution they want (it may be as high as 600-1200 dpi, depending on the type of publication: journal, reference book, scientific book, art or coffee table book, etc.).


Part 2: Saving images

From the beginning of a digital imaging project, saving images should be thought of as a long-term activity, extending well beyond the moment that you save the file to disk. The important choices here involve file formats and media storage, as well as how to name files for ease of retrieval and use.

2a. File formats

Definitions of some terms used in this section:

Lossless: Lossless compression does not remove any data from the image, so lossless images saved from an original image can be restored to the state of the original.

Lossy: A lossy image cannot be restored to the original because some of the data is dropped from the file during compression.

Open vs. proprietary formats: Proprietary file formats are usually created and owned by a single company, and can only be edited with certain brands of software. An open format is one that can be read in most editing programs, such as TIFF or JPEG.
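The difference between lossless and lossy handling of image data can be demonstrated with Python's standard library. This is a toy sketch, not how any particular image format is implemented: the "image" is an artificial greyscale gradient, the lossless step uses zlib (DEFLATE, the compression method PNG also uses), and the lossy step simply discards the low four bits of each pixel, which is far cruder than JPEG.

```python
import zlib

# Stand-in for 8-bit greyscale pixel data: a gradient repeated over 64 rows.
original = bytes(range(256)) * 64

# Lossless: the data shrinks, and decompression restores every byte exactly.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(f"original {len(original)} bytes, compressed {len(packed)} bytes")
print("lossless round trip identical:", restored == original)

# Lossy (toy illustration only): keep just the top 4 bits of every pixel.
quantized = bytes(pixel & 0xF0 for pixel in original)
print("lossy result identical:", quantized == original)
```

Once the low bits have been dropped, nothing in the quantized data allows them to be reconstructed, which is why archival copies should come from a lossless or uncompressed source.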

There are many formats in which you can save an image file; the choice depends on the purpose of the file. In a well-thought-out digital imaging project, you should keep two copies of each image: a use copy and an archival copy. Use copies are those to be presented as part of a project or exhibit, presumably on the World Wide Web (Web). Archival copies are backups—permanent copies of images that may be drawn upon when:

Archival copies may also serve as master copies to be used in printing.

Use copies—recommended formats

Formats generally recommended for use copies include GIF and JPEG. These are compressed files, relatively small in size; people can view them on the Web without having to wait through the long downloads that larger images require.

GIF (Graphics Interchange Format)

File extension: [filename].gif

GIF images are created through a proprietary lossless compression scheme. GIF is recommended for objects with large, clearly divided areas of unvarying color, such as printed graphic art. It is not recommended for photographs or other items with subtle transitions from one area of color to the next.

JPEG (Joint Photographic Experts Group), sometimes called JFIF

File extension: [filename].jpg

JPEG images are created through an open lossy compression scheme. JPEG is ideal for digitally reproducing photographs and other continuous-tone media. It is not recommended for items with sharp-edged areas of color or graphic type.


Archival copies—recommended formats

Formats generally recommended for archival copies include TIFF, PNG, and JPEG 2000. These produce very large files which keep the original data intact. JPEG 2000 represents a significant departure from TIFF and PNG.

TIFF (Tagged Image File Format)

File extension: [filename].tif

TIFF is an open standard that can be used to save either uncompressed or lossless compressed images. In Web terms, it is not very portable; not all browsers support it, so remote users often have to download a plug-in in order to view TIFF images. However, it is favored as a format for archival copies of image files; it is a good idea to keep local copies of files in this format, and convert them to other formats for the Web. For instance, you might save an image as a 300-dpi TIFF, and display it on the Web as a 72-dpi JPEG.

PNG (Portable Network Graphics)

File extension: [filename].png

PNG is a fairly recent open standard that creates lossless compressed images. PNG-8, which supports 8-bit color, is intended as a replacement for GIF. Like GIF, PNG-8 is good for flat images with sharp color transitions. PNG-24, which supports 24-bit color, is intended as an archival format. PNG is more portable than TIFF, being supported by most Web browsers.

JPEG 2000

File extension: [filename].jp2

JPEG2000 is a very recent standard that promises three improvements over traditional image formats. It builds on the FlashPix standard, which was developed to support efficient delivery options for digital camera images. Its advantages are:

In other words, this standard promises to solve some of the problems of metadata storage and image delivery, as well as bring various imaging communities together.

Files may be saved as JPEG 2000 with programs such as Photoshop 7.0 (with the appropriate plug-in) and IrfanView (a free image editor). However, special software is required for embedding metadata and for realizing the sophisticated delivery options described above. Some open source and commercial software applications have been developed for this purpose; others are being developed as the standard becomes more widely adopted.

2b. File naming

A consistent file naming convention should be developed prior to beginning a project. File names will depend to some extent on whether the project will be hosted at OhioLINK, KB, or locally. Please consult with the OSUL Metadata Librarian on how to construct file names.

2c. Media storage

Once images have been created, multiple copies should be maintained. Any storage medium is susceptible to breakage or failure, so storing copies of files in separate places is essential. A good practice, available to most libraries and archives in a university system, is to store identical copies on a local hard drive, on a remote server that gets regular tape backups, and on high-quality optical disks.
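One way to confirm that copies kept in different places really are identical is to compare checksums. The sketch below is an assumption about how this might be done, not a procedure prescribed by these guidelines; it uses SHA-256 from Python's standard hashlib module, and the function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths):
    """True when every file in paths has the same checksum."""
    return len({sha256_of(path) for path in paths}) == 1
```

Calling copies_match with the paths of the local, server, and optical disk copies of one image returns False if any copy has been altered or damaged.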

Remote Storage and Accessibility: Within the Ohio State community, several options are available that offer worldwide accessibility and remote storage and maintenance. The OSU Knowledge Bank (KB) is an institutional repository that can provide both collection- and item-level access to each image in your collection. The KB is indexed by Google; therefore, images are available to a Google searcher. Within Ohio, the OhioLINK Digital Resource Commons (DRC) also provides item-level access to images in your collection. OhioLINK also offers more sophisticated accessibility options than the KB.

Materials: Manufacturers of optical disks promise longevity, but since the technology is fairly recent, the longevity of any brand is difficult to establish. Choosing optical disks that have a gold reflective layer is recommended, because gold, unlike other metals used in optical disks, does not oxidize. Phthalocyanine is preferred for the dye layer, as it appears more durable than the commonly used cyanine; however, phthalocyanine CDs may be prohibitively expensive for some projects. In choosing a CD, be sure to read the product’s specifications; a manufacturer can tint its CDs’ reflective layer gold in order to give the appearance of gold to any type of metal. Kodak and Matsui both produce archival quality optical disks.

Data migration: Digital technologies are subject to obsolescence. Optical disks produced now may be unreadable in CD/DVD players in the future. Part of maintaining a digital archive is to be aware of changes to technology, and migrate data to different storage devices as needed. (This is another good reason to keep tape backups of files, and to use open rather than proprietary file formats.) Assessment of the current technology, and the decision about whether to migrate, should be done every five years.

Physical storage: Whenever possible, archival optical disks should be treated like other archival materials. This means keeping them away from excessive light, heat, and humidity, and handling them as infrequently as possible.

Space needs: Once standards for a project have been set—deciding what type of files will be saved, and estimating what their approximate average size will be—the expected file size should be multiplied by the anticipated number of files. This figure should be used in requesting server space and purchasing optical disks. It should also be used in calculating whether your computer hard drive will provide enough space, or whether the purchase of an external hard drive will be necessary.
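The space calculation described above can be sketched as follows. The default values are assumptions for illustration only: three copies (local drive, remote server, optical disks) and a 700 MB disc capacity. Because individual files cannot be split across discs, the disc count is a lower bound.

```python
import math


def storage_estimate(avg_file_mb, file_count, copies=3, disc_capacity_mb=700):
    """Estimate total space and disc count for an imaging project."""
    one_copy_mb = avg_file_mb * file_count
    return {
        "one_copy_mb": one_copy_mb,
        "all_copies_mb": one_copy_mb * copies,
        "discs_per_copy": math.ceil(one_copy_mb / disc_capacity_mb),
    }


# Example: 500 archival files averaging 20 MB each.
estimate = storage_estimate(avg_file_mb=20, file_count=500)
for label, value in estimate.items():
    print(f"{label}: {value}")
```

If use copies are kept alongside archival copies, their estimated size should be added to the total before requesting server space.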


Part 3: Questions to ask if you are outsourcing

If you decide to let an outside company produce your digital imaging project, there are several questions you should ask before hiring them. In addition to the usual questions about time and price, you should ask:


Part 4: Metadata

One of the most essential steps to ensure that an audience is able to find and use your digital resources is to record and keep track of different types of information, called metadata, about each item at the time of image capture. Metadata should be recorded in some standardized fashion, usually in a database or spreadsheet. Standardizing now means it will be much easier to map the recorded data to a scheme later on. This section will help you identify what type of information you need to record and plan how you will gather the information. Three aspects of metadata discussed are format, type, and content. Consultation with the University Libraries will help you to tailor what information you need to collect for your specific project.

4a. Metadata format:

Choosing a format for metadata collection and storage depends on your physical resources, personnel expertise, and the goals of your imaging project. In the case of all image formats except JPEG2000, descriptive and administrative metadata cannot be embedded with the image. Therefore, maintaining the associative link between metadata and the images it describes is essential.

POLICY NOTE: Each archival disk created from an imaging project should have a plain text file (.txt) that contains collection level metadata about the images on the disk. This should include the name of the organization, with contact information; the name and description of the project; the date(s) the images were digitized and by whom; and a description of the files, e.g., “500 TIFFs of page images, with JPEG derivatives, scanned at 600 dpi.” This text file should also be duplicated and a copy put on the cover of each CD case. See Appendix for the template for the text file.

Metadata can be collected and stored in a variety of formats, both for delivery with the images (for Web discovery and access) and for preservation purposes. Databases can provide structure for your metadata and export records in a variety of formats. XML schemas also provide metadata structure and can output information to many formats. Both of these options require appropriate software and personnel familiar with these technologies. Spreadsheets provide an option to enter metadata in tabular form, leaving the image delivery system to incorporate, manipulate, structure, and deliver the metadata.

Databases:

There are many types of databases to choose from. Microsoft Access is often used because it is available on most office computers. Software such as Photo Mechanic is geared toward image collections. MySQL, an open source database product, is currently supported by the OSUL System Specialist. The main consideration here is whether your staff can successfully use the product. The database work should be done by someone who 1) understands how to structure data and 2) knows how to use outputting or reporting functions to get the data out of the database, so it can be compiled into inventories, mapped to other schemes, or formatted into captions for the images when they are displayed on the Web.

XML:

The eXtensible Markup Language (XML) is a W3C standard or set of rules that allows any user or community to define their own subset of rules, or schema, for information collection, storage, and transfer. As with databases, there are many different metadata schemes expressed in XML to choose from. Free software such as XMLSpy Home Edition and commercial software such as oXygen and XMetal are available as XML editors. One advantage of XML over database format is that files are simply plain text files and are generally smaller than database files. XML setup work should be done by someone who 1) has knowledge of XML and related technologies (editors, XSLT) and 2) understands the information needs of the project, so that an appropriate scheme can be selected and templates or other means of collecting data can be set up to create XML metadata files that can be used to compile inventories, map to other metadata schemes, or be formatted into captions for the images when they are displayed on the Web.

Spreadsheets:

If there is no one on the staff who feels equal to the task of setting up a database, you can still put the image data into a spreadsheet, which will not structure the data, but will separate it into fields. Stored this way, the data can be exported to a database or other format (e.g., XML) when the opportunity to create one arrives.
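As a rough illustration of how spreadsheet data can later be exported to XML, the sketch below reads a spreadsheet saved as CSV and writes one XML record per row using Python's standard library. The column names, the sample rows, and the records/record element names are all made up for this example; a real project would map its columns to the elements of whichever metadata scheme it selects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet contents, saved as CSV (one row per image).
SPREADSHEET_CSV = """filename,title,date_digitized
img001.tif,Sample title A,2005-01-10
img002.tif,Sample title B,2005-01-11
"""


def rows_to_xml(csv_text):
    """Turn each spreadsheet row into a <record> with one element per column."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root


print(ET.tostring(rows_to_xml(SPREADSHEET_CSV), encoding="unicode"))
```

This only works when column headings are valid XML element names (no spaces, not starting with a digit), which is one reason to choose field names carefully at the outset.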

Embedding with Images:

JPEG2000 (JP2) format is the only format to date that accepts additional metadata other than that embedded with the image at the time of capture. In the future, browsers and other applications will be able to decode the metadata in JP2 images. However, now only special JP2-compliant software has the capability to decode and display metadata embedded with the images.

4b. Metadata Types

Three general types of metadata are important for any digital project. Descriptive metadata primarily aids in identification of, access to, and retrieval of your digital content. Technical metadata captures information about digital files that have been created. Administrative metadata helps in the use and management of digital resources, especially for the future.

Note: The more granular the metadata collected for your images (that is, the more specific and detailed it is), the easier it will be to repurpose and use the metadata in the future. On the other hand, the more granular the metadata, the more expensive an imaging project becomes. To determine what works best for your project when defining metadata fields, ask yourself these questions:

  1. How much time do I have to complete the project?
  2. How many personnel resources do I have to devote to the project?
  3. What information is essential to provide context for the images?
  4. What are all of the types of information most of the potential audience of these images will want to see?
  5. What are all of the types of information most of the potential audience will use to search for images?
  6. From those lists, can any of the information be retrieved from another source once a user has located the image?
  7. What information do you need for your own usage and maintenance of the image collection?

Descriptive metadata

Descriptive metadata refers to the original object which has been imaged. These elements can include creator (person/organization who created the resource), title (name or phrase used to identify the resource), publisher (person/organization who made the resource available). Based on OhioLINK and Knowledge Bank metadata standards, the following descriptive elements should be recorded for each image:

This is the minimum set of elements you should record; others may be added as appropriate.

Technical metadata

Technical metadata refers to the digital image file that has been created of an object. These elements can include the digital file type and size, color/b&w/grayscale, equipment and software used for digitization, and date digitization completed. Technical metadata elements recorded for each image should include the following:

This is the minimum set of elements you should record; others may be added as appropriate.

Obviously, many of these elements will be the same for each record; an overarching technical note can be used for repetitive information or standard text may appear in each repetitive field. Technical metadata can get quite detailed, including descriptions of lighting and camera settings, color spaces and channels, and a good bit more. In choosing which elements to use, think about what will be most useful to the users and managers of the images.

Administrative metadata

Administrative metadata refers to the intellectual property rights, copyrights, use, and preservation of digital images. These elements can include a statement on the purpose of the digitization, a statement on the intended use of the digital resources, and a statement addressing copyright. Administrative elements for each object should reflect the following:

Again, some of these elements may repeat, although copyright and restrictions may vary widely from one object to another within a single collection. Again, in deciding about how much detail to include, think about what the audience for the images needs to know in terms of use, copying, etc. Note that the first two and last elements refer to the digital image of the object, and the third and fourth to the original object. The fifth element could be about either: there might be trademark or copyright restrictions on the original object, or there might be intellectual property restrictions on the digital image or the website of which the image is a part.

4c. Metadata Content

One of the ways to expedite the process of collecting and disseminating metadata is through the use of controlled values for some metadata fields. Using selected lists of terms or names and documenting what type of information you are putting in each field and where that information comes from (i.e., a locally developed list or a standard vocabulary) will facilitate sharing, managing, and repurposing your metadata in the future.
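Controlled values are straightforward to check automatically. The sketch below assumes a small, locally developed list of allowed terms for a hypothetical "type" field; the field name, the terms, and the function name are illustrative and do not come from any standard vocabulary.

```python
# Hypothetical locally developed list of allowed terms, keyed by field name.
CONTROLLED_VALUES = {
    "type": {"photograph", "woodcut", "map"},
}


def invalid_values(record, vocabularies):
    """Return {field: value} for each controlled field with an unlisted value."""
    return {
        field: record[field]
        for field, allowed in vocabularies.items()
        if field in record and record[field] not in allowed
    }


good = {"title": "Sample title A", "type": "woodcut"}
bad = {"title": "Sample title B", "type": "wood cut"}
print(invalid_values(good, CONTROLLED_VALUES))
print(invalid_values(bad, CONTROLLED_VALUES))
```

Uncontrolled fields such as the title pass through unchecked; only fields that have a documented list are validated.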

Examples of standard vocabularies:

Library of Congress Thesaurus of Graphic Materials   http://www.loc.gov/rr/print/tgm1/
Art and Architecture Thesaurus   http://www.getty.edu/research/conducting_research/vocabularies/aat/
Union List of Artists’ Names   http://www.getty.edu/research/conducting_research/vocabularies/ulan/

4d. General metadata schemes

To help you get started and give you an idea of the different types of metadata schemes available, the following is a list of some of the most often used and widely adopted schemes.

Dublin Core: Simple metadata scheme that consists of 15 elements covering general description, rights, and format of a resource. Its intended use is that one complete metadata record is used to describe one discrete item, not a collection of items.

VRA Core: Based on Dublin Core and adapted for use with visual resources, VRA Core is also a simple scheme with 17 elements. Unlike Dublin Core, the scheme recommends controlled vocabularies and content rules for use.

MODS: MODS is a simplified MARC record expressed in XML. Though simplified, MODS records are rich in descriptive information and, to a lesser degree, administrative information.

TEI: Originally an SGML standard, TEI is now expressed in XML and used for markup of texts. The Digital Library Federation has developed standards for TEI’s use in libraries, which define four levels of markup granularity ranging from minimal text markup to word-level markup (e.g., marking a word “boldface typescript” or “foreign language”).
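To make the simplest of these schemes concrete, the sketch below builds one Dublin Core record for one item using Python's standard library. The 15 element names and the namespace URI are those of the Dublin Core Metadata Element Set; the wrapping "record" element, the function name, and the sample values are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build a <record> holding one dc: element per supplied field."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


sample = dublin_core_record({
    "title": "Sample title A",
    "format": "image/tiff",
    "rights": "[rights statement]",
})
print(ET.tostring(sample, encoding="unicode"))
```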


Links to Resources for More Information

Technical Recommendations for Digital Imaging Projects
Authored by Columbia University
http://www.columbia.edu/acis/dl/imagespec.html

Guides to Quality in Visual Resource Imaging
Authored by DLF, RLG, and CLIR
http://www.rlg.org/visguides/visguide5.html#2.3

Choosing an Appropriate Raster Image Format
Authored by UKOLN, University of Bath
http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-28/html

JPEG 2000 home page
http://jpeg.org/jpeg2000/index.html

OhioLINK Metadata Application Profile
http://dmc.ohiolink.edu/docs/DMC_AP.pdf

Knowledge Bank Metadata Application Profile
(contact Metadata Librarian in OSU Libraries)

Dublin Core
http://dublincore.org/

MODS
http://www.loc.gov/standards/mods/

VRA Core
http://www.vraweb.org/vracore3.htm


Appendix

Template for Collection Record for Archival Discs*

*items in bold are required.

<discContents>

  <projectInfo>

    <title>[Project Title]</title>

    <description>[purpose or description of project]</description>

    <publisher>[Publisher of resources (include address and contact person/department)]</publisher>

    <discNumber>[which disc out of total in project (3 of 12, 4 of 5, etc.)]</discNumber>

  </projectInfo>

  <filesInfo> <!-- Complete one fileTypeDescription element for each MIME type on the CD -->

    <fileTypeDescription>

      <fileType>[File type (jpeg, tiff, moving image, video, audio, xml, text, pdf, etc.)]</fileType>

      <fileSizeRange>[range of file sizes on cd (12Kb-3M, 100KB-200KB, etc.)]</fileSizeRange>

      <source>[analog source from which files were digitized (Woodcuts were digitized from 1563 Acts and Monuments by John Foxe owned by OSU Libraries)]</source>

      <digitization>

        <digitizingProcess>[scanned, photographed, etc.]</digitizingProcess>

        <digitizingEquipment>[Model name and number of all equipment used (be as specific as possible). Use one element for each piece of equipment.]</digitizingEquipment>

        <digitizingSoftware>[software used in the digitization process]</digitizingSoftware>

        <digitizationPersonnel>[Persons completing digitization (as specific or general as preferred)]</digitizationPersonnel>

        <digitizationDate>[Date or date range of digitization]</digitizationDate>

      </digitization>

    </fileTypeDescription>

  </filesInfo>

</discContents>
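For projects that produce many discs, the collection record can be generated rather than typed by hand. The sketch below is one possible approach using Python's standard library; it reuses the element names from the template, while the function name, the dictionary layout, and the bracketed placeholder values are assumptions for illustration. Digitization details are passed as (element name, value) pairs so that digitizingEquipment can be repeated for each piece of equipment.

```python
import xml.etree.ElementTree as ET


def build_disc_record(project, file_types):
    """Return the collection record for one disc as an XML string."""
    root = ET.Element("discContents")
    info = ET.SubElement(root, "projectInfo")
    for tag in ("title", "description", "publisher", "discNumber"):
        ET.SubElement(info, tag).text = project[tag]
    files = ET.SubElement(root, "filesInfo")
    for file_type in file_types:
        desc = ET.SubElement(files, "fileTypeDescription")
        for tag in ("fileType", "fileSizeRange", "source"):
            ET.SubElement(desc, tag).text = file_type[tag]
        digitization = ET.SubElement(desc, "digitization")
        for tag, value in file_type["digitization"]:
            ET.SubElement(digitization, tag).text = value
    return ET.tostring(root, encoding="unicode")


record = build_disc_record(
    project={
        "title": "[Project Title]",
        "description": "[purpose or description of project]",
        "publisher": "[Publisher of resources]",
        "discNumber": "3 of 12",
    },
    file_types=[{
        "fileType": "tiff",
        "fileSizeRange": "100KB-200KB",
        "source": "[analog source]",
        "digitization": [
            ("digitizingProcess", "scanned"),
            ("digitizingEquipment", "[scanner model]"),
            ("digitizingEquipment", "[second piece of equipment]"),
            ("digitizingSoftware", "[software]"),
            ("digitizationPersonnel", "[personnel]"),
            ("digitizationDate", "[date range]"),
        ],
    }],
)
print(record)
```

The resulting string can be written to the plain text file kept on each archival disc.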
