Digital Initiatives Steering Committee
Metadata Guidelines for Digital Projects, The Ohio State University Libraries
Introduction
This section of the Digital Initiatives Web site includes basic guidelines for creating metadata to describe and manage digital resources. It also includes an explanation of local practices at OSU, and information on, and samples of, metadata documentation.
What is metadata?
Metadata is structured data about an object or a group of objects. While traditional catalog records and other means of describing physical artifacts can also be considered metadata, more often the term is used to refer to data that describes digital objects. These objects can be digitized or born-digital, and include books and articles, images, data sets, music and video, etc. The term 'metadata' encompasses two different types of standards - schemas and content standards. Schemas provide a structure for capturing data; they are typically made up of elements (e.g. title, creator, publisher, file size, etc.) organized in a flat or hierarchical structure. While the elements guide the type of data they structure, they do not govern the content. To ensure consistent, quality metadata, content standards are generally used in conjunction with schemas. These standards include cataloging rules and controlled vocabularies.
Basic principles
There are two principles to keep in mind as you create metadata. The first is consistency. There are many ways to describe any given object, depending on audience, intended use, and available resources. In a sense, the particular schemas and content standards you choose are less important than the consistency with which you apply them. This consistency can be hard to achieve - especially when metadata creation is distributed or outsourced - but it is necessary to support current and future use of your collections. As standards evolve and technology changes, your ability to manipulate your metadata automatically may mean the difference between a collection that is obsolete and difficult to use, and one that is portable and flexible. The more consistent your data, the better results you will receive in batch processing. In other words, it is ok to be wrong - just be consistently wrong.
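Here is a minimal Python sketch of why consistency pays off in batch processing. It assumes records are held as dictionaries keyed by Dublin Core-style element names; the element names and values are hypothetical. A value entered the same wrong way everywhere can be corrected in one exact-match pass. An inconsistent variant slips through and has to be found by hand.

```python
from typing import Dict, List

Record = Dict[str, str]

def batch_replace(records: List[Record], field: str, old: str, new: str) -> int:
    """Replace an exact value in one field across all records; return how many changed."""
    changed = 0
    for record in records:
        if record.get(field) == old:
            record[field] = new
            changed += 1
    return changed

# Hypothetical records: the publisher was entered consistently, but in a non-authorized form,
# except for one record that used a different variant.
records = [
    {"dc.title": "Annual report, 1950", "dc.publisher": "Ohio State Univ."},
    {"dc.title": "Annual report, 1951", "dc.publisher": "Ohio State Univ."},
    {"dc.title": "Annual report, 1952", "dc.publisher": "OSU"},  # inconsistent entry
]

fixed = batch_replace(records, "dc.publisher", "Ohio State Univ.", "Ohio State University")
print(fixed)  # prints 2; the "OSU" record is not caught and still needs manual review
```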
The second principle is the ability of your metadata to stand alone. Data sharing and reuse are the order of the day, and it is our responsibility to ensure that our data is reusable. One of the biggest barriers to reusability is reliance on local context and delivery systems to describe your items. For example, a photograph of a man and a dog may be easy for your users to understand in the context of a local biographical collection. That same photograph in an aggregator like OAIster, titled only "with his dog," will leave users scratching their heads. A digital object and its metadata record should be intelligible to a user in any context. Some of the easiest steps toward this goal are including collection-level metadata in your records, avoiding acronyms and jargon, and supplying standard information (such as the name of your institution or collection) in every record.
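One way to make item records stand alone is to copy collection-level metadata into every item record before the records are shared. The sketch below assumes dictionary records and uses hypothetical element names, collection name, and title. Item-level values take precedence, and collection-level values only fill gaps.

```python
from typing import Dict

def make_standalone(item: Dict[str, str], collection: Dict[str, str]) -> Dict[str, str]:
    """Return a new record: collection-level fields filled in, item-level values kept."""
    merged = dict(collection)  # start from the collection-level defaults
    merged.update(item)        # item-level values override them
    return merged

# Hypothetical collection-level metadata supplied to every record in the collection.
collection = {
    "dc.publisher": "Ohio State University Libraries",
    "dc.relation.ispartof": "Hypothetical Biographical Photograph Collection",
}
# A self-explanatory (hypothetical) title instead of one that relies on local context.
item = {"dc.title": "John Smith with his dog, Columbus, Ohio, 1935"}

record = make_standalone(item, collection)
print(record["dc.relation.ispartof"])
```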
Metadata Guidelines
Describe
Purpose of description
Descriptive metadata represents what digital materials are and what they are about. It is important to provide descriptive metadata so that people can find and use relevant digital materials. As collections grow larger, quality descriptive metadata will demonstrate its value.
There are many standards and options for descriptive metadata. The same item can be described in different ways, depending on the expected use and audience for that resource. For example, a digital photograph of an animal will be described very differently depending on whether it is being used as an art object or as an illustration for a zoology class. After choosing an approach, the key is to apply it consistently and to use applicable standards.
Content recommendations for specific descriptive fields
The following are content guidelines for descriptive elements that are applicable to any metadata contributed to an Ohio State University Libraries' repository. Using consistent guidelines for these fields, regardless of other metadata decisions, facilitates effective searching across collections. These guidelines are mainly concerned with content standards, since they are intended to apply to metadata being created in multiple schemas. When appropriate, however, we have included information about relevant schemas.
Personal names
Personal names should be indexed by family name and displayed family name first and given name(s) second, with a comma separating them. In the case of Western names, this means using inverted order with a comma separating the last and first names (E.g. Walsh, Maureen). In the case of non-Western names that begin with the family name, index by the family (first) name, and display in direct order: family name, given name(s) (E.g. Mao, Tse-tung). Due to the limitations of software platforms, it is not always possible to differentiate between the two; all names may have to be entered as "first name" and "last name," and displayed "last name, first name." When this is the case, enter the name in such a way as to ensure proper indexing by family name, and include a citation or other reference elsewhere in the metadata record with the proper formatting of the full name.
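Storing the family name and the given name(s) in separate fields, rather than as one string, makes it possible to index by family name and display "family name, given name(s)" for both Western and non-Western names. Here is a minimal Python sketch, assuming a simple two-field model. The class and method names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class PersonalName:
    family: str
    given: str

    def display(self) -> str:
        # Family name first, then given name(s), separated by a comma.
        return f"{self.family}, {self.given}"

    def sort_key(self) -> tuple:
        # Index on the family name; given names only break ties.
        return (self.family.casefold(), self.given.casefold())

names = [PersonalName("Walsh", "Maureen"), PersonalName("Mao", "Tse-tung")]
for name in sorted(names, key=PersonalName.sort_key):
    print(name.display())
# Mao, Tse-tung
# Walsh, Maureen
```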
When possible, use the authorized form of the name in the Library of Congress Name Authority File (NAF) or other controlled source (see recommended sources below). When an authorized form of the name cannot be found, use the name as it appears on the item being described. If the name appears in more than one form, consider the following rules when deciding which form to use:
- Follow AACR2 rules for punctuation and spacing
- Spell out at least one first name if possible (E.g. Walsh, Maureen; not Walsh, M.)
- Mixed case (E.g. Smith, John; not smith, john or SMITH, JOHN)
- Include diacritics if applicable
- Include suffixes or dates if applicable (E.g. Kennedy, William, Jr.; Cooper, Christopher, 1968-)
- Do not include honorifics (E.g. Sir)
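When no authority record is available, some of these rules can be checked automatically. The sketch below assumes headings are entered as "Family, Given" with optional suffixes or dates after further commas. It flags all-caps or all-lowercase entries, initials-only given names, and a leading honorific. The honorific list is a hypothetical, incomplete sample. Rules such as AACR2 punctuation and diacritics still need human review.

```python
import re
from typing import List

# Hypothetical, deliberately short list; extend it for your own collections.
HONORIFICS = {"sir", "dame", "dr.", "mr.", "mrs.", "ms.", "rev."}

def check_name(heading: str) -> List[str]:
    """Return warnings for a 'Family, Given[, suffix or dates]' heading."""
    warnings = []
    if heading.isupper() or heading.islower():
        warnings.append("use mixed case")
    parts = [part.strip() for part in heading.split(",")]
    if len(parts) < 2 or not parts[1]:
        warnings.append("missing given name")
        return warnings
    given = parts[1]
    tokens = given.split()
    if tokens and tokens[0].casefold() in HONORIFICS:
        warnings.append("remove honorific")
    # Initials only, such as "M." or "M. J."
    if re.fullmatch(r"(?:[A-Z]\.\s*)+", given):
        warnings.append("spell out at least one given name")
    return warnings

print(check_name("Walsh, M."))  # ['spell out at least one given name']
```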
Recommended sources for controlled personal names:
Library of Congress Name Authority File
Getty Union List of Artist Names
Organization names
Names for organizations, such as professional organizations or University departments or units, should be entered as they appear on the item or as they were used at the time the item was created. When possible, use the authorized form of the name in the Library of Congress Name Authority File or other controlled source (see recommended sources below).
In a university environment, many organizations have names that cannot stand alone. For example, "Department of Mathematics" is not sufficient to distinguish the math department at this university from one at another. In this case, enter the authorized form of the name of the university or other parent organization first, for example: Ohio State University. Department of Mathematics.
Consider the following guidelines when entering organization names:
- Enter the name as it was used at the time. If an organization has changed its name, the name may appear in multiple forms.
- If an organization is commonly known by an acronym, enter the name in that form (including punctuation or its absence).
- If it is necessary to make a name uniquely identifiable with an organization, and no authority record exists, add an institution, place name, or brief phrase indicating the function after the name. Examples: Newman Club (Ohio State University), Renaissance Hotel (Columbus, Ohio)
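The two patterns above can be expressed as a small helper. The first puts the parent body first for units that cannot stand alone. The second adds a parenthetical qualifier for ambiguous names. This is a sketch with a hypothetical function name, and it treats the two patterns as alternatives, which is a simplifying assumption. The authorized forms themselves should still come from the NAF or another controlled source.

```python
from typing import Optional

def org_heading(name: str, parent: Optional[str] = None,
                qualifier: Optional[str] = None) -> str:
    """Build an organization heading.

    parent: authorized name of the parent body, entered first when the
        unit's name cannot stand alone (e.g. a university department).
    qualifier: institution, place, or brief phrase added in parentheses
        when the name is ambiguous and no authority record exists.
    """
    if parent and qualifier:
        raise ValueError("use either a parent body or a qualifier, not both")
    if parent:
        # Parent body first; each part ends with a period.
        return f"{parent.rstrip('.')}. {name.rstrip('.')}."
    if qualifier:
        return f"{name} ({qualifier})"
    return name

print(org_heading("Department of Mathematics", parent="Ohio State University"))
# Ohio State University. Department of Mathematics.
print(org_heading("Newman Club", qualifier="Ohio State University"))
# Newman Club (Ohio State University)
```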
Recommended sources for controlled organization names:
Library of Congress Name Authority File
Getty Union List of Artist Names
Place names
Place names are often an important access point in historical and cultural collections, but their use can be problematic due to the political, cultural, and religious significance of variants. Variants can be the result of a place name changing over time, or of its use by more than one culture or language. When the name of a place has changed over time, it is considered good practice to use the name as it was when the item was created. This is sometimes possible using controlled sources such as the Library of Congress Name Authority File. Whether you are using a historical or a current name, the use of a controlled vocabulary is strongly recommended, both to collocate and provide intellectual access, and to avoid the appearance of bias. With the development of sophisticated visualization tools, it has become possible to provide map interfaces and other visual access points to geo-referenced collections. If this sort of interface is desired for the collection, either as it is created or in the future, it may prove worthwhile to capture GIS coordinates or other unique identification codes for the places represented.
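If a map interface is planned, it helps to store the controlled name, its source vocabulary and identifier, and its coordinates together. The sketch below assumes decimal-degree coordinates and uses a hypothetical identifier and approximate coordinates. It converts such a record into a GeoJSON Feature, a format many web mapping tools accept. Note that GeoJSON (RFC 7946) lists longitude before latitude.

```python
import json
from dataclasses import dataclass

@dataclass
class Place:
    name: str         # name as used when the item was created
    source: str       # controlled vocabulary the name was taken from
    identifier: str   # identifier in that vocabulary (hypothetical below)
    latitude: float   # decimal degrees, north positive
    longitude: float  # decimal degrees, east positive

    def to_geojson(self) -> dict:
        # GeoJSON orders point coordinates as [longitude, latitude].
        return {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [self.longitude, self.latitude]},
            "properties": {"name": self.name, "source": self.source, "id": self.identifier},
        }

# Approximate coordinates; the identifier is a placeholder, not a real authority ID.
columbus = Place("Columbus (Ohio)", "LCNAF", "hypothetical-id-0001", 39.96, -83.0)
print(json.dumps(columbus.to_geojson()))
```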
Recommended sources for controlled place names:
Library of Congress Name Authority File
Getty Thesaurus of Geographic Names (TGN)
Subject Terms
Subject terms can help potential users find resources and decide whether they will be helpful. Subject terms can bring out the topic of a resource with words and phrases that do not appear in the title or abstract, which improves keyword searching. A controlled vocabulary brings together topically related materials under the same term, even when their creators did not use that term. There are many controlled vocabularies that might be appropriate. For general coverage, the Library of Congress Subject Headings (LCSH) are recommended. A specialized vocabulary such as the Getty Art and Architecture Thesaurus (AAT) is a good choice when you need the terminology of a particular discipline or when the materials are very narrow in their coverage.
When describing resources, it is highly recommended to use subject terms to supply additional keywords. If possible, use a controlled vocabulary.
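Here is a minimal sketch of how a controlled vocabulary collocates materials. Variant keywords from different creators map to one preferred term. The mapping is hypothetical. In practice, the preferred terms would come from LCSH, AAT, MeSH, or another chosen vocabulary. Keywords without a mapping pass through unchanged so they can be reviewed.

```python
from typing import Dict, List

# Hypothetical mapping from uncontrolled keywords (lowercased) to one preferred term.
PREFERRED: Dict[str, str] = {
    "automobiles": "Automobiles",
    "cars": "Automobiles",
    "autos": "Automobiles",
}

def controlled_subjects(keywords: List[str]) -> List[str]:
    """Map keywords to preferred terms, dropping duplicates but keeping order."""
    result: List[str] = []
    for keyword in keywords:
        cleaned = keyword.strip()
        term = PREFERRED.get(cleaned.casefold(), cleaned)
        if term not in result:
            result.append(term)
    return result

# Two creators used different words; both records end up under the same term.
print(controlled_subjects(["Cars"]), controlled_subjects(["autos"]))
```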
Recommended sources for subject terms:
Library of Congress Subject Headings (LCSH)
Getty Art and Architecture Thesaurus (AAT)
Medical Subject Headings (MeSH), National Library of Medicine
Language
The language of the resource is important to include so that people will know whether they can use the resource. There are many controlled vocabularies for language. Depending on the system you are using, it may be necessary to use its specified language terms so that the software works properly.
For the Knowledge Bank, language is stored as an ISO language code. The entry forms provide a list of languages in spelled-out form.
For resources that do not have a language, such as images or music, do not provide language information.
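Here is a sketch of the conversion an entry form might perform between a spelled-out language name and a stored code. It assumes two-letter ISO 639-1 codes and includes only a small excerpt of them. Check which code list (for example ISO 639-1, ISO 639-2, or the MARC list) your system actually expects. Resources without a language get no value at all.

```python
from typing import Dict, Optional

# Small excerpt of ISO 639-1 codes; assumed here to be the stored form.
ISO_639_1: Dict[str, str] = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Chinese": "zh",
}

def language_code(spelled_out: Optional[str]) -> Optional[str]:
    """Convert a spelled-out language name, as chosen on an entry form, to a code.

    Returns None when the resource has no language, so no value is stored.
    """
    if not spelled_out:
        return None
    try:
        return ISO_639_1[spelled_out]
    except KeyError:
        raise ValueError(f"No code for language: {spelled_out!r}") from None

print(language_code("German"))  # de
```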
Recommended sources for language:
ISO Language Codes
MARC Code List for Languages
Citation
This field is recommended to provide enough information for a person to identify the resource you are describing. It is good practice to use a standard bibliographic citation style, such as APA or MLA, for this field. It is particularly important to provide a citation if the digital resource has also been published in another format, such as a journal article, as it may not be clear that there is a relationship between resources.
Recommended sources for citation:
Citation guides from The Ohio State University Libraries
Additional resources for descriptive metadata
- Anglo-American Cataloguing Rules, 2nd ed., 2002 revision (AACR2), was developed for library materials. Many schemas, such as Dublin Core, refer to it as a content standard.
- Describing Archives: A Content Standard (DACS) is suitable for archives, personal papers, and manuscript collections.
- Cataloging Cultural Objects is a good choice for art, artifacts, and material culture.
Manage
There are many types of metadata that are used primarily to manage digital items, rather than provide access to them. Digital objects present unique management challenges, and may need types of description that are not necessary for physical artifacts. For example, a print book consists of pages held together with binding. The same book, digitized, may consist only of a series of page image files held together by nothing more than a file structure or a file-naming convention. Metadata can provide a record of the files and how they relate to one another (sequence, chapter divisions, etc.).
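The sketch below shows how fragile a file-naming convention is as the only record of sequence. It assumes a hypothetical convention, `<identifier>_<four-digit page number>.tif`, and recovers reading order from it. Any file that breaks the pattern cannot be placed automatically. Recording the sequence explicitly in metadata avoids this dependence.

```python
import re
from typing import List, Tuple

# Hypothetical convention: <identifier>_<four-digit page number>.tif
PAGE_FILE = re.compile(r"^(?P<item>[a-z0-9]+)_(?P<page>\d{4})\.tif$")

def page_sequence(filenames: List[str]) -> List[Tuple[int, str]]:
    """Return (page number, filename) pairs in reading order.

    Raises ValueError for files that do not follow the naming convention,
    since those cannot be placed in sequence automatically.
    """
    pages = []
    for name in filenames:
        match = PAGE_FILE.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        pages.append((int(match.group("page")), name))
    return sorted(pages)

print(page_sequence(["book1_0002.tif", "book1_0010.tif", "book1_0001.tif"]))
```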
All kinds of metadata - including descriptive - can be used by those managing digital files, but there are a number of standards created specifically for this purpose. They are usually grouped by function. Four of the most common functions - technical, structural, preservation, and rights - are listed below, along with links to some standards in the category. These links are only a starting point. There are metadata standards for many different purposes and types of content; a metadata librarian can help you choose the one that is right for your project.
Technical metadata
Technical metadata describes the creation and physical characteristics of digital items, including file formats, compression, and devices used to create or scan them. It is especially useful in preservation activities.
- MIX (NISO Metadata for Images in XML) - XML schema for encoding technical data elements required to manage digital image collections
- textMD (Technical Metadata for Text) - XML schema that details technical metadata for text-based digital objects.
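Python's standard library alone can capture a few basic technical properties, as in the sketch below. It is an illustration, not a MIX or textMD record. The output keys are hypothetical, and `mimetypes` guesses the format from the file extension rather than by inspecting the file's contents.

```python
import mimetypes
import os
from datetime import datetime, timezone

def technical_metadata(path: str) -> dict:
    """Collect a few basic technical properties of a file.

    Standards such as MIX record much more, e.g. image dimensions,
    color space, and the capture device.
    """
    stat = os.stat(path)
    # Guessed from the extension only; the file itself is not inspected.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "mime_type": mime_type or "application/octet-stream",
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```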
Structural metadata
Structural metadata describes the internal organization of a resource, and aids in navigation and display.
- METS (Metadata Encoding and Transmission Standard) - Structure for encoding descriptive, administrative, and structural metadata
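For a digitized book, METS structural metadata typically appears in a `structMap` whose nested `div` elements point to files through `fptr`. The sketch below uses Python's `xml.etree.ElementTree` to build only such a fragment. The file IDs are hypothetical. A valid METS document would also need the root `mets` element and a `fileSec` defining those IDs.

```python
import xml.etree.ElementTree as ET
from typing import List

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def physical_struct_map(file_ids: List[str]) -> ET.Element:
    """Build a structMap listing pages in order, one fptr per page file.

    Only a fragment: the FILEID values must match <file> IDs in a fileSec.
    """
    struct_map = ET.Element(q("structMap"), TYPE="physical")
    book = ET.SubElement(struct_map, q("div"), TYPE="book")
    for order, file_id in enumerate(file_ids, start=1):
        page = ET.SubElement(book, q("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return struct_map

fragment = ET.tostring(physical_struct_map(["FILE0001", "FILE0002"]), encoding="unicode")
print(fragment)
```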
Preservation metadata
Preservation metadata facilitates long-term preservation of and access to electronic resources.
- Open Archival Information System (OAIS) - OAIS provides a reference model for an archival system designed to maintain access to digital resources and preserve them over time.
- PREMIS (Preservation Metadata) - A data dictionary and supporting XML schemas for core preservation metadata needed to support the long-term preservation of digital materials.
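Fixity information is one of the most routine pieces of preservation metadata. Recording a checksum at ingest lets later audits detect changes to a file. In the sketch below, the dictionary keys are loosely modeled on the PREMIS fixity semantic units (messageDigestAlgorithm, messageDigest). The output is not a PREMIS serialization, and SHA-256 is an assumed default.

```python
import hashlib
from pathlib import Path

def fixity(path: str, algorithm: str = "sha256") -> dict:
    """Compute a message digest for a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {"messageDigestAlgorithm": algorithm, "messageDigest": digest.hexdigest()}

def verify(path: str, recorded: dict) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    current = fixity(path, recorded["messageDigestAlgorithm"])
    return current["messageDigest"] == recorded["messageDigest"]
```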
Rights metadata
Rights metadata enables the management of rights related to information resources.
- Creative Commons - Creative Commons provides a range of standardized digital licenses that can be associated with or embedded in open access web resources.
- METSRights - An XML schema for documenting minimal administrative metadata about the intellectual rights associated with a digital object or its parts. METSRights is most often used to record statements to be viewed by professionals managing the content or to be displayed to end users viewing the content. It is not designed to be machine-actionable.
- ONIX For Publications Licenses - ONIX-PL is an XML format for the communication of license terms in a structured and substantially encoded form.
- Open Digital Rights Language - ODRL is an open standard defining a model and vocabulary for the expression of terms and conditions over assets.
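A common pattern is to record both a human-readable rights statement and the license URI. That way, the license can be identified unambiguously wherever the record travels. The sketch assumes DSpace-style `dc.rights` and `dc.rights.uri` fields. The URIs are the canonical Creative Commons 4.0 license addresses, and only a subset is listed.

```python
from typing import Dict

# Canonical Creative Commons 4.0 license URIs (a subset).
CC_LICENSES: Dict[str, str] = {
    "CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
    "CC BY-NC 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
}

def rights_fields(license_label: str) -> Dict[str, str]:
    """Return a human-readable rights statement plus a machine-readable license URI."""
    uri = CC_LICENSES[license_label]  # KeyError for labels not in the table
    return {
        "dc.rights": f"This work is licensed under {license_label}.",
        "dc.rights.uri": uri,
    }

print(rights_fields("CC BY 4.0"))
```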
Metadata Documentation
In addition to our general metadata guidelines, we document metadata decisions on a project-by-project basis to ensure consistency and provide a record for future reference. This documentation generally takes the form of a Metadata Application Profile (MAP). A MAP is a tabular document (usually recorded in an Excel worksheet or a Word table) that lists the metadata elements used, both in their standard expression (e.g. dc.creator), and their specific use in the project (e.g. 'author'). It also contains guidelines for the content of each element and a sample value. Two sample MAPs are available on this website: Sample #1: Honors Theses and Sample #2: Library Lecture. More examples are available via the OhioLINK DRMC. For communities that prefer to create their own metadata, we sometimes provide instructions and/or training documents in a more narrative form as well.
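Because a MAP is tabular, it can also be kept in machine-readable form and used to check records before they are loaded. The rows below are hypothetical and are not taken from either sample MAP. The "required" column is an assumed addition to the columns described above. The sketch validates a record against the profile and exports the profile as CSV for use in a spreadsheet.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MapRow:
    element: str    # standard expression, e.g. dc.creator
    label: str      # local use in the project, e.g. "Author"
    required: bool  # assumed column, not part of every MAP
    guideline: str
    sample: str

# Hypothetical profile for an illustrative project.
PROFILE = [
    MapRow("dc.title", "Title", True, "Transcribe from the title page.", "Hypothetical Thesis Title"),
    MapRow("dc.creator", "Author", True, "Family name, given name(s).", "Walsh, Maureen"),
    MapRow("dc.description.abstract", "Abstract", False, "Copy from the document.", "This study..."),
]

def validate(record: Dict[str, str], profile: List[MapRow]) -> List[str]:
    """List missing required elements and elements the profile does not define."""
    problems = []
    allowed = {row.element for row in profile}
    for row in profile:
        if row.required and not record.get(row.element):
            problems.append(f"missing required {row.element} ({row.label})")
    for element in record:
        if element not in allowed:
            problems.append(f"{element} is not in the profile")
    return problems

def to_csv(profile: List[MapRow]) -> str:
    """Export the profile as CSV text with one row per element."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Element", "Local label", "Required", "Guideline", "Sample value"])
    for row in profile:
        writer.writerow([row.element, row.label, "yes" if row.required else "no",
                         row.guideline, row.sample])
    return buffer.getvalue()

print(validate({"dc.title": "Example", "dc.subject": "Example"}, PROFILE))
```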
