Kentuckiana Digital Library Production Guide version 2.0

1- About this Guide | 2- Project Planning | 3- Text Encoding | 4- Item Metadata | 5- Digital Imaging

1. About

As the Kentuckiana Digital Library has developed, standards and best practices have been adopted to ensure the long-term viability and preservation of digital assets. This guide outlines the standards and best practices employed by the Kentuckiana Digital Library Production Center at the University of Kentucky.

2. Project Planning

Proposal Form

Planning for projects included in the Kentuckiana Digital Library develops as a dialog between a contributing archive and the central digitization center at the University of Kentucky. Before that dialog begins, the contributing archive completes a project proposal form. [download form]

Copyright Issues

Contributing archives are responsible for investigating copyright issues in regard to digital assets. The following digital assets management form is utilized when a collection is digitized for inclusion in the KDL. [download form]

A few good sites for investigating copyright issues:
Making of Modern Michigan Copyright FAQ
Colorado Digitization Program Copyright Resources
Peter Hirtle's Public Domain Chart

3. Text Encoding: An Introduction

Text encoding is the structuring of textual content in digital format. Through the use of <tags>, a given text can be described structurally in terms of content and organization as well as style and presentation. Currently, text encoding practices in the digital library field are largely centered on the XML (eXtensible Markup Language) standard. XML is an international standard for defining device-independent, system-independent methods of representing textual data in electronic form. Because XML is system-independent and device-independent, raw XML files consist simply of plain text, which allows for portability to future systems.

XML is not a markup language in and of itself but, more specifically, a metalanguage: a set of rules and procedures followed in the creation of a markup language. This is often a confusing point, but an important one. XML has been used to create many markup languages, including EAD (Encoded Archival Description), used to mark up electronic archival finding aids, and TEI (Text Encoding Initiative), used to mark up electronic versions of texts.

Within the XML standard, elements, attributes, and document instances are defined through a Document Type Definition, often referred to as a .dtd file. This file describes and defines the "valid" structuring and use of <tags> for a given markup language. Software can then interpret the .dtd file to present documents appropriately in a given interface, such as a web browser, or to assist with the batch or manual encoding of a given class of documents. One very common example of software that relies on a document type definition is the web browser: through the HTML document type definition, web browsers interpret and display HTML-encoded documents for viewing.
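The distinction between markup that merely parses and markup that is "valid" against a .dtd file can be illustrated with a short check. The following is a minimal sketch in Python, assuming only the standard library; it tests well-formedness (every tag opened is properly closed), not validity, because the standard library parser does not validate against a DTD. The function name and the two fragments are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    This checks well-formedness only. Checking validity against a DTD
    such as ead.dtd requires a validating parser, which is not shown here.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments for illustration.
good = "<eadid>1998ua003</eadid>"
bad = "<eadid>1998ua003<eadid>"  # the closing tag is missing its slash

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check of this kind is a reasonable first step before a finding aid is submitted, since a file that is not well-formed cannot be validated or displayed at all.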

HTML is a fairly simple markup language of this kind; strictly speaking, it is an application of SGML, the parent standard of XML, and XHTML is its XML reformulation. HTML deals mainly with presentation rather than content description. With current web browsers, more sophisticated applications of XML require special software that handles the non-HTML document type definitions and converts the output to HTML. With the advent of XML, some newer web browsers are being equipped with an expanded range of functionality: instead of being restricted to a single document type definition, they are adopting the capability to interpret any document type definition created with the XML standard.

3.2 Markup Languages

Because resource formats vary, as do the time and resources available to digitize them, more than one markup language is often required for effective digital library production and access. The Kentuckiana Digital Library implements the standard XML-compliant text encoding languages developed in the academic digital library field.

DTDs (Document Type Definitions) in Use

EAD v2.0 (Encoded Archival Description) for archival finding aids (official EAD Website)
TEI-Lite (Text Encoding Initiative) for additional full-text resources (official TEI Website)

3.3 Implementing EAD

The Encoded Archival Description was created by Daniel Pitti at the University of California Berkeley in 1993. After five years of development and beta testing, the Encoded Archival Description markup language was officially accepted as a standard in 1998 by the Library of Congress. Since its official release, EAD has quickly become the standard of choice for the larger digital library community due to its highly descriptive nature and its standards-based, XML-compliant structure. The EAD markup language was created to describe the content and structure of archival finding aids. Creating these finding aids is an important first step in establishing electronic access to archival collections as well as in building a sustainable digital library. Once electronic finding aids are completed, the items described within them can be digitized and linked to their descriptive data elements within the finding aid, thus providing searchable access points.

Because of the wide variety of formats for archival finding aids, EAD allows for a large degree of flexibility. Still, in a union environment such as KYVL it is useful to adopt a standard best practice guideline within the EAD structure for encoding finding aids. This allows for better searching as well as a degree of common look and feel for users.

3.4 Using the EAD Web Template Generator

The EAD template generator was developed by Alvin Pollock at the University of California Berkeley. Using the template generator is fairly intuitive. However, keep in mind that the generator is not intended to produce a complete finding aid instance. You'll notice that there is no container list portion in the template generator: the program produces a finding aid instance down to, but not including, the container list. After the EAD code is generated, the container list must be constructed separately and added to the generated EAD.

Another important point to keep in mind is that the generator is really only intended for conversion of existing finding aids. The template does not allow you to save a document in progress and return to it later. A good approach is to create the content of the finding aid and save it in Word or another word processor. Then, after the finding aid is completed, open the template generator, copy your finding aid from the word processor, and paste it into the template generator.

Please keep in mind that our template generator currently creates a finding aid as specified by our recommended best practice. Individual institutions can add additional code after generating the template output. Also, through working with the central site, additional elements can be added to individual institution templates.

Please consult the Template Generator Instruction page put together by Alvin Pollock for more detailed information on how the template generator places your input into the actual EAD code.

http://sunsite.berkeley.edu/FindingAids/uc-ead/templates/intro.html

Click here to access the list of template generators currently available for KYVL institutions.

Aside from contact information and specified public identifiers, these templates produce encoded EAD finding aids with the same structure. Once completed, the CGI output can be saved as a text file.

Please contact the Kentuckiana Digital Library Project Manager to establish a template generator for your institution.

EAD Template Generator

Template Generators for the following institutions are currently available.

3.5 EAD Structure with Recommended Best Practice

In developing our EAD template generator, several best practice guidelines were consulted. These include the EAD Application Guidelines, the EAD Tag Library, our consultant's prepared report, and the RLG Recommended Application Guidelines for EAD. As a result, our template is very close to the best practice examples represented in those resources, especially the RLG guidelines developed by the RLG EAD Advisory Group.

This section outlines the structure of an EAD finding aid, including the minimum set of required elements recommended for the Kentuckiana Digital Library, and gives information on producing an EAD container list. A sample finding aid constructed with our EAD template generator is also included. Please note that, except for the Container List section, specific EAD elements are not defined in this document. These are defined in the EAD Tag Library, now available online for EAD v2.0.

Please note that this document does not attempt to replicate information in official EAD publications or in supporting EAD best practice recommendations such as those specified by the RLG Guidelines and the report prepared by the KYVL EAD Consultant. However, these sources have been consulted and given close attention in order to establish our recommended best practice within the framework of emerging national best practice standards for EAD. Also, this document is not meant to dictate local encoding practice. Additional markup specified by local standards and finding aid content may be added to the finding aid generated via the KYVL EAD template generator. In this way, the KYVL recommendation can be thought of as a common starting point for all finding aids in the state. It is strongly recommended, however, that any markup added outside the template generator follow the recommendations outlined in the EAD Application Guidelines, which are currently unavailable online.

In order to assign file names to finding aids, please consult section 4.8 Assigning File Names in Section 4: Metadata Guidelines.

EAD Outline

EAD
    Header
        EADID
        File Description (Title, dates, etc.)
        Profile Description (Creation date, language, etc.)

    Front Matter
        Title Page
            Date
            Publisher
            Copyright

    Archival Description
        DID (Title, origination, physical description, etc.)
        Administrative Info (Restrictions, preferred citation, etc.)
        Biographical History
            Marked-up Paragraph (repeatable)
            -or-
            ChronList (repeatable)
                Date
                Event
            -or-
            both of the above
        Controlled Access Terms
        Scope and Contents Note
        DSC Analytic Description
            Series Description
                C01 (repeatable)
        -or-
        DSC In-Depth Description
            Container List
                C01 (repeatable)
                    C02 (repeatable)
        -or-
        both of the above

3.6 Sample Finding Aid With Recommended Structure and Content

note: Red text is directional and not part of the finding aid mark-up. Structure and content recommendations developed by Lisa Carter, Tom Roscoe and Eric Weig.

[Table of Contents Order]

Descriptive Summary
Administrative Information
Biographical Sketch
Summary of Significant Events
Controlled Access Terms
Scope and Contents
Related Material
Series Description
Container List

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "ead.dtd">
<ead>
<eadheader langencoding="iso639-2b" findaidstatus="unverified-full-draft">
<eadid type="XML catalog">1998ua003</eadid>
<filedesc>
<titlestmt>
<titleproper>Guide to the Margaret I. King(1879-1966) Papers, <date>1893-1966</date> (University of Kentucky. University Libraries. Director's Office.)</titleproper>
<author>Processed by Deborah Whalen; Machine-readable finding aid created by Deborah Whalen.</author>
</titlestmt>
<publicationstmt>
<publisher>University of Kentucky Audio-Visual Archives</publisher>
</publicationstmt>
</filedesc>
<profiledesc><creation>Machine-readable finding aid derived from folder labels by rekeying.
Date of source: <date>March 1999</date>
Processed by Deborah Whalen, <date>March 1999</date>; Finding aid encoded by Deborah Whalen, Special Collections and Archives, University of Kentucky Libraries, <date>March 1999</date>.
</creation>
<langusage>Description is in <language>English</language></langusage>
</profiledesc>
<revisiondesc>
<change>
<date>January 2004</date>
<item>1998ua003 converted from EAD 1.0 to 2002 by Eric Weig.</item>
</change>
</revisiondesc>
</eadheader>

[Title Page Section]

Title: Guide to the Margaret I. King Papers, 1893-1966
Papers could also be Collection or Records, etc. depending on the finding aid

Processed by: Use this to credit collection processor (person who created the original finding aid). Any unknown people should be simply designated as "Staff" followed by any known people (this is assuming that the known people are more recently involved than the unknowns). People should be listed in the order of contribution.

Date Completed: Refers only to date the processing was completed, not including EAD generation. If there are two dates, they should both be listed. In some cases, it would be valuable to list when the processing was initially completed as well as when processing was updated following the latest standards. *Add a revision history section for revision dates.

<frontmatter>
<titlepage>
<titleproper>Guide to the Margaret I. King Papers, <date>1893-1966</date></titleproper>
<num>Collection number: 1998UA003</num>
<publisher>Special Collections and Archives<lb/><extptr show="embed" entityref="ukseal"/><lb/>University of Kentucky Libraries.<lb/>
Lexington, Kentucky
</publisher>
<list type="simple">
<head>Contact Information</head>
<item>Special Collections and Archives</item>
<item>University Archives and Records Program</item>
<item>University of Kentucky</item>
<item>Margaret I. King Library</item>
<item>Lexington, Kentucky</item>
<item>40506</item>
<item>Phone: (606) 257-8372</item>
<item>Fax: (606) 257-6311</item>
<item>Email: <extref href="mailto:uarp@lsv.uky.edu">uarp@lsv.uky.edu</extref></item>
<item>URL: <extref href="http://www.uky.edu/Libraries/Special/uarp/">http://www.uky.edu/Libraries/Special/uarp/</extref></item>
</list>
<list type="simple">
<item>Processed by Deborah Whalen</item>
<item>Date Completed: <date>March 1999</date></item><item>Encoded by Deborah Whalen</item>
</list>
<p>Copyright 1999 University of Kentucky. All rights reserved.</p>
</titlepage>
</frontmatter>


[Descriptive Summary]

Title: (marc field 245) Margaret I. King Papers, 1893-1966
Don't use any initial articles or birth and death dates.

Collection Number: 1998UA003
List all accession numbers covered by the finding aid. List primary collection first and other included collections second.

Creator: (marc field 100) King, Margaret I., 1879-1966
Use birth and death dates, but not collection dates.


Extent: (marc field 300) cubic ft./linear ft.: # of boxes and/or folders 9 cubic feet
Enter additional specifications if available. Often noted as an approximate number of items in an exact number of boxes. Can distinguish between boxes of content, as in: 25 boxes of papers, 3 boxes of photographs. List extent of papers first and then other materials. Materials should be described well enough that you can tell if you have the whole collection. Individual items should be listed separately only if they exist separately and are not considered contents of a box, such as a scrapbook.

Repository:
University of Kentucky Libraries, Special Collections and Archives, Lexington, KY 40506-0039

<archdesc level="collection" langmaterial="en">
<did>
<head>Descriptive Summary</head>
<unittitle>Margaret I. King Papers, <unitdate type="inclusive">1893-1966</unitdate> (University of Kentucky. University Libraries. Director's Office.)</unittitle>
<origination>
<corpname>University of Kentucky. University Libraries. Director's Office. </corpname><persname>King, Margaret I., 1879-1966</persname>
</origination>
<physdesc><extent>9 cubic feet</extent></physdesc>
<repository>
<corpname>University of Kentucky Libraries</corpname>
<address><addressline>Special Collections and Archives, Lexington, KY 40506-0039</addressline></address>
</repository>
</did>

[Administrative Information Section]

Access: Recommended default is - Collection is open to researchers by appointment. Can be used to designate a collection that is stored off-site or has other access restrictions such as the collection is closed until a specified date.

Use Restrictions: Recommended default is - Copyright has not been assigned to the University of Kentucky. Can be used to detail how materials in the collections can be reused or permissions that need to be acquired before using the collection.

Preferred Citation:
[Identification of item], Margaret I. King Papers, 1893-1966, 1998UA003, Special Collections and Archives, University of Kentucky, Lexington

Processing Information: This area should be used to describe the processing of the collection. While individuals who processed the collection and the date could be listed here, remember they are noted on the Title Page. This area should be used to detail any additional and/or unusual aspects of the processing, such as that the processing is incomplete or that the photographs were removed, etc.

Acquisitions Information: This area should be used to describe how the materials were acquired by the repository. For example, who donated the collection, etc.

<descgrp type="admin">
<accessrestrict>
<p>This collection is comprised of University of Kentucky records, created and maintained in the course of university business. It is open for research in accordance with the Kentucky Open Records Act (KRS 61.870-884).</p>
</accessrestrict>
<userestrict>
<p>This collection is comprised of University of Kentucky records, created and maintained in the course of university business. The University of Kentucky holds the copyright for materials created in the course of business by University of Kentucky employees. Copyright for all other materials has not been assigned to the University of Kentucky.</p>
</userestrict>
<prefercite>
<p>[Identification of item], Margaret I. King Papers, 1893-1966, 1998UA003, Special Collections and Archives, University of Kentucky, Lexington</p>
</prefercite>
<acqinfo>
<p>Some of the materials in the Margaret I. King Papers were acquired in the 1960s. Other materials have no acquisition records. The accession number, 1998UA003, was assigned to these materials in 1998.</p>
</acqinfo>
</descgrp>

[Biographical Sketch]

This area should be used to provide history and background information regarding the subject or originator of the collection. It can also be used to note the significance of the collection and to place it in context.

<bioghist>
<p>"She has built the library up from one that could be housed in a single room to a library that now contains more than 400,000 volumes and is fourth or fifth in size among the libraries of the South. It would be impossible to estimate the value of her contribution to the University of Kentucky." (Board of Trustees Minutes 6/25/1948:48).</p>
<p>This is how President Donovan described Margaret I. King in 1948. As the University's first librarian, King played a vital role in the development and growth of the library at the University of Kentucky.</p>
<p>Margaret Isadora King was born in Lexington, Kentucky, on September 1, 1879, to Gilbert Hinds and Elizabeth K. King. She earned her Bachelor of Arts from the Agricultural and Mechanical College of Kentucky (University of Kentucky) in 1898, and did clerical work in the Lexington law firm of Allen and Bronston from 1899 to 1905.</p>
<p>In 1905, King began her long career at the University of Kentucky by serving as secretary to President James K. Patterson. She became involved with the library when President Patterson asked her to organize the University's first library in 1909. While organizing the library, she continued as secretary to the president until she was named the University's first librarian in 1912.</p>
<p>During her career as librarian of the University, King continued her education. She performed some graduate work at the University of Michigan, and in 1929, she earned her Bachelor of Science in Librarianship from Columbia University.</p>
<p>Some of King's professional activities included the following: serving as the Kentucky Library Association president from 1926 to 1927, serving as a trustee for the Lexington Public Library for many years, and directing the survey of Kentucky libraries from 1936 to 1938 for the American Library Association's Survey of Research Materials in Southern Libraries. King's development of library methods courses eventually led to the establishment of a Department of Library Science at the University of Kentucky.</p>
<p>In 1948, after King had directed the University's library for 39 years, the Board of Trustees voted to name the library the Margaret I. King Library. This was a special honor since the Board rarely names buildings in honor of those still living.</p>
<p>Although King retired as librarian in 1949, she continued to perform some work for the library at the University of Kentucky. She died in Lexington on April 13, 1966.</p>
</bioghist>

[Controlled Access Terms]

Use Library of Congress Subject Headings or Keywords to describe the collection. List most relevant keywords first.

<controlaccess>
<list type="simple">
<item><subject>King, Margaret I. (Margaret Isadora), 1879-1966.</subject></item>
<item><subject source="lcsh">University of Kentucky--Libraries.</subject></item>
<item><subject source="lcsh">University of Kentucky--History--20th century.</subject></item>
<item><subject source="lcsh">Libraries--Kentucky.</subject></item>
<item><subject source="lcsh">Library administration.</subject></item>
<item><subject source="lcsh">Libraries--History--20th century.</subject></item>
</list>
</controlaccess>

[Scope and Contents]

This area should be used to describe, in brief, the content of the collection. Through the use of organization and arrangement elements, this section is also used to describe the structure of the finding aid/collection, e.g. "Organized into the following series:". It can also be used to describe the filing sequence, e.g. alphabetical, chronological. Organization is used for a broad description of how the whole collection was organized, and Arrangement is used as a narrower description of the filing sequence.

<scopecontent>
<p>This collection consists of materials from 1893 to 1966 relating to Margaret I. King's personal and professional activities and to the development and growth of the library at the University of Kentucky. Items in this collection document the history of the University of Kentucky in general, and the history of the University's library in particular. This collection also includes materials relating to other libraries in Kentucky and to the development of libraries in Kentucky.</p>
<p>The collection is divided into the following three series: Alphabetical File Series, Artifact Series, and Photograph Series.</p>
</scopecontent>

[Related Material]

This area should be used to reference relationships to materials that are not contained in the finding aid, such as an e-book at the same institution, a reference to a book about the subject of the finding aid or a url to a web site about the subject and/or origination of the finding aid.

<relatedmaterial>
<p></p>
<p></p>
</relatedmaterial>

<arrangement>
<p>The Alphabetical File Series consists of folders which are arranged in alphabetical order by topic. These topics mostly relate to Margaret I. King's activities and to the development of the library at the University of Kentucky. The materials in these folders include clippings, correspondence, presentations by King, and programs from conferences and events.</p>
<p>Personal or biographical materials which relate to Margaret I. King are listed under "King." These materials include King's class notes, her memoirs of President James K. Patterson, and her correspondence with people such as Frances Jewell McVey and Glanville Terrell.</p>
<p>Correspondence which relates to a specific topic is included with that topic. Correspondence with certain people, such as President Frank L. McVey, is included under that person's name. Other correspondence is included in "Correspondence - General." The first folder of this general correspondence contains letters from 1909 to 1918. The remaining general correspondence folders contain letters from 1918 to 1950 and are arranged in alphabetical order by the name of the organization or person sending the letter.</p>
</arrangement>

[Container Description]

List and describe the series and subseries into which the collection has been organized. Use the Series Titles in the original finding aid or within the collection as the headers. Note the box or boxes which include that series. Add any information beyond the Series Title that will better direct a patron to a group of cohesive materials.

<dsc type="combined">

<!--The following must be generated outside of the EAD template generator-->

<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p>Several topics in this series include: the dedications of libraries at the University of Kentucky, gifts to the library at the University of Kentucky, plans for state-wide library development, and the activities of the library and the university during World War I and World War II. Other topics include the procedures of various library departments, library orientation for freshmen, and the business and social activities of the library staff.</p>
<p>Several organizations which are represented in this collection include the American Library Association, the Kentucky Department of Library and Archives, the Kentucky Library Association, and the Lexington Public Library.</p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
<dao show="new" actuate="user" href="href for digitized item"><daodesc>[view image]</daodesc>
</dao>
</did>
</c03>
</c02>
<c02 level="subseries">
<did><unittitle type="subseries">2nd Subseries of 1st Series Title, <unitdate type="inclusive">dates </unitdate></unittitle>
</did>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">3</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">4</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>
</archdesc>
</ead>


3.7 Encoding the EAD Finding Aid Container List

The web template generator for marking up EAD Finding Aids currently does not facilitate markup for the "Container List" portion of the EAD finding aid.

Because of the complexity and number of potential variants in container lists, these examples taken from the sample finding aid serve as a general recommendation. See the EAD Tag Library and the EAD Application Guidelines for more examples.

What is a Container List?

A "Container List" for an archival collection is the detailed listing of the collection's contents and their organization via storage container types. A "Container List" in EAD is defined through the <dsc>(Description of Subordinate Components) element.

There are three approaches to the <dsc> element in EAD. These approaches are selected through the value of the <dsc> type attribute:

"analyticover" (A container list made up of series descriptions only.),
"in-depth" (A container list containing hierarchical description of the components of a collection including box, folder, and other locations. This may also be a "boxlist," "handlist," or "calendar." For purposes of similar look and feel for users, "Container List" is the recommended title.), and
"combined" (A mixed series description and container list.)

The accepted practice for KYVL with regard to the <dsc> element is to use a single <dsc> section with the type "combined".
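The recommendation above can be checked mechanically once a finding aid is complete. The following is a minimal sketch in Python using only the standard library; the function names and the two sample strings are hypothetical and serve only to illustrate the check, not to describe any tool used by the Production Center.

```python
import xml.etree.ElementTree as ET


def dsc_types(ead_xml):
    """Return the type attribute of every <dsc> element, in document order."""
    root = ET.fromstring(ead_xml)
    return [dsc.get("type", "") for dsc in root.iter("dsc")]


def follows_kyvl_practice(ead_xml):
    """True when there is exactly one <dsc> and its type is "combined"."""
    return dsc_types(ead_xml) == ["combined"]


# Hypothetical, heavily abbreviated fragments for illustration.
one_combined = (
    '<archdesc level="collection">'
    '<dsc type="combined"><head>Container List</head></dsc>'
    "</archdesc>"
)
two_sections = (
    '<archdesc level="collection">'
    '<dsc type="analyticover"/><dsc type="in-depth"/>'
    "</archdesc>"
)

print(follows_kyvl_practice(one_combined))
print(follows_kyvl_practice(two_sections))
```

The second fragment is legitimate EAD in its own right; the check only reports that it departs from the single "combined" section recommended for KYVL.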

The Component <c0x> Element

In order to create a container list, the use of the Component element in EAD is essential. Container lists are often comprised of nested series. Describing these series with a nested structure is achieved through the use of the <c0x> elements. Within the current EAD standards, the Component element can be nested to 12 levels (<c01> through <c12>). NOTE: When nesting occurs, as in the example below, a <c0x> element cannot be closed until all of its sub-<c0x> elements have been defined for the series. This is shown in the example below, where the <c01> and <c02> Component tags are not closed (</c02></c01>) until the last <c03> component element has been defined for the series.
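The nesting rule can be sketched as a small recursive check: each numbered component should sit directly inside the component one level above it. The following Python sketch uses only the standard library; the function names, the message wording, and the sample fragments are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches the numbered component tags c01 through c12.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])")


def component_level(tag):
    """Return 1-12 for a numbered component tag, or None for any other tag."""
    match = COMPONENT.fullmatch(tag)
    return int(match.group(1)) if match else None


def nesting_errors(element, parent_level=0):
    """List components that are not nested directly under the previous level."""
    errors = []
    for child in element:
        level = component_level(child.tag)
        if level is None:
            # Not a component (for example <did>); keep the current level.
            errors.extend(nesting_errors(child, parent_level))
        else:
            if level != parent_level + 1:
                errors.append(f"<{child.tag}> found inside level {parent_level}")
            errors.extend(nesting_errors(child, level))
    return errors


# Hypothetical fragments: the first nests correctly, the second skips <c02>.
nested = ET.fromstring(
    '<dsc type="combined"><c01><did/><c02><c03/><c03/></c02></c01></dsc>'
)
skipped = ET.fromstring('<dsc type="combined"><c01><c03/></c01></dsc>')

print(nesting_errors(nested))
print(nesting_errors(skipped))
```

Note that a tag closed too early is a well-formedness error that the XML parser itself reports; this sketch covers the separate case where the file parses but a level has been skipped.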

The <container> Element

The Container element is used to define the storage medium for the archival items. The following choices are offered with EAD:

carton
box
folder
reel
frame
oversize
reel-frame
volume
map-case
box-folder
page
folio
othertype

These units are specified through the type attribute of the <container> tag and further indicated by the label attribute, as in the example below. Note: It is recommended that the <container type="box"> element be repeated beside each <container type="folder"> element. It is also recommended that the <container> element be used only in "item" level component elements.

Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>
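The two <container> recommendations (repeat the box beside each folder, and use <container> only in "item" level components) lend themselves to an automated review of a finished container list. The following is a minimal sketch in Python using only the standard library; the function name, warning wording, and sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def container_warnings(dsc):
    """Report components that depart from the two <container> recommendations."""
    warnings = []
    for comp in dsc.iter():
        did = comp.find("did")
        if did is None:
            continue
        types = [c.get("type") for c in did.findall("container")]
        if not types:
            continue  # no <container> here, nothing to check
        if comp.get("level") != "item":
            warnings.append(f"<{comp.tag}>: <container> used outside an item-level component")
        if "folder" in types and "box" not in types:
            warnings.append(f"<{comp.tag}>: folder listed without its box")
    return warnings


# Hypothetical fragment: the second item omits the box beside its folder.
sample = ET.fromstring("""
<dsc type="combined">
  <c01 level="series">
    <did><unittitle>1st Series Title</unittitle></did>
    <c02 level="item">
      <did>
        <container label="box" type="box">1</container>
        <container label="folder" type="folder">1</container>
        <unittitle>container contents</unittitle>
      </did>
    </c02>
    <c02 level="item">
      <did>
        <container label="folder" type="folder">2</container>
        <unittitle>container contents</unittitle>
      </did>
    </c02>
  </c01>
</dsc>
""")

for warning in container_warnings(sample):
    print(warning)
```

Because these are recommendations rather than DTD rules, the sketch returns warnings for a person to review instead of rejecting the file.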

The <unittitle> (Title of the Unit) Element

The <unittitle> element gives a title to the material being described at all levels of the finding aid. In the container list, these titles can be specified for specific items as well as for series and subseries outlined. When specifying a series title or subseries title, the type attribute is used to specify "series" or "subseries".

Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>

The <unitdate> (Date of the Unit) Element

The <unitdate> element specifies a year, month, or day of the described materials at all levels of the finding aid. The date may be in the form of text or numbers, and may consist of a single date or range of dates. In the container list, it is recommended that the <unitdate> element be used within the <unittitle> element with the type attribute specifying "inclusive".

Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>

Using the <dao> (Digital Archival Object) Element

Once the material in an EAD finding aid has been described down through the Container List, the material can be digitized and embedded into the Container List in the form of thumbnails, icons, or hot-links that link to the digitized items. This is achieved with the <dao> (Digital Archival Object) element. In the working example below, for Series 1, Box 1, Folder 2, the <dao> element is used with the href attribute, which simply gives the URL of the digitized object.

Example:
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
<dao show="new" actuate="user" href="http://athena.uky.edu/images/kukav/1998ua003/0024.jpg">
<daodesc>View Image</daodesc>
</dao>
</did>
</c03>
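
For sites that generate container list entries programmatically, the item above could be assembled along the following lines. This is only a sketch using Python's standard xml.etree.ElementTree module; the helper name build_item and its parameters are hypothetical, and the values passed in are the placeholders from the example.

```python
import xml.etree.ElementTree as ET


def build_item(box, folder, title, dates, image_url=None):
    """Build a <c03 level="item"> element (hypothetical helper)."""
    c03 = ET.Element("c03", {"level": "item"})
    did = ET.SubElement(c03, "did")
    # Repeat the box container beside the folder container, as recommended.
    ET.SubElement(did, "container", {"label": "box", "type": "box"}).text = str(box)
    ET.SubElement(did, "container", {"label": "folder", "type": "folder"}).text = str(folder)
    unittitle = ET.SubElement(did, "unittitle")
    unittitle.text = title + ", "
    ET.SubElement(unittitle, "unitdate", {"type": "inclusive"}).text = dates
    if image_url:
        # The href attribute carries the URL of the digitized object.
        dao = ET.SubElement(did, "dao", {"show": "new", "actuate": "user", "href": image_url})
        ET.SubElement(dao, "daodesc").text = "View Image"
    return c03


item = build_item(1, 2, "container contents", "dates",
                  "http://athena.uky.edu/images/kukav/1998ua003/0024.jpg")
print(ET.tostring(item, encoding="unicode"))
```

Leaving image_url out produces an item without a <dao> element, which suits folders that have not yet been digitized.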

Using the <extref> (External Reference) Element (for web links)

The <extref> element is used in the EAD Container List to link to another finding aid or other web address and/or to link to a secondary application that handles the navigation of a manuscript or other multi-page resource in the archival collection. An example of using the <extref> element is shown below.

Example:
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>
<extref href="http://www.website.edu">container contents</extref>
<unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>

Database Approach to EAD Container Lists

An alternative approach to constructing EAD Container Lists is to use a database to record the information using content fields that can then be encoded using EAD elements. The actual encoding is automatically generated through database output in the form of a report or a delimited ASCII text file.

The Central site has successfully converted both FileMakerPro and Access databases to EAD container lists and is happy to assist project sites developing this approach to encoding the container lists.
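
As an illustration of the database approach, the sketch below turns a delimited text export into item-level components. It is not the central site's conversion routine; the column names (box, folder, title, dates), the sample rows, and the function name rows_to_ead are all assumptions made for the example.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical delimited export; column names and rows are made-up sample data.
EXPORT = """box,folder,title,dates
1,1,Correspondence,1901-1905
1,2,Ledgers & receipts,1906
"""

TEMPLATE = (
    '<c03 level="item">\n'
    "<did>\n"
    '<container label="box" type="box">{box}</container>\n'
    '<container label="folder" type="folder">{folder}</container>\n'
    '<unittitle>{title}, <unitdate type="inclusive">{dates}</unitdate></unittitle>\n'
    "</did>\n"
    "</c03>"
)


def rows_to_ead(text):
    """Turn each exported row into an item-level component string."""
    reader = csv.DictReader(io.StringIO(text))
    # Escape &, < and > so the field values stay well-formed XML.
    return [TEMPLATE.format(**{k: escape(v) for k, v in row.items()}) for row in reader]


for component in rows_to_ead(EXPORT):
    print(component)
```

A report generated inside FileMakerPro or Access can follow the same pattern: fixed EAD tags wrapped around each content field.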

3.8 TEI (Text Encoding Initiative)

The Kentuckiana Digital Library Production Center has developed automated markup applications that facilitate the production of digital page image archives with page scanning from originals or microfilm and high-level OCR (Optical Character Recognition) text underlying the page images for full-text searching. The XML markup utilized is the Text Encoding Initiative's TEI.2 or TEI-Lite document type definition, created to deal with a wide variety of textual formats including books, journals, poetry and original manuscripts.

With any full-text encoding project, best practice often depends on the format and structure of the specific material to be encoded. Best practice guidelines for the TEI markup language are available in the TEI Text Encoding in Libraries Draft Guidelines for Best Encoding Practices Version 1.0 (July 30, 1999), prepared by LeeEllen Friedland, Library of Congress; Nancy Kushigian, University of California, Davis; Christina Powell, University of Michigan; David Seaman, University of Virginia; Natalia Smith, University of North Carolina at Chapel Hill; and Perry Willett, Indiana University. This guide establishes recommendations for encoding with TEI-Lite and defines 5 levels of encoding based upon the proposed use of the encoded text and the amount of time and funding available for a given project.

Levels 1-2 can be encoded via automated processes, and levels 1-4 require no expert knowledge of content. Level 5, in contrast, requires scholarly analysis. Texts at levels 1-4 can be converted and encoded without the assistance of content experts and can be enriched with more markup at any time. Recommendations for Levels 1-4 are intended for projects wishing to create encoded electronic text with structural markup, but minimal semantic or content markup.

 

4 Item Level Metadata: An Introduction


Metadata is commonly referred to as "data about data". More specifically, metadata is the structured description of an object or collection of objects through the use of specific data elements. Another common description for metadata is "cataloging data". In the creation of digital libraries, effective metadata is essential for the presentation, discovery and navigation of digital archival objects.

By allowing for a structured description, metadata offers searchable access points (title, author, subject terms) for discovery systems as well as parsable data for applications developed for the migration and/or reformatting of data. This is an important consideration: metadata is used in multiple contexts, both current and expected, and is often reformatted for presentation in a variety of formats, currently ranging from database structures and records shared through OAI (Open Archives Initiative) to tagged XML document headers.

Traditionally, metadata has mainly been created in the form of MARC bibliographic records. Although MARC is a robust and well-defined standard, the overhead and level of expertise needed to catalog with MARC is a considerable stumbling block for many digital library projects whose participants have little, if any, MARC cataloging experience. To provide a more streamlined approach to describing digital resources, the Dublin Core Element Set, which includes 15 elements for describing networked resources, has emerged as the likely candidate for adoption as a preferred standard for describing digital library resources.

4.1 The Dublin Core: An Item Level Metadata Element Set


The Dublin Core metadata element set grew from efforts spearheaded by OCLC (Online Computer Library Center Inc.) in 1995. Its focus is centered on one basic set of metadata elements selected and refined by a group of experts from the national and international library and information science communities. Specifically, the Dublin Core metadata element set comprises the following fifteen elements used to describe digital networked resources.

The 15 Dublin Core elements are categorized into the following three groups:

Content: Title, Subject, Description, Source, Language, Relation (to another resource), Coverage (spatial or temporal characteristics of intellectual content)

Intellectual Property: Creator, Publisher, Contributor, Rights

Instantiation: Date, Type (such as archival finding aid, electronic text, etc.), Format (of data, to identify software and hardware required for use), Identifier (URL).

4.2 Why Use the Dublin Core?

The Dublin Core Element Set is intended to allow metadata implementors to strike a balance between ease of implementation and the production of metadata records that facilitate effective resource discovery. The Dublin Core's simplicity allows implementation by non-catalogers. At the same time, the Dublin Core is extensible, allowing for the incorporation of more sophisticated description standards.

By establishing and promoting a common, easily understood core set of elements to describe digital networked resources, the Dublin Core will facilitate searching across discipline boundaries as it is adopted more widely.

The Dublin Core Element Set can be converted to MARC and other common bibliographic record formats.

Most importantly, with an international developmental scope including active participation and support in over 20 countries in North America, Europe, Australia, and Asia, the Dublin Core has been established as the primary candidate for the establishment of a formal standard for describing digital networked resources.

4.3 Overall Guidelines for Data Entry

Punctuation
Unless the resource includes punctuation or the element definition includes specific guidelines for punctuation, do not add it.

Symbols and Abbreviations
Do not use symbols to abbreviate unless they are taken from the source or the Element Definition specifies. For example, for an uncertain date for either the Date Digital or Date Original element, the '?' symbol is used. Use abbreviations if they are taken from the source, or are accepted as common and easily understood.

Capitalization
Capitalization is taken from the source or follows general grammatical standards. Exclude initial articles. Only acronyms warrant the use of all caps.

Keywords, Topics and Subject Terms
The Dublin Core encourages combining both subject terms taken from a controlled vocabulary such as the Library of Congress Subject Headings, and keywords assigned by the record creator. Keywords also include the KYVL Kentuckiana Topics list designed to allow broad topic access to the digital collection. List most specific terms first and broader terms last.

Questions to Ask When Entering a Record

  1. Are you entering information into the record for an entire collection or for an individual item?
  2. Is the information comprising the record useful for resource discovery?
  3. Is the content of the element known with certainty or readily available from existing databases or information sources? If not, can you provide an educated, informed guess that will not be misleading?
  4. If you are emphasizing the attributes of the original object (not the digital surrogate) in the record, have you included this information in the correct element fields? Have you included meaningful information about the digital surrogate in the appropriate element fields?

4.4 Digital Library Metadata Dictionary

The Official Dublin Core Metadata Element Set Guidelines specify that all of the 15 metadata elements are optional. However, for the sake of effective resource discovery, the KYVL specifies optional, recommended, and required elements. The following record structure lists 9 required Dublin Core metadata elements, 4 recommended Dublin Core metadata elements, and 2 optional Dublin Core metadata elements.

The following specifications serve as guidelines for creating item level metadata for digital archival objects within EAD finding aids. Definitions listed for the Dublin Core elements are borrowed from the Official Dublin Core Guidelines available on the Web at: http://purl.org/dc/documents/wd-guide-current.htm

4.5 Elements

Required Item Level Dublin Core Elements for EAD Items

NOTE: Elements marked with a "*" are those elements that are inherent within the broader sections of an EAD instance. Therefore, these elements do not need to be input at the item level.

Element: Title
Required: Yes
MARC: 245|a
Use more than Once: When more than one title exists, for example with primary and secondary titles or in the case of variant titles, or when resource discovery is enhanced by the addition of an alternate title or titles supplied by the source.
Scheme: Free Text
Definition: A name given to the resource, usually by the Creator or Publisher. Typically, a title will be a name by which the resource is formally known.
Use Guidelines:
1. Input titles and subtitles using the punctuation appearing on the source.
2. For items or collections not given a title or name, select a descriptive term or short phrase. Such a term or short phrase may be derived from other descriptive fields within the record.
3. Use "untitled" as an entry only when the resource was deliberately given this as its formal title.

Element: Date
Required: Yes
MARC: 260|c
Use more than Once: Yes
Scheme: ISO8601
Modifier: Digital
Definition: The date when the digital version of the resource was created. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. In this case, the date is the creation date of the digital resource.
Use Guidelines:
1. Date format: YYYY-MM-DD as defined in ISO 8601, http://www.w3.org/TR/NOTE-datetime.
2. Use a dash '-' to separate dates.
3. Use a question mark '?' before the date if the date is not definite. Use 'ca' before the date to indicate an approximation.
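
The date guidelines above could be applied in a data entry script roughly as follows. This is a sketch rather than a KDL tool: the function name format_date_digital and its flags are hypothetical, and the single space placed after 'ca' is an assumption, since the guidelines do not specify a separator.

```python
from datetime import date


def format_date_digital(value, uncertain=False, approximate=False):
    """Format a date value per the use guidelines above (hypothetical helper)."""
    text = value.isoformat()  # date.isoformat() yields YYYY-MM-DD
    if approximate:
        # Assumption: one space between 'ca' and the date.
        return "ca " + text
    if uncertain:
        return "?" + text
    return text


print(format_date_digital(date(2003, 4, 7)))
print(format_date_digital(date(2003, 4, 7), uncertain=True))
print(format_date_digital(date(2003, 4, 7), approximate=True))
```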

Element: Identifier
Required: Yes
MARC: 856, 020, 022...
Use more than Once: Yes
Scheme: URL, URN
Definition: A string or number used to uniquely identify the resource in a given context. Examples for networked resources include URLs and URNs (when implemented). Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system, in this case a Uniform Resource Locator (URL) or Persistent Uniform Resource Locator (PURL). This field is hotlinked in the Kentuckiana Digital Library Database record.
Use Guidelines: Enter the URL or PURL for the digital object in this field.

Element: Source *
Required: Yes
MARC: 534|n
Use more than Once: Yes, but not recommended.
Scheme: Free Text, Accession No., Control No., Call No., ISBN, ISSN, FPI
Definition: A Reference to a resource from which the present resource is derived. While it is generally recommended that elements contain information about the present resource only, this element contains metadata for the second resource when it is considered important for discovery of the present resource. Source is not applicable if the present resource is in its original form. The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.
Use Guidelines:
1. Include local call number, local control number, or accession number, etc.
2. The Description field is used for other information describing the Source resource.

Element: Publisher *
Required: Yes
MARC: 260|b
Use more than Once: Yes
Scheme: Free Text
Modifier: Personal or Corporate Name
Definition: The entity responsible for making the resource available in its present form, such as a publishing house, a university, or a corporate entity.
Use Guidelines:
1. List multiple publishers in separate fields
2. Use Publisher element to indicate the appropriate KYVL institution.

Element: Language *
Required: Yes
MARC: 546|a
Use more than Once: Yes
Scheme: ISO639
Definition: A language of the intellectual content of the resource.
Use Guidelines:
1. The content of the Language field should follow RFC 1766 [RFC1766], which uses the two-letter language codes defined in the ISO 639 standard [ISO639].

Element: Rights *
Required: Yes
MARC: 506|a
Use more than Once: Yes
Scheme: Free Text
Definition: An identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource. Information about rights held in and over the resource. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. This field is hotlinked in the Kentuckiana Digital Library Database.
Use Guidelines:
1. Establish a generic textual statement describing the rights management statement for your digital resources on the Internet. If restrictions exist, supply an alternate URL indicating how to contact the appropriate library faculty for specifics on using the resource.

Element: Resource Type
Required: Yes
MARC: 655|a
Use more than Once: Yes
Scheme: Text, Image, Sound, Dataset
Definition: The nature or genre of the content of the resource. Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. To describe the physical or digital manifestation of the resource, use the Format element.
Use Guidelines:
1. Select appropriate type. Options: Audio File, Electronic Text, Photograph, Video File

Element: Format
Required: Yes
MARC: 856
Use more than Once: Yes
Scheme: Free Text
Definition: The physical or digital manifestation of the resource. Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary.

Recommended Item Level Dublin Core Elements

Element: Creator
Required: Recommended
MARC: 1xx, 7xx
Use more than Once: When more than one Creator exists and the inclusion of the additional Creator(s) enhances resource discovery.
Scheme: Free Text
Modifier: Personal or Corporate Name
Definition: The person or organization primarily responsible for creating the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources. Examples of a Creator include a person, an organization, or a service.
Use Guidelines:
1. Creators should be listed separately in the same order that they appear on the source.
2. Personal names should be listed surname or family name first, followed by forename or given name, middle name or initial, suffix, and prefix. When in doubt, give the name just as it appears on the source. Add known birth and death dates.
3. Use full corporate names. The entry element is the full name of the business or organization excluding initial articles.
Examples:

PERSONAL NAMES
King, Margaret Isadora, 1979-1949
Fitzgerald, F. Scott (Francis Scott), 1896-1940
Hemingway, Ernest, 1899-1961
Berry, Wendell, 1934-
Burton, Alonzo Carroll

CORPORATE NAMES
Digital Imaging, Inc.
Kentucky Art Museum
Warner Brothers Company
National Resource Center for Family Services
Colonization Society of Kentucky
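
The personal name pattern in the examples above (surname, given names, then dates) can be expressed as a small helper. This is a sketch only; the function name and parameters are hypothetical, and it does not attempt suffixes, prefixes, or fuller forms given in parentheses.

```python
def format_personal_name(surname, forename, middle="", birth=None, death=None):
    """Return a name entry in 'Surname, Forename Middle, birth-death' form."""
    given = " ".join(part for part in (forename, middle) if part)
    entry = f"{surname}, {given}"
    if birth is not None:
        # Leave the range open when no death date is recorded, as in "1934-".
        entry += f", {birth}-{death if death is not None else ''}"
    return entry


print(format_personal_name("Hemingway", "Ernest", birth=1899, death=1961))
print(format_personal_name("Berry", "Wendell", birth=1934))
print(format_personal_name("Burton", "Alonzo", "Carroll"))
```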

Element: Subject
Required: Recommended
MARC: 6xx
Use more than Once: Yes
Scheme: Keyword, LCSH (Library of Congress Subject Headings), TGM I (Thesaurus for Graphic Materials I), TGM II (Thesaurus for Graphic Materials II)
Definition: The topic of the content of the resource. Typically, a Subject will be expressed as keywords, key phrases or formal classification subject terms that describe a topic of the resource. Recommended best practice is to select a value from LCSH and/or TGM in addition to selected keywords.
Use Guidelines:
1. Keywords and Subjects may come from other Dublin Core fields defined for the resource.
2. Enter person or organization for Subject as outlined under Creator element.
3. Try to be as specific as possible with the underlying focus on aiding resource discovery. If using a keyword, use the most significant or unique words first, with more general words for broad description used as necessary. Use terms found on or about the item.
4. Use subject strings if able. This is strongly encouraged. They may be borrowed from an alternate record that already exists (for instance, an existing MARC record), or they can be created for the Dublin Core record.


Element: Coverage
Required: Optional
MARC: 654|a
Use more than Once: Yes
Qualifier: Spatial or Temporal
Definition: The spatial or temporal characteristics of the intellectual content of the resource. Spatial coverage refers to a physical region (e.g., celestial sector) using place names or coordinates (e.g., longitude and latitude). Temporal coverage refers to what the resource is about rather than when it was created or made available (the latter belonging in the Date element). Temporal coverage is typically specified using named time periods (e.g., Neolithic) or the same date/time format as recommended for the Date element.
Use Guidelines:
1. Select terms from a subject heading list or thesaurus to identify place names (e.g., Getty Thesaurus of Geographic Names, Library of Congress Subject Headings, etc.)
2. Use free text to input B.C.E. dates.
3. Enter a range of dates on the same line and use a dash (-) to separate the dates.
4. Some time periods are not adequately described using a date format, such as the Jurassic Period or the Dark Ages. In this case, give the text form of the time period (e.g., Jurassic Period). Select terms from a subject heading list or thesaurus to identify these time periods (e.g., Library of Congress Subject Headings).
5. If the date is uncertain, use a question mark (?) following the date to indicate it is an approximate date. If the date is estimated, use "ca" prior to the date to indicate estimation.
6. It is important to make the distinction between temporal Coverage, source Date, and Date. For example, the temporal coverage of a photograph of an art object is the date of the art object, and the date of the photograph is the Date Original. The date the photograph was digitized is the information entered into the Date Digital element.


Element: Description
Required: Recommended
MARC: 5xx
Use more than Once: Yes
Scheme: Free Text
Modifier: Abstract, Free Text
Definition: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. Can also include information describing related resources and information describing the source.
Use Guidelines:
1. Enter natural language descriptive text, remarks, and comments about the object taken from the item, or provided by the record creator.
2. Be as brief as possible, but not at the expense of a rich description for resource discovery. A few sentences or paragraphs is a good average.
3. Include additional information such as measurements of a depicted object, description, provenance, etc. as long as this information is not included in other elements.

Element: Contributor
Required: Optional
MARC: 1xx, 7xx
Use more than Once: Yes
Qualifier: Personal or Corporate Name
Definition: A person or organization not specified in a Creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a Creator element (for example, editor, transcriber, encoder, findaid preparer and illustrator).
Use Guidelines:
1. Enter personal and corporate names in same format as Creator element.
2. Do not specify role (e.g., editor, translator, etc.) of the Contributor. Use the Description field to tie this information together within the record.

Element: Relation
Required: Recommended when a relationship exists.
MARC: 787|n, 787|o
Use more than Once: Yes
Modifier: Yes.
Scheme: URL, URN
Definition: An identifier of a second resource that holds a specific relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), a translation of a work (IsBasedOn), and an item from an archival collection with finding aid (IsPartOf). Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. This field is hotlinked in the Kentuckiana Digital Library Database.
Use Guidelines:
1. For relation to an existing finding aid or other resource, place the identifier for the finding aid in the relation identifier field. Include a description of the relation(s) in this field.
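
The required, recommended, and optional groupings described in this section lend themselves to a simple completeness check. The sketch below is illustrative and not part of the KDL workflow: the function name check_record and the sample record are hypothetical, and Creator is treated as recommended, following the section heading it appears under and the count of 9 required, 4 recommended, and 2 optional elements.

```python
REQUIRED = ["Title", "Date", "Identifier", "Source", "Publisher",
            "Language", "Rights", "Resource Type", "Format"]
RECOMMENDED = ["Creator", "Subject", "Description", "Relation"]
OPTIONAL = ["Coverage", "Contributor"]
# Marked with "*" above: inherited from the broader EAD instance,
# so not expected in an item level record.
INHERITED = {"Source", "Publisher", "Language", "Rights"}


def check_record(record):
    """Return (missing_required, missing_recommended) for one item record."""
    def absent(name):
        return not str(record.get(name, "")).strip()

    missing_required = [n for n in REQUIRED if n not in INHERITED and absent(n)]
    missing_recommended = [n for n in RECOMMENDED if absent(n)]
    return missing_required, missing_recommended


# Made-up sample record for illustration only.
sample = {"Title": "container contents", "Date": "2003-04-07",
          "Identifier": "http://www.website.edu", "Resource Type": "Photograph"}
print(check_record(sample))
```

A missing recommended element is reported rather than rejected, since Relation, for example, applies only when a relationship exists.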


4.6 Creating Item Level Metadata

The following directions are intended for use by developers locally digitizing material for inclusion in the KYVL Kentuckiana Digital Library. These directions are used in the central digitization center's workflow and are intended to facilitate fast and accurate creation of records by allowing project developers to gather metadata in a straightforward and efficient manner. Once the metadata has been created, the local spreadsheet file should be delivered to the KYVL Kentuckiana Digital Library manager for processing.

Simple Spreadsheet Approach

Required Software: Microsoft Excel

The simple spreadsheet approach to gathering metadata for Kentuckiana Digital Library projects is meant to capture all the essential data that is unique to specific individual items. Data that is consistent for an entire range of records is added later, automatically, as the data is re-formed into XML EAD (Encoded Archival Description) finding aid Container List items. Examples of consistent data elements that can be added later, and/or that find their place in earlier sections of an EAD Finding Aid, are the Rights, Language and Publisher elements.

Basic Spreadsheet Layout

  • Rows represent item records
  • A separate spreadsheet should be used for each project. A project should be designated by a unique collection/accession number, so that each separate spreadsheet holds data related to only one collection/accession number.
  • The identifier field should specify the unique id for the digital object. This will also be used as the file name for the digital object itself. When re-forming the data, this information is used to build the URL for the digital resource. This also allows digital conversion to proceed separately from the metadata creation, with the two products coming together seamlessly in the end result.
  • The identifier in the database records should be no more than 8 characters.
  • Example: digital photographs from a collection would be listed as accession + sequential number: 64m1.0001, 64m1.0002, ... When the photographs are scanned, the resulting files should be stored simply as sequential numbers inside a directory that specifies the accession number, so on the server, in the above example, the files would be in /images/kukwf/64m1/0001.jpg, 0002.jpg, ...
  • Additional Information: Along with each spreadsheet file, the central site needs to know the following information concerning the data: Publisher, Collection Title, Collection Number.
  • How to Handle Multiple Subjects: Always create the subject as the last cell in a row. Place multiple subjects in the same row, in the cells extending to the right. ...
  • How to Handle other Multiple Fields: Most of the fields in Dublin Core can be repeated. For example, you may have more than one creator for a resource. However, aside from the Subject field, most of the other Dublin Core elements will only need to be repeated occasionally. To compensate for this, if needed, simply create a new column, or however many are required, to the right of the field that needs to be repeated. Label the new columns with Dublin Core element name + number: Creator2, Creator3, etc.
  • Series and Subseries Columns: When a new series and subseries begins, a new row should be created. As shown in the example graphic above, the new row holding the series and subseries references should not contain any other information.
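
Two of the conventions in the list above, building a file location from the identifier and reading repeated columns such as Creator2 and Creator3, can be sketched as follows. This is not the central site's processing script; the function names and default arguments are hypothetical, with the defaults chosen to reproduce the /images/kukwf/64m1/0001.jpg example.

```python
def identifier_to_path(identifier, repository="kukwf", root="/images", extension=".jpg"):
    """Map 'accession.sequence' to root/repository/accession/sequence.ext."""
    accession, sequence = identifier.rsplit(".", 1)
    return f"{root}/{repository}/{accession}/{sequence}{extension}"


def collect_repeated(row, element):
    """Gather values from columns named element, element2, element3, ..."""
    values = []
    n = 1
    while True:
        key = element if n == 1 else f"{element}{n}"
        if key not in row:
            break
        if row[key].strip():
            values.append(row[key].strip())
        n += 1
    return values


# One spreadsheet row as a dictionary keyed by column label (sample values).
row = {"Identifier": "64m1.0001",
       "Creator": "Berry, Wendell, 1934-",
       "Creator2": "Hemingway, Ernest, 1899-1961"}
print(identifier_to_path(row["Identifier"]))
print(collect_repeated(row, "Creator"))
```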

The central production center has also converted various database formats to EAD. For an example of this type of approach, please take a look at the Conversion of Microsoft Access Databases into EAD-Encoded Finding Aids document from the UC Berkeley site.

5 Digital Library Imaging: An Introduction


Digital imaging is a field of study that includes digital photography, scanning, composition and manipulation of digital images. What is a digital image? It is binary code defining the digital representation of an actual image or item such as a photograph or book page. The binary code (computer code represented by a series of bits, the smallest unit of computer data, each indicated by a 1 or a 0) defines tiny segments of the digital image called pixels. These pixels are assigned color characteristics within a given color space, and aligned on a grid of columns and rows that can be viewed on a computer monitor or printed onto paper using a computer printer. Image capture settings specify the number of pixels created for a given area of a scanned item as well as the number of colors or shades of gray used to define the color characteristics of each pixel. Image capture is then achieved through the use of a scanning device such as a flatbed scanner or digital camera. The scanning device reflects bright light off of or through the item, and tiny light-sensitive sensors called diodes detect the degree of presence or absence of light created. The scanning device then converts these light intensity readings to binary code for each individual pixel comprising a digital image. Once the scanning device has completed this process and constructed a digital image, it can be stored as a computer file and manipulated through the use of digital image editing software.
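
The relationship between pixel count, bit depth, and file size can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 8 x 10 inch original and 300 pixels per inch are assumed figures, not scanning recommendations.

```python
def uncompressed_bytes(width_in, height_in, pixels_per_inch, bits_per_pixel):
    """Size in bytes of an uncompressed image with the given capture settings."""
    columns = round(width_in * pixels_per_inch)
    rows = round(height_in * pixels_per_inch)
    # Each pixel stores bits_per_pixel bits; 8 bits make one byte.
    return columns * rows * bits_per_pixel // 8


# Assumed example: an 8 x 10 inch original captured at 300 pixels per inch.
color = uncompressed_bytes(8, 10, 300, 24)  # 24-bit color
gray = uncompressed_bytes(8, 10, 300, 8)    # 8-bit gray scale
print(color, gray)  # 21600000 7200000
```

Under these assumptions the 24-bit color capture is three times the size of the 8-bit gray scale capture, since only the number of bits stored per pixel differs.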

In relation to digital libraries, digital imaging plays a central role in the creation of digital archival objects. Unfortunately, at this time there is no single set of guidelines or accepted standards for determining the level of image quality required in the creation of digital-image databases. Also, as with other aspects of digital library production, dealing with technology that is in a constant state of flux presents unique challenges and pitfalls. It is therefore important to define desired quality guidelines based on current and expected use. These guidelines should be established and followed to produce consistent and expected results.

What is the scanned item's expected use? Should the image be available in a printable format? Does the expected use warrant a high resolution on-screen version? Is the item being scanned for long term use? The answers to most of these questions can differ from one item to another. Current technology standards may dictate answers, especially in the case of printable formats. Monetary budgets may also dictate answers. However, one question that can be answered universally at this point is whether or not the digital items for the Kentuckiana Digital Library are being scanned for long term use. A digital library is in the business of establishing sustainable access to collected resources and since a large part of the cost in building a digital library is associated with the initial digital imaging work, it's best to approach this activity with long term usability in mind.

5.1 Archival Imaging

It is often a confusing point to establish what is really meant when the term "archival" is used in the context of digital imaging for libraries. Although best efforts are made to represent the original in a digital format with visual characteristics as close to the original as possible, the current state of digital imaging technology is not capable of faithfully duplicating original material. The term "archival" when used in relation to digital imaging does not refer to the creation of an exact digital replica of the original. The appropriate use of this term in this context is that it is the practice of digital imaging for current use and long-term viability of the digital images.

5.2 "Master" and "Deliverable" Images

One of the most important concepts to consider when utilizing these guidelines is the concept of "master" and "deliverable" image files. The "master" image is captured for offline storage, scanned at high resolution, and saved in TIFF format, either uncompressed or with lossless compression. This will be a large file and is preferably stored on CD-ROM or a large hard disk with backup.

In contrast, "deliverable" images, sometimes called "derivatives", are lower quality and derived from the "master" image, typically through the use of a batch image processing software application. "Deliverable" images are saved with a lossy compression scheme to achieve acceptable file sizes for current network access within the digital library infrastructure. For effective levels of access, several "deliverable" versions of a "master" image may be required.

5.3 Implementation Overview


When establishing guidelines for imaging, the Kentuckiana Digital Library worked with an imaging consultant and sought to find a reasonable consensus among the multitude of imaging practices adopted by other digital library projects to provide a succinct and clear best practice for our digital imaging efforts. In doing so, the following overall implementation guidelines were identified.

  • Adopt practice of creating high resolution "master" image files used to produce lower resolution "deliverable" image files for standard network delivery to the public.
  • Calibrate scanning equipment before beginning any scanning project using standard photographic targets. Make only minimal changes to a scanned object to be saved as a "master" image file. It is best to avoid having to make any adjustments. This allows for a more streamlined workflow in terms of guidelines for student scanning technicians and more importantly, lends consistency to the collection of images. This consistency facilitates recording administrative metadata for images as well as scripting of batch processing software to produce "deliverable" images.
  • Capture "master" files using 24-bit color rather than 8-bit gray scale when there is any color information in the original documents. When in doubt, use 24-bit color.
  • Most if not all of the "deliverable" images will end up on the Web. Therefore, except for hi-resolution versions of the images, these files should have a target file size limit of 200K, compressed to avoid especially slow performance.
  • Establish specific minimum standards for imaging and follow them. These not only include specific resolutions for various types of material, but also include specific screen size requirements for effective access and use of "deliverable" digital images.
  • Always run test scans for quality before moving into full production for a scanning project.
  • Save an ASCII text file for each scanned batch describing the image capture procedures utilized. This file should list bit depth, color space, dpi and file type. Store this file with the "master" images.
  • When expected use warrants it, create multiple "deliverable" images for access.

5.4 Important Concepts

Understanding the following concepts is essential to establishing proper scanning practice for a digital library project.

Resolution (dots per inch)

DPI = Dots per inch = units used to measure the resolution.

  • Spatial Resolution
    By definition, spatial resolution describes what a printer can print, a scanner can scan, and a monitor can display. In printers and scanners, resolution is measured in dots per inch (dpi): the number of pixels a device can fit in an inch of space. Optical resolution is the physical resolution at which a device can capture an image; the term is used most frequently in reference to optical scanners and digital cameras.
  • Interpolated Resolution
    This term indicates the resolution that the device can yield through interpolation -- the process of generating intermediate values based on known values. For example, many scanners offer an optical resolution of 300 dpi, but an interpolated resolution of up to 4,800 dpi. This means that the scanner can actually capture 90,000 pixels per square inch (300 x 300). Then, based on the values of these pixels, it can add 15 additional pixels between each pair of known values to yield the higher interpolated resolution.

The finer the detail, the higher the resolution required to capture it faithfully, because the higher the dpi, the more information is recorded in the file. Higher resolution also makes it possible to enlarge a detail in the image.
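The arithmetic behind the interpolation example above (300 dpi optical, 4,800 dpi interpolated) can be checked with a short Python sketch. The function names are illustrative only and are not part of any scanner software; the sketch also assumes that the interpolated resolution is a whole multiple of the optical resolution.

```python
def pixels_per_square_inch(dpi: int) -> int:
    """Number of pixels captured in one square inch at the given dpi."""
    return dpi * dpi


def inserted_pixels_between_neighbors(optical_dpi: int, interpolated_dpi: int) -> int:
    """Pixels that interpolation adds between each pair of captured pixels.

    Assumption: the interpolated resolution is a whole multiple of the
    optical resolution; other ratios are rejected by this sketch.
    """
    if interpolated_dpi % optical_dpi != 0:
        raise ValueError("this sketch only handles whole-number scale factors")
    return interpolated_dpi // optical_dpi - 1


print(pixels_per_square_inch(300))                   # 90000
print(inserted_pixels_between_neighbors(300, 4800))  # 15
```

Only the 90,000 captured pixels per square inch carry information from the original; the inserted pixels are computed from them.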

Color/Bit Depth

Color depth, also referred to as bit depth, measures the number of bits of color data which are stored for each pixel; the greater the bit depth, the greater the number of gray scale or color tones that can be represented and the larger the file size. Common bit depths are:

1 bit bitonal (black and white) is a usable color depth for select textual information where the original is clean and free of defects that will affect the quality of the scan. 1 bit bitonal is also used in scanning textual information from microfilm. It is recommended that project managers wishing to use 1 bit color depth run sample scans to determine acceptable quality. Also, consider that a "master" image can be captured at 8-bit gray scale or 24-bit color and then converted to a 1-bit bitonal "deliverable" copy.

8 bit color depth is not suitable for digital masters and is not recommended for use.

8 bit gray scale is used for select items, generally black and white photographs that have no color characteristics, and microfilm. If there are color characteristics, even apparent from the age and natural deterioration of the photograph or other material, 24-bit color is recommended in order to capture the image as true to the original as possible.

24-bit color is recommended whenever there is color information in the original item. Although the "master" scan will be larger in file size vs. other color depths such as 8 bit gray scale, with JPEG compression, 24-bit "deliverable" color images usually are no larger than the JPEG gray scale "deliverable" image.
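To make the relationship between bit depth, tonal range, and file size concrete, the following Python sketch computes the number of tones each bit depth can represent and the approximate uncompressed size of a scan. The function names and the 8 x 10 inch, 400 dpi example are assumptions chosen for illustration; the size estimate ignores file headers and any row padding a real TIFF may add.

```python
def tone_count(bit_depth: int) -> int:
    """Number of distinct gray or color values a pixel can hold."""
    return 2 ** bit_depth


def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bit_depth: int) -> int:
    """Approximate size of the raw pixel data for an uncompressed scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bit_depth // 8


for depth in (1, 8, 24):
    size = uncompressed_size_bytes(8, 10, 400, depth)
    print(depth, tone_count(depth), size)
# 1 2 1600000
# 8 256 12800000
# 24 16777216 38400000
```

In this example the 24-bit "master" is three times the size of the 8-bit gray scale "master", which is the storage cost referred to above.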

Color Space

Color Space defines the palette of colors used to create the color of each pixel in a digital image. For screen images, the RGB (red, green, blue) color space is used. With a 24-bit color depth utilizing an RGB color space, 2 to the power of 24, or more than 16 million, unique colors are possible. Each of these colors is a result of combining the colors red, green and blue.

It is best to communicate only one color space to the end user to facilitate optimal rendering of all images across all platforms and devices. Since the majority of access to image files will be via a monitor using an RGB color space, using RGB as a default color space is recommended. Images in RGB will display reasonably well even on uncalibrated monitors.5

In contrast, printers use the CMYK (cyan, magenta, yellow, and black) color space model, which is based on the absorbing quality of ink printed on paper.

File Compression

In order to serve digital image files over a network such as the World Wide Web, they must be compressed so that download times are acceptable for users and file sizes are acceptable for the available storage space. The lower the file size, the better. However, a benchmark of 200K is recommended as a file size limit for non-hi resolution "deliverable" images. The following file formats are listed and described in terms of their use.

GIF (Graphics Interchange Format): This compression format is only recommended for use in creating thumbnails and 1 bit bitonal (black and white) images. For additional image types, JPEG provides superior compression.

JPEG (Joint Photographic Experts Group): Best of the compression file formats. JPEG is recommended for use in creating medium and hi resolution images for Web delivery.

PDF (Portable Document Format): This is a format created by Adobe Systems. It is a compressed standard for digital images, mostly of textual information. PDF files require users to have the Adobe Acrobat Reader software installed as a web browser helper application on their computer. This software is free and easy to install. The benefit of the PDF format is the user's ability to re-size and print individual pages.

5.5 Best Practice for Image Capture and Formatting

In the production of "master" image files, the intent is to produce a high quality image that will serve as the source file for the production of present and future "deliverable" image files. The thought here is that the higher the resolution, the longer the half life of the "master" image in terms of its usefulness. There is a degree of consensus to establish 600dpi as the preferred resolution level for capture of any document size and type. The justification for this is that 600dpi is sufficient to capture extremely small text legibly and can produce a high-quality publication at double life-size.2 The drawback here is that capture time is longer and file sizes become very large and difficult to handle without a well equipped PC. Due to this fact, 600dpi is recommended for KDL only where the required hardware, software and human resources are available or the material to be scanned requires 600dpi for the effective capture of fine detail.

Due to the wide range of material types to be scanned in archival collections and the limited feasibility of always capturing at 600dpi, it is difficult to assign one capture resolution to all document types and sizes. The KDL has relied on the experience of our imaging consultant and lessons learned in other projects to establish a range of recommended resolutions for "master" image files based on the size of the original material.

The recommended resolutions are specified by item size and listed in the Resolution Coverage Table below. This table has been modified from one devised by Howard Besser for the California Digital Library. Although this table will serve as the best approach for the majority of imaging jobs, it is always useful to do a test scan or two and check for quality, especially with pictorial material. If you scan at 300dpi or 400dpi by default, you may find that scanning pictorial material with a particularly fine level of detail is done more effectively at a higher resolution such as 600dpi.

The range of minimum, default, and high resolution settings is offered to allow individual institutions to find the resolution that works best for their local level of technological and human resources. Best efforts should be made to scan at the default resolution setting. Scanning at less than the minimum is not considered "archival", that is, appropriate for long term sustainability.

5.6 Resolution Table for Master Images (8-bit grey scale and 24-bit color)

Long dimension of original (in inches)                     | Minimum Resolution Setting (dpi) | Default Resolution Setting (dpi) | High Resolution Setting (dpi)
11.5 (digital camera or flat bed scanner)                  | 300                              | 400                              | 600
15.5 (digital camera and/or large format flat bed scanner) | 300                              | 400                              | 450
23 (digital camera)                                        | 300                              | 300                              | 300
35 (digital camera)                                        | 200                              | 200                              | 200
46 (digital camera)                                        | 150                              | 150                              | 150

As a rule when scanning items, check that the long pixel dimension is at least 3,000 pixels when scanned at the chosen resolution setting.
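The 3,000 pixel rule is a simple multiplication of the long dimension in inches by the resolution setting. A minimal Python sketch of the check follows; the function names are illustrative, and the list of candidate settings is an assumption drawn from the dpi values that appear in the resolution table.

```python
MIN_LONG_PIXELS = 3000  # threshold stated in the rule above


def long_pixel_dimension(long_inches: float, dpi: int) -> int:
    """Pixels along the long edge of the scan."""
    return round(long_inches * dpi)


def meets_rule(long_inches: float, dpi: int) -> bool:
    """True when the long edge reaches at least 3,000 pixels."""
    return long_pixel_dimension(long_inches, dpi) >= MIN_LONG_PIXELS


def lowest_passing_dpi(long_inches: float,
                       settings=(150, 200, 300, 400, 450, 600)):
    """Lowest candidate setting that satisfies the rule, or None if none does."""
    for dpi in sorted(settings):
        if meets_rule(long_inches, dpi):
            return dpi
    return None


print(long_pixel_dimension(11.5, 300))  # 3450
print(meets_rule(5, 400))               # False
print(lowest_passing_dpi(5))            # 600
```

A 5 inch original yields only 2,000 pixels at 400 dpi, so under this rule it would need the 600 dpi setting, while an 11.5 inch original already passes at the 300 dpi minimum.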

5.7 Image Sizing Table for Deliverable Images

Thumbnail: 150-200 pixels across the long dimension
Access Image: 640x480, 800x600, 1024x768, 1280x1024, 1200 pixels across the long dimension
Hi-Resolution Image: 1000 - 5000 pixels across the long dimension
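Sizes given as "pixels across the long dimension" mean that the longer edge is scaled to the target and the shorter edge follows in proportion. The following Python sketch shows the calculation only; it does not resize any image, the function name is illustrative, and the 3200 x 4000 pixel "master" is an assumed example.

```python
def scale_to_long_dimension(width_px: int, height_px: int,
                            target_long_px: int) -> tuple:
    """Return (width, height) with the long edge equal to target_long_px."""
    long_side = max(width_px, height_px)
    new_width = max(1, round(width_px * target_long_px / long_side))
    new_height = max(1, round(height_px * target_long_px / long_side))
    return (new_width, new_height)


master = (3200, 4000)  # assumed example: 8 x 10 inches scanned at 400 dpi
print(scale_to_long_dimension(*master, 200))   # (160, 200)
print(scale_to_long_dimension(*master, 1200))  # (960, 1200)
```

Batch image processing software normally performs this calculation itself when given a target for the long edge; the sketch is only meant to show what dimensions to expect.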

5.8 File Format Table for Master and Deliverable Images

Master Image: uncompressed TIFF or TIFF with lossless compression. (Lossless compression is a process that reduces the storage space needed for an image file without loss of data. If a digital image that has undergone lossless compression is decompressed, it will be identical to the digital image before it was compressed. Document images (i.e., in black and white, with a great deal of white space) undergoing lossless compression can often be reduced to one-tenth their original size; continuous-tone images under lossless compression can seldom be reduced to one-half or one-third their original size.)
Deliverable Thumbnail: JPEG, GIF
Deliverable Access Image: JPEG, GIF
Deliverable Hi-Resolution Image: JPEG, PDF


5.9 Media Formats

A variety of original material can be found in archival collections. Because of this, the first step in imaging or scanning an item is to decide how to approach the object in relation to its physical format. The following material types are outlined with general guidelines for practice.

Textual Documents
The "master" image may be most appropriately captured in 1-bit bitonal, 8-bit grey scale or 24-bit color depending on the color characteristics of the original. The use of a photographic target is recommended. "Deliverable" images are compressed JPEG. An alternate "deliverable" sometimes utilized to save storage space is a 1-bit bitonal, web optimized GIF.

Pictorial Items
Among archival material, pictorial items such as photographs can present many challenges due to the fine level of detail often present in the original. It is useful to perform tests with photographs and other pictorial material. Try to find the finest detail in the image and see if this comes through effectively with current minimum or default dpi settings. It is also recommended that photographic targets be used in the scanning process for pictorial items.

Maps and Oversized Records
Use 8-bit grey scale/24-bit color depth. 300 dpi is sufficient for "master" image. These oversized items are currently best served as "deliverables" using Lizardtech's MrSID image format and server application(installed on the digilib.kyvl.org machine).

Graphic Records and Materials
These can include line drawings and artistic illustrations. Smaller graphic materials no greater than 8.5x11 inches in size should be scanned in the same manner as Pictorial Items. Larger graphic materials should be scanned in the same manner as Maps and Oversized Records.

5.10 "Master" Image Quality

Image quality is a complex issue. There are many subjective aspects to defining image quality when comparing a digital image to the original. This makes an absolute assurance method improbable. It is expected that as digital imaging technology progresses, enhanced software will provide methodology for precise image quality standards. However, there are standard quality control issues that can currently be addressed to establish a consistent and acceptable level of image quality.

  • Properly calibrate the computer monitor attached to scanning device. The monitor's manual should provide good information on this process. Also, the Denver Public Library's Western History Photography Collection Site provides an excellent online, step by step process to calibrate your monitor. http://photoswest.org/calib.htm
  • Properly calibrate scanning device using photographic targets and device manuals.
  • Select appropriate color depth based on color characteristics of the original.
  • Select appropriate resolution based on size of the original as well as the level of fine detail apparent.
  • Perform test scans to assure that your "master" image files are captured properly.
  • Check the test scans for blur, moiré patterns, and color characteristics compared to the original, particularly when scanning photographs with fine details.
  • Also check the test scans for proper resolution setting. The quality of this original image has a major impact on the quality of the "deliverable" images derived from it.
  • Save "master" images with lossless compression format (TIFF).

Additionally, the following are important aspects to consider when producing "master" images.

  • Moiré Patterns: Striped or checkered patterns appearing in a scanned halftone image. (Halftone is a printing process by which images are rendered by hundreds of tiny dots. Halftone images are commonly found in newspapers.) Your scanner should allow for correction at the time of scanning through the "descreening" function. Retrospective fixes via image editing software do not work well.
  • Deskewing: Correcting a scanned item that is not aligned correctly. It is best to align the item before scanning. In some cases this may not be possible and the image editing software will need to be used to align the captured image.
  • Cropping: The "master" image should capture to the edges of the original material. Cropping is used to cut out any white space around the scanned document.

5.11 "Deliverable" Image Quality

Assuming that the "master" image was captured properly, the quality of the derived "deliverable" images is affected by resolution, size, and compression. The key tradeoff in defining an appropriate level of image quality for "deliverable" images is the balancing of compressed file size and resulting storage requirements with image quality.3

The following are important aspects to consider when producing "deliverable" images.

Artifacts
Visual effects introduced into a digital image as a result of image compression. These are seen as blurry, wavy lines molded around details in the image. The higher the compression level, the greater the artifactual effects.

Compression Level:
Image editing software allows for setting the level of compression. This is usually set on a scale from 1 to 10. For deliverable images, medium compression levels 4, 5, or 6 are recommended.

Sharpening:
Producing "deliverable" images from a "master" image involves re-sizing the image to an appropriate scale for screen presentation. A side effect is a blurry image. Using the sharpening function available in many image editing software applications such as Photoshop can compensate for the effect and produce more usable images. The following is a general set of rules to follow in using the unsharp mask function.

Amount: 100-200%

Radius: 1 to 2 pixels

Threshold: 2 to 8 levels

Levels of Access: For appropriate access to digital images, more than one version of the image may need to be offered. The following are recommended for the Kentuckiana Digital Library.

Thumbnail Copy (contained in a database record, finding aid container list, etc.)

Access Copy (medium resolution, fits onto computer screen)

Optional Hi-Resolution Copy (enhanced access to level of detail)

Optional PDF Copy (allow users to print a copy)

5.12 Administrative Image Metadata

The following information should be saved as an ASCII file and stored with the "master" image files it describes.

How? bit depth, color space, dpi

When? Date of capture in ISO 8601 [W3CDTF] format: YYYY-MM-DD

Who?
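As an illustration, the following Python sketch assembles such an ASCII file from the How, When, and Who details above, plus the file type. The field labels, the layout, the function names, and the sample values are all assumptions made for this example; the guide does not prescribe a particular file layout.

```python
from datetime import date


def capture_metadata_text(bit_depth: int, color_space: str, dpi: int,
                          file_type: str, captured_on: date,
                          operator: str) -> str:
    """Build the text of an administrative metadata file (hypothetical layout)."""
    lines = [
        f"bit depth: {bit_depth}",
        f"color space: {color_space}",
        f"dpi: {dpi}",
        f"file type: {file_type}",
        f"date: {captured_on.isoformat()}",  # ISO 8601, YYYY-MM-DD
        f"operator: {operator}",
    ]
    text = "\n".join(lines) + "\n"
    text.encode("ascii")  # raises UnicodeEncodeError if any value is not ASCII
    return text


def write_capture_metadata(path: str, text: str) -> None:
    """Save the text as an ASCII file next to the "master" images."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(text)


sample = capture_metadata_text(24, "RGB", 400, "TIFF",
                               date(2004, 5, 17), "scanning technician A")
print(sample)
```

One file per scanned batch, stored in the same directory as the "master" images it describes, keeps the capture settings with the files.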

5.13 Assigning File Names


It is important for the central site to have a consistent method for naming files so that such a method can be specified in outsourcing contracts and so that over time file names across institutions will not conflict and can be managed within the context of batch processing for migration to future formats. The question of file naming conventions was addressed by all three of our consultants. Unfortunately, there are no set standards for this. The best advice from our consultants in terms of strategy was "be consistent" and preferably include a unique identifier in the file name that relates to the source. With this in mind, the Kentuckiana Digital Library specifies the following file naming implementation for individual institutions to adopt when creating files for the Kentuckiana Digital Library.

In relation to naming files, the term "handle" is often used to describe a unique identifier for a specific resource. Generally, a handle is comprised of two main parts, the first being a naming authority name and the second being a string unique to that naming authority.8 In the context of KYVL, the naming authority part of the handle should specify a particular KYVL institution such as Morehead State University. Instead of specifying the naming authority in full, an abbreviated reference is used. This reference is based on a global naming authority, for our purpose, the institution's OCLC Institution Code. This part of the handle will be specified by the centralized file management structure as in the following example and referred to as the GLOBAL ID.

Deliverable image files held by the central site for Kentucky State University will live under: http://digilib.kcvl.org/images/kys/

The second half of the handle is then supplied through referencing a unique string within the specified naming authority. This is based on an accession, control or other unique numbering system used by the specified naming authority and is referred to as the ITEM ID.

A third part of the handle involves structural metadata that specifies the order of a group of related digital files. This is given via sequential numbering of the digital archival objects and can be specified within an archival collection from start to finish, or within specific groups of files such as individual page images for a manuscript. For specific resources, alternate structural metadata may also be required.

5.14 File Naming Implementation

  • All directories and filenames will be in lower case.
  • Filenames will be assigned following these conventions:
    • Within a document, increment the number by 1 for each sequential page (no random numbers or names).
    • Recommended Basic Structure Naming File Directories and Digital Archival Object Files
      unique institution code(OCLC Institution Code) + / + accession # OR
      unique institution code(OCLC Institution Code) + / + accession # + / + item # in sequence
    • Digital Archival Objects with EAD
      OCLC Institution Code + / + accession # + / + item # in sequence
    • Example: The second photograph in the Doris Ulmann Photograph Collection (96PA104) where there are 154 frames total: kuk/96pa104/002.ext
    • Example: Third slide of 506 in the "Turtles" series of Barbour(Roger W.) Photograph Collection where there are a total of 8 series with "Turtles" being the 2nd, and a total of 1389 slides in the entire collection: kmm/92bp/0506.ext
    • Note: When producing digital image files, alternate versions of the image files for thumbnails, pdf and hi-resolution should be given the same file name but placed into the following sub-directories.
      thumbs
      hi-res
      pdf
    • EAD Finding Aids: OCLC Institution Code + / + accession#
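The conventions above can be sketched as a small Python helper. This is an illustration under stated assumptions, not a tool used by the Kentuckiana Digital Library: the function name is hypothetical, the zero-padding width is inferred from the examples (three digits for a collection of 154 items, four digits for 1389), and the sub-directory is assumed to sit directly under the accession directory.

```python
VARIANT_DIRECTORIES = ("thumbs", "hi-res", "pdf")  # sub-directories named above


def object_path(institution_code: str, accession: str, sequence: int,
                total_items: int, extension: str, variant=None) -> str:
    """Build a lower-case path: code/accession/[variant/]sequence.ext"""
    if not 1 <= sequence <= total_items:
        raise ValueError("sequence must be between 1 and total_items")
    # Assumption: pad to the number of digits in the total item count.
    width = len(str(total_items))
    parts = [institution_code.lower(), accession.lower()]
    if variant is not None:
        if variant not in VARIANT_DIRECTORIES:
            raise ValueError("unknown variant directory")
        parts.append(variant)
    parts.append(f"{sequence:0{width}d}.{extension.lower()}")
    return "/".join(parts)


print(object_path("KUK", "96PA104", 2, 154, "tif"))
# kuk/96pa104/002.tif
print(object_path("KUK", "96PA104", 2, 154, "jpg", variant="thumbs"))
# kuk/96pa104/thumbs/002.jpg
```

Generating names this way rather than typing them by hand keeps the numbering sequential and the case consistent, which matters when files are later batch processed.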

5.15 OCLC Institution Codes for KYVL Institutions

These codes are used for institution naming authority within our file naming structure.

Ashland Community College: KUA
Boone County Public Library: KYB
Centre College: KCC
Eastern Kentucky University: KEU
Filson Club Historical Society: KTN
Georgetown College: KGG
Kentucky Department for Libraries and Archives: KSL
Kentucky Historical Society: KNU
Kentucky State University: KSU
Lexington Community College: KUT
Lexington Public Library: KYL
Louisville Free Public Library: KLP
Morehead State University: KMM
Northern Kentucky University: KHN
Southeast Community College: KUS
Transylvania University: KTU
Union College: KUC
University of Kentucky: KUK
University of Louisville: KLG
Western Kentucky University: KNV

5.16 Safe Handling of Archival Material


An important part of the digital imaging process that cannot be overlooked is the proper handling of archival material during digital capture. The following minimum guidelines are recommended to preserve the integrity of the source material used during the digital image capture process. More complete guidelines for study are available from the Library of Congress's in-house course handouts entitled 'Criteria for Selecting Items for Conservation Treatment before Digital Scanning' and 'Care and Handling of Library Materials for Digital Scanning'.

Minimum Guidelines

  • No food or drink in work space.
  • Wash your hands before handling materials.
  • Clear work area before working with originals.
  • Keep sharp items and pens/markers away from original material.
  • When original material is not being scanned, it should be covered and stored in a secure place.
  • Wear cotton gloves to prevent transfer of skin oils to original material.
  • Pages of a book should be turned carefully. Conservators recommend lifting the upper corner of the page, then using one's whole hand to support the page as it's turned.
  • For books, do not open at an angle greater than 120 degrees. This entails the use of an overhead scanner with book cradle or a book edge scanner.
  • Flat paper items that are oversize should be placed between two rigid boards to be flipped over. Turning a large item over should never be attempted without additional support. This operation often requires two people.
  • To pick up a single-sheet paper original, slide the corner of a piece of paper under its edge.
  • Keep materials in order to minimize handling.
  • Do not flex item when turning it over.
  • Unfold items carefully. Do not unfold items unless they have been identified as non-brittle.
  • Brittle materials may need a polyester support such as Mylar in order to be handled safely and scanned at all.
  • Repair of damaged originals should be done before digital capture by a conservator or other trained professional.

NOTE: For additional reading, visit the Cornell Online Tutorial "Digital Imaging: Moving Theory into Practice" online at: http://www.library.cornell.edu/preservation/tutorial/contents.html

Last Update: May 2004



To comment or inquire about content, contact kdl-help@athena.uky.edu
To report errors, contact kdl-help@athena.uky.edu

Copyright & Use Statement
Copyright 1999-2009 Kentucky Virtual Library