The pose slide and artwork collections were batch loaded into the Knowledge Bank (a DSpace repository). Our routine process for batch loading involves creating a spreadsheet (.csv) containing the metadata and filename for each item. A stand-alone Java tool (SAFBuilder) transforms the metadata contained in the spreadsheet into dublin_core.xml files and builds the simple archive format directory (metadata + content files) required for the DSpace item importer.
Working with Beth Kattelman, Curator of Theatre, I designed the Qualified Dublin Core (QDC) metadata for the Knowledge Bank image collections. Because TRI had an item-level EAD finding aid for the Joel E. Rubin Collection, I wanted to reuse the descriptive metadata available in the finding aid when building the batch load spreadsheets. One option would have been to have a staff member or student assistant manually copy and paste the metadata into the Knowledge Bank spreadsheet, following a mapping between the finding aid information and the spreadsheet columns (QDC fields). That approach, however, would have been time-consuming and inefficient. Instead, we used a workflow I developed for the automated reuse of the finding aid metadata.
To take advantage of our routine batch loading process, the finding aid metadata workflow transforms the EAD metadata from the finding aid into a .csv file for use with SAFBuilder, the simple archive format packager. Two methods were developed, and both work well: an XSLT workflow and a workflow using xml2csv from A7Soft.
XSLT workflow: An XSLT stylesheet written for the project transforms the EAD XML into a CSV containing the desired data, run as an oXygen transformation scenario. Each item-level component of the EAD document becomes a row in the CSV, and the EAD metadata is mapped to Dublin Core.
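The extraction step can be sketched in Python rather than XSLT (a hypothetical illustration, not the project's actual stylesheet; the sample EAD fragment, element choices, and the Dublin Core column names are assumptions for demonstration):

```python
import csv
import io
import xml.etree.ElementTree as ET

EAD_NS = {"ead": "urn:isbn:1-931666-22-9"}  # EAD 2002 namespace

# A minimal, hypothetical EAD fragment standing in for the finding aid.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <dsc>
      <c01 level="series">
        <c02 level="item">
          <did>
            <unitid>Box 1, Slide 12</unitid>
            <unittitle>Stage lighting pose, dancer with fan</unittitle>
            <unitdate>1935</unitdate>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

def ead_items_to_rows(ead_xml):
    """Extract one row per item-level component: (identifier, title, date)."""
    root = ET.fromstring(ead_xml)
    rows = []
    for comp in root.iter():
        if comp.get("level") == "item":
            did = comp.find("ead:did", EAD_NS)
            rows.append([
                did.findtext("ead:unitid", default="", namespaces=EAD_NS),
                did.findtext("ead:unittitle", default="", namespaces=EAD_NS),
                did.findtext("ead:unitdate", default="", namespaces=EAD_NS),
            ])
    return rows

def rows_to_csv(rows):
    """Write the rows under Dublin Core column headings, mirroring the mapping."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["dc.identifier", "dc.title", "dc.date.created"])
    writer.writerows(rows)
    return buf.getvalue()

print(rows_to_csv(ead_items_to_rows(SAMPLE_EAD)))
```

The key idea is the same in either tool: walk the component hierarchy, keep only item-level descriptions, and flatten each one into a spreadsheet row.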
xml2csv workflow: The A7Soft command line utility converts the fields specified in the field.txt file from XML to CSV. The fields are then mapped to Dublin Core.
>xml2csv OU-TR0012.xml TRI-JOR.csv field.txt -q -c=UTF-8
The preparation of the pose slide batch load also used the embedded image metadata workflow I developed. As requested during the initial metadata conversations for the digital reformatting project, the Preservation and Reformatting Department added the titles found on the physical slides to the embedded image metadata as part of their digitization workflow. This embedded metadata was extracted to a CSV with ExifTool and used as a quality control check in the batch preparation process.
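The quality control comparison can be sketched roughly as follows (a hypothetical illustration: the filenames, titles, and column names are invented, though ExifTool's CSV export does produce a SourceFile column alongside the requested tags):

```python
import csv
import io

# Hypothetical stand-ins for the two CSVs compared during batch preparation:
# titles extracted from embedded image metadata via ExifTool, and titles in
# the batch load spreadsheet built from the finding aid.
EMBEDDED_CSV = """SourceFile,Title
slide_001.tif,Dancer with fan
slide_002.tif,Chorus line pose
"""

BATCH_CSV = """filename,dc.title
slide_001.tif,Dancer with fan
slide_002.tif,Chorus line poses
"""

def load_titles(csv_text, file_col, title_col):
    """Build a filename -> title map from a CSV, given its column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[file_col]: row[title_col] for row in reader}

def find_mismatches(embedded, batch):
    """List files whose embedded title disagrees with the spreadsheet title."""
    return sorted(f for f in embedded if batch.get(f) != embedded[f])

embedded = load_titles(EMBEDDED_CSV, "SourceFile", "Title")
batch = load_titles(BATCH_CSV, "filename", "dc.title")
print(find_mismatches(embedded, batch))  # files flagged for manual review
```

Any flagged files are then reviewed by hand, which is far faster than proofreading every title.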
To connect the slides and the artwork for the slides, which are held in separate Knowledge Bank collections, I also added relation fields to the item records, linking pose slides to artwork and artwork to pose slides. The pose slides were batch loaded first, which allowed links to the slides to be added to the artwork page items as part of their batch load preparation. (Note: TRI does not hold all slides represented in the artwork scrapbook, and not all slides held are represented in the scrapbook.) After the artwork collection was archived, I used the batch metadata editing feature of DSpace to add links to the corresponding artwork to the pose slide records. For an example of the linking, see: http://hdl.handle.net/1811/59163 and http://hdl.handle.net/1811/56707.
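The batch metadata editing step can be sketched like this (a hypothetical illustration: the handle pairs, collection id, and the plain dc.relation column name are assumptions, not the project's actual export):

```python
import csv
import io

# Hypothetical mapping from pose slide handles to artwork handles; the real
# pairs were established by matching items across the two collections.
SLIDE_TO_ARTWORK = {
    "1811/59163": "1811/56707",
}

# A minimal stand-in for a CSV exported by DSpace's batch metadata editor.
EXPORT_CSV = """id,collection,dc.title
1811/59163,1811/100,Pose slide 12
1811/59999,1811/100,Pose slide 13
"""

def add_relations(export_csv, links, base="http://hdl.handle.net/"):
    """Append a dc.relation column pointing each slide at its artwork, if any."""
    reader = csv.DictReader(io.StringIO(export_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["dc.relation"])
    writer.writeheader()
    for row in reader:
        target = links.get(row["id"])
        row["dc.relation"] = base + target if target else ""
        writer.writerow(row)
    return out.getvalue()

print(add_relations(EXPORT_CSV, SLIDE_TO_ARTWORK))
```

Re-importing the edited CSV updates only the changed fields, so the links can be added after the fact without touching the rest of the records.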
These batch loading workflows eliminate re-keying and take advantage of the metadata created by our partners, allowing us to add new content to our institutional repository more efficiently.