Federation Models

Saved by Lisa Spiro
on November 21, 2008 at 6:13:50 am
 

8. Possible Approaches to Federating Archival Description from Multiple Repositories

 

Currently, researchers face many challenges in identifying and gaining access to archival holdings distributed among archives and special collections across the United States.  Many archives have not described all of their collections or made that information available online.  Even when archival description is online, researchers have to look in several places to find relevant resources, searching MARC records in WorldCat, MARC and EAD records in ArchiveGrid, National Union Catalog of Manuscript Collections (NUCMC) records in Archives USA, EAD finding aids aggregated in regional repositories such as Online Archive of California and Texas Archival Resources Online (TARO), and/or finding aids provided through the web sites of particular archives.  In order to facilitate discovery of archival resources, the CLIR Hidden Collections program aims to provide a federated catalog drawing from multiple repositories.  As the program description states, “The records and descriptions obtained through this effort will be accessible through the Internet and the Web, enabling the federation of disparate, local cataloging entries with tools to aggregate this information by topic and theme.”[1] 

 

Archivists whom I interviewed recognize the value of aggregating information from multiple repositories.  As one argues, “We just have to federate—there really isn’t a reason to stop at the stage of putting things on the web.  The point of EAD was not to put finding aids online, but to share, to get everyone together, to do things across a collection.  If we don’t make the step forward to sharing, we might as well be using HTML.”

 

However, federating archival descriptions poses some significant challenges.  For one thing, an appropriate technical infrastructure needs to be developed, perhaps leveraging OAI-PMH or RDF.  A federated catalog needs to be flexible enough to accommodate the diverse data generated by archives yet rigorous enough to present data in a standard format.  Options for federating archival data include:

 

1.    Make MARC & EAD available through a national/international service such as ArchiveGrid, Archives USA or Archives Hub. 

OCLC’s ArchiveGrid[2] includes archival information from thousands of archives in the US, the UK, Germany, Australia and other countries.  ArchiveGrid draws from two main data streams: archival records in WorldCat (about 90% of the total records) and finding aids harvested from contributing institutions.[3]  These finding aids can be written in EAD, HTML, or plain text.  To set up the harvesting, OCLC asks contributors to point to a web site of finding aids that can be crawled.  The crawler brings over the text of the finding aid, parses it so that it maps to ArchiveGrid’s record structure, and adds it to the index.  For harvested finding aids, ArchiveGrid links from its search results to the full finding aid on the contributor's web site, similar to a Google result.  Currently, thematic collections aren’t represented; ArchiveGrid does not yet have consistent topical categories to apply across its varied contributions, but that could change.  Archives pay nothing to contribute records to ArchiveGrid, but access to the full records in ArchiveGrid is available only through a subscription.  However, through OpenWorldCat, researchers can get access to a large subset of the archives’ MARC records that are also available through ArchiveGrid.  It’s possible that an archival version of the freely available OpenWorldCat (an “Open ArchiveGrid”?) could be developed so that a subscription would not be required.  One archivist reported satisfaction with ArchiveGrid: “ArchiveGrid is harvesting our EAD files… It seems to be gathering those OK.”
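The parse-and-map step described above can be illustrated with a minimal sketch. It is not ArchiveGrid's actual code or record structure: the flat field names (`id`, `title`, `dates`, `repository`, `summary`, `link`), the sample finding aid, and the example URL are all assumptions made for illustration. It reads a small EAD finding aid, pulls out collection-level fields, and keeps a link back to the contributor's site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated EAD finding aid (invented for illustration).
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1890-1945</unitdate>
      <repository><corpname>Example University Archives</corpname></repository>
    </did>
    <scopecontent><p>Letters and diaries of the Example family.</p></scopecontent>
  </archdesc>
</ead>"""


def text_of(root, path):
    """Return the whitespace-normalized text under `path`, or '' if absent."""
    node = root.find(path)
    if node is None:
        return ""
    return " ".join("".join(node.itertext()).split())


def ead_to_index_record(ead_xml, source_url):
    """Map collection-level EAD elements to a flat record (assumed structure)."""
    root = ET.fromstring(ead_xml)
    return {
        "id": text_of(root, "eadheader/eadid"),
        "title": text_of(root, "archdesc/did/unittitle"),
        "dates": text_of(root, "archdesc/did/unitdate"),
        "repository": text_of(root, "archdesc/did/repository"),
        "summary": text_of(root, "archdesc/scopecontent"),
        # Search results link back to the full finding aid on the contributor's site.
        "link": source_url,
    }


record = ead_to_index_record(
    SAMPLE_EAD, "https://archives.example.edu/findingaids/example-001.xml"
)
print(record["title"], "|", record["repository"])
```

Finding aids in HTML or plain text would need a different, more heuristic extraction step, which this sketch does not attempt.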

 

Another aggregation model is provided by Archives Hub, the UK’s “national gateway to descriptions of archives in UK universities and colleges.”[4] Supported by Mimas, “a JISC and ESRC-supported national data centre” for higher education,[5] Archives Hub offers a distributed model for aggregating content from individual archives.  Archives can become “spokes,” enabling them to retain control over their data and provide a custom search interface to their collections while also making their content available through a common interface.[6]  Archives Hub is built on the Cheshire full-text information retrieval system, which includes a Z39.50 server.  Archives Hub focuses on higher education institutions in the UK, but will accept contributions from other relevant repositories. (Nevertheless, it’s probably more appropriate as a model than as a repository for US finding aids.)

 

ProQuest’s Archives USA “is a current directory of over 5,500 repositories and more than 161,000 collections of primary source material across the United States.”[7]  It provides online access to the National Union Catalog of Manuscript Collections (NUCMC) from 1959 to the present, name and subject indexes from the National Inventory of Documentary Sources in the United States (NIDS), and collection descriptions contributed by archives. Like ArchiveGrid, Archives USA allows repositories to contribute finding aids for free, but requires a subscription for access.

 

2.    Harvest EAD from distributed repositories through OAI-PMH, Atom, or another technology

Existing technologies such as OAI-PMH[8] and Atom[9] support harvesting and aggregating content from distributed repositories. The University of Illinois at Urbana-Champaign (UIUC) has already developed preliminary OAI services and tools to harvest information from EAD and other sources.[10]  As UIUC found, exposing EAD through OAI-PMH poses several challenges: mapping a single EAD file to multiple OAI records; the variability of EAD-encoding practices; the complex hierarchical structure of EAD finding aids; and contextualizing individual results within the overall hierarchy.[11]  Illinois experimented with “a schema that produces many DC [Dublin Core] metadata records from a single EAD file,” producing a collection-level record that linked to the EAD finding aid as well as providing links to related parts of the collection.[12] Archon is now experimenting with harvesting finding aids from a static directory via OAI-PMH, but nothing has been released yet.  Other archival management systems, including CALM for Archives, MINISIS M2A, and Adlib Archive, already provide support for OAI.  The Florida Center for Library Automation is also exploring using the OAI-PMH protocol to harvest EAD from registered provider sites.[13]  While Kathy Wisser was at the NC Echo project, she developed a proof-of-concept distributed repository using the Internet Archive’s Heritrix web crawler and XTF as the indexer.
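To make the "one EAD file, many records" problem concrete, the following is a rough sketch of one way to split a finding aid into Dublin Core-style records while preserving hierarchical context. It is not the schema UIUC developed. The identifier scheme (`eadid/1/2`), the `context` breadcrumb field, the sample finding aid, and the example URL are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated EAD finding aid (invented for illustration).
SAMPLE_EAD = """<ead>
  <eadheader><eadid>example-001</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1901</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Diaries</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def title_of(node):
    """Whitespace-normalized <unittitle> of a description unit, or ''."""
    t = node.find("did/unittitle")
    return " ".join("".join(t.itertext()).split()) if t is not None else ""


def is_component(el):
    """True for EAD component elements: <c> or numbered <c01>..<c12>."""
    tag = el.tag
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def ead_to_dc_records(ead_xml, finding_aid_url):
    """Emit one DC-style record per description unit in a single EAD file."""
    root = ET.fromstring(ead_xml)
    ead_id = root.findtext("eadheader/eadid", default="").strip()
    records = []

    def walk(node, identifier, ancestors, parent_id):
        title = title_of(node)
        rec = {
            "identifier": identifier,          # assumed scheme: eadid/1/2
            "title": title,
            "type": node.get("level", ""),
            "source": finding_aid_url,         # every record links to the full finding aid
            "context": " > ".join(ancestors + [title]),  # breadcrumb for display
        }
        if parent_id is not None:
            rec["isPartOf"] = parent_id        # link to the enclosing unit
        records.append(rec)
        dsc = node.find("dsc")                 # components sit under <dsc> at the top level
        container = dsc if dsc is not None else node
        children = [c for c in container if is_component(c)]
        for i, child in enumerate(children, start=1):
            walk(child, f"{identifier}/{i}", ancestors + [title], identifier)

    walk(root.find("archdesc"), ead_id, [], None)
    return records


dc_records = ead_to_dc_records(
    SAMPLE_EAD, "https://archives.example.edu/findingaids/example-001.xml"
)
for r in dc_records:
    print(r["identifier"], "|", r["context"])
```

Carrying the breadcrumb and `isPartOf` link in each record is one simple response to the contextualization problem: a search hit on a single folder can still show where it sits in the collection. It does nothing, however, to address the variability of encoding practice across repositories.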

 

3.    Adopt an archival management tool that supports federation

ICA-AtoM is being designed to support harvesting and syndication via OAI and the IETF Atom Publishing Protocol (APP).  According to its web site, “it can be set up as a multi-repository ‘union list’ accepting descriptions from any number of contributing institutions.”  Perhaps a tool such as ICA-AtoM could be adopted to provide a union list, although such a solution may not be flexible enough to accommodate the varied methods archives use to deliver archival information.
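As a rough illustration of what a harvested "union list" involves, the sketch below builds a standard OAI-PMH `ListRecords` request and merges Dublin Core records from two repositories into one sorted list. It does not reflect how ICA-AtoM or any other tool is implemented. The repository names, base URL, identifiers, and canned responses are invented, and the sketch parses sample XML rather than making network requests. It also ignores OAI-PMH resumption tokens, which a real harvester would need to follow.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text, repository):
    """Extract identifier and title from each record, tagged with its source."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iterfind(".//oai:record", NS):
        results.append({
            "repository": repository,
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    return results


def sample_response(identifier, title):
    """Canned ListRecords response standing in for a live repository."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


# Hypothetical contributing repositories and their (canned) responses.
responses = {
    "Example University Archives":
        sample_response("oai:example.edu:001", "Example Family Papers"),
    "Sample Historical Society":
        sample_response("oai:sample.org:042", "Civic League Records"),
}

union_list = []
for repo, xml_text in responses.items():
    union_list.extend(parse_list_records(xml_text, repo))
union_list.sort(key=lambda r: r["title"])

request_url = list_records_url("https://archives.example.edu/oai")
print(request_url)
for r in union_list:
    print(r["title"], "|", r["repository"])
```

Keeping the repository name on every merged record lets a union list point researchers back to the holding institution, which matters when contributors deliver their descriptions in different ways.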

 


[1] “Cataloging Hidden Special Collections and Archives: Building a New Research Environment (2008).”

[3] Bruce Washburn, interview

[6] Archives Hub, “Archives Hub.”

[11] Prom and Habing, “Using the Open Archives Initiative protocols with EAD.”

[12] Cole et al., “Now That We've Found the 'Hidden Web' What Can We Do With It?”

[13] Florida Center for Library Automation, Sustaining & Growing The Opening Archives In Florida Project:  Report of Ad Hoc Project Advisory Group Meeting.
