Access to Art Collections Using Encoded Archival Description and Beyond: The Future of Large-scale Consortia Projects

By Richard Rinehart

Abstract

In a Herculean effort to distribute information about art collections on a previously unknown scale, museums, arts organizations, libraries, and archives have been hard at work developing standards and implementing testbed projects: large-scale union databases that integrate and disseminate information. Two such projects are 'Conceptual and Intermedia Arts Online' and 'Museums and the Online Archive of California', both of which use the Encoded Archival Description to describe and provide access to art and other cultural collections. But what is the future of such collaborations and the content portals they spawn? Will they be able to scale up to include hundreds or thousands of institutions using current models? What are the limitations for such consortia? What are the limitations for participating institutions? Several options appear on the horizon, and one simple need suggests looking to decentralization, and back to the individual institution, for the solution to sharing art and cultural content on a truly vast scale.

What is the EAD?

The Encoded Archival Description (EAD) (1) is a collaborative, community-driven standard, maintained by the Network Development and MARC Standards Office of the Library of Congress in partnership with the Society of American Archivists. The EAD is an XML DTD (a specific set of XML tags and rules) used to 'mark up', or electronically encode, finding aids, inventories, and guides to collections of primary artifacts. The end result is a single text file that contains the actual guide to a collection, including any number of individual object records belonging to that collection, along with the EAD tags. The guide is structured hierarchically: at the top you identify and describe the entire collection; as you move down, you describe sub-groups of objects and list individual object records. At any level in the document, though most often at the level of the object record, one can include EAD tags that link to or display multimedia files, such as a thumbnail image of the object.
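A small sketch can make the hierarchy concrete. The Python below uses only the standard library. It walks a skeletal EAD-style guide from the collection level down to individual items and prints an indented outline. The sample markup is a hypothetical illustration. A valid EAD instance would also require an `<eadheader>` and considerably more description.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated guide: collection -> series -> items.
# A valid EAD instance would also need an <eadheader> and fuller description.
SAMPLE_GUIDE = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Artist Books</unittitle></did>
        <c02 level="item"><did><unittitle>Book One</unittitle></did></c02>
        <c02 level="item"><did><unittitle>Book Two</unittitle></did></c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

def is_described_unit(element):
    """True for <archdesc> and numbered components such as <c01>, <c02>."""
    tag = element.tag
    return tag == "archdesc" or (tag.startswith("c") and tag[1:].isdigit())

def outline(element, depth=0, lines=None):
    """Walk the guide top-down, collecting (depth, level, title) per unit."""
    if lines is None:
        lines = []
    if is_described_unit(element):
        title = element.findtext("did/unittitle", default="").strip()
        lines.append((depth, element.get("level"), title))
        depth += 1  # children of a described unit sit one level lower
    for child in element:
        outline(child, depth, lines)
    return lines

root = ET.fromstring(SAMPLE_GUIDE.strip())
for depth, level, title in outline(root):
    print("  " * depth + f"[{level}] {title}")
```

The outline lists the collection first, then the series indented beneath it, then the two items indented one step further. This mirrors the top-down description the paragraph above outlines.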

Below is an example of one object record (an artist book), excerpted from a collection guide and shown with EAD markup:
<c01 id="bampfa" level="item">
  <did>
    <daogrp>
      <daoloc href="http://www.bampfa.berkeley.edu/moa2/servlet/archobj?DOCCHOICE=1992.4.91.moa2.xml" role="hi-res"></daoloc>
      <daoloc href="http://www.bampfa.berkeley.edu/docs/images/bampfa_1992.4.91_136_3.jpg" role="thumbnail"></daoloc>
    </daogrp>
    <origination>
      <persname> Theresa Hak Kyung Cha </persname>
    </origination>
    <unittitle> Pomegranate Offering </unittitle>
    <unitdate> 1975 </unitdate>
    <repository> The Berkeley Art Museum / Pacific Film Archive </repository>
    <unitid> 1992.4.485 </unitid>
    <admininfo> Gift of the Theresa Hak Kyung Cha Memorial Foundation</admininfo>
  </did>
</c01>


[Figure: how the above markup appears when presented online in the context of the collection guide]

The main benefit of testing the EAD for museum use is to determine whether museums can encode information about their collections in EAD format (among other standardized formats). This would allow museums to immediately share collections information in cross-community, integrated systems that include archival and library resources, such as the Online Archive of California, the Research Libraries Group, the Library of Congress, and many others. The other potential benefit of the EAD is that museums can describe not only individual items in their collections in as much detail as they want, but also the context of an entire collection, such as the biography of the artist or the historical period that joins a group of objects. In standard collections databases these relations are sometimes implicit, expressed through keywords shared across several object records that link them for searching purposes. The EAD allows museums to present that context explicitly, relating objects in human language so that the user gains not just retrieval, but understanding.
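Once records share an encoding, any participating system can read them the same way. The following sketch illustrates this. It reads the item record shown earlier with Python's `xml.etree.ElementTree` and reduces it to a flat dictionary of the kind a union index might store. The output keys (`creator`, `title`, `thumbnail`, and so on) are hypothetical choices. They are not part of the EAD, nor of the Online Archive of California's actual indexing.

```python
import xml.etree.ElementTree as ET

# The item-level component shown earlier, abbreviated to the fields used here.
ITEM_XML = """<c01 id="bampfa" level="item">
  <did>
    <daogrp>
      <daoloc href="http://www.bampfa.berkeley.edu/docs/images/bampfa_1992.4.91_136_3.jpg" role="thumbnail"></daoloc>
    </daogrp>
    <origination><persname> Theresa Hak Kyung Cha </persname></origination>
    <unittitle> Pomegranate Offering </unittitle>
    <unitdate> 1975 </unitdate>
    <repository> The Berkeley Art Museum / Pacific Film Archive </repository>
    <unitid> 1992.4.485 </unitid>
  </did>
</c01>"""

def flatten_component(component):
    """Reduce one EAD component to a flat dict (hypothetical index keys)."""
    did = component.find("did")

    def text(path):
        value = did.findtext(path)
        return value.strip() if value else None

    thumbnail = None
    for loc in did.iter("daoloc"):
        if loc.get("role") == "thumbnail":
            thumbnail = loc.get("href")
    return {
        "creator": text("origination/persname"),
        "title": text("unittitle"),
        "date": text("unitdate"),
        "repository": text("repository"),
        "id": text("unitid"),
        "thumbnail": thumbnail,
    }

record = flatten_component(ET.fromstring(ITEM_XML))
print(record["creator"], "|", record["title"], "|", record["date"])
```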

CIAO

In 1997, four museums experimenting with EAD for describing conceptual art obtained NEA funding and created CIAO (Conceptual and Intermedia Arts Online) (2). CIAO has grown into an international collaborative project among several organizations to create networked access to educational and scholarly material on the broad theme of conceptual and intermedia art, including new works of digital art. Current partners include:

Conceptual art, while a subject of central concern to contemporary art studies, presents problems of access that impede its use by scholars and students, and its exposure to the general public. While there may be literature on and exhibitions of conceptual art, collections access is impeded by the ephemeral, documentary, and multi-part, mixed-media nature of many conceptual art works. The works often challenge traditional methods of art description and cataloging. Works of this nature require a context in order to be understood. They require complex relationships between objects and groups of objects to be made explicit, both in human terms and in machine formats for purposes of navigation. Linking the "objects" with the "archives" is also crucial. In conceptual art works it is often unclear where one stops and the other begins, and both contribute to a fuller understanding.

CIAO is exploring the hierarchical context-capturing aspect of the EAD, among other relevant issues, to describe groups of objects together, and to link object and documentation records, to facilitate deeper understanding of these complex collections.

MOAC

In 1998 several museums began participating in the Online Archive of California and developed MOAC (Museums and the Online Archive of California) (3). MOAC is a project of the Online Archive of California (OAC) (4), which is managed by the California Digital Library. MOAC is being created to facilitate the integration of museum collections information into the statewide OAC, where it will be accessible alongside collections information from libraries, special collections, and archives. Partners in the initial development include:

MOAC is exploring issues relevant to providing access to object and image data within the EAD on behalf of the OAC. MOAC is also investigating how to best integrate access to collections information from diverse types of museums (art, photography, anthropology, cultural, history) with an even larger set of institutions (archives, historical societies, and libraries). It is hoped that providing access to material culture held throughout California will prove valuable for teachers, students, and researchers in that state and elsewhere.

In addition to the workflow, collaboration, economic, and technical issues being explored, this project is testing the cross-community application and flexibility of the EAD. In this area, CIAO and MOAC are addressing the important differences in the descriptive practices of museums, libraries, and archives. These practices are embodied in standards such as the EAD. So far it appears that such descriptive differences account for only minimal differences in how meaning is created through encoding. For instance, whether encoding a record for an art object in a museum or a document in an archive, the originator of each item is an equivalent, and equally core, concept. However, it may be useful in the end to be explicit to the user about assumptions such as descriptive practice. For instance, the user could be told which collections are provenance-based and which are 'artificially' organized or thematic. Where descriptive practice differs enough, different communities, such as museums and archives, might be allowed to employ slightly different but clear applications of the same standard, such as the EAD. Each community could then describe its collections in the most integral way while still retaining the interoperability that one standard affords. For instance, in the Online Archive of California, museums may use the EAD to include more item-level detail and markup about museum objects within a collection, while archives may describe collections at a more general level. Both share a common core of information and encoding that allows for database-wide retrieval. Differing community standards for descriptive practice cannot be brushed aside, but neither are they necessarily insurmountable obstacles to cross-community integration.
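The 'common core' idea can be illustrated with a minimal sketch based on hypothetical records. A museum contributes an item-level description with extra community-specific detail, and an archive contributes collection-level descriptions. Retrieval across the whole pool uses only a shared core field, here the originator. The class, field names and sample data are illustrative inventions, not the OAC's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Description:
    """One contributed description (hypothetical model). The first four
    fields form the shared core; `extra` holds community-specific detail."""
    repository: str
    level: str        # e.g. "item" (museum object) or "collection" (archive)
    originator: str   # an equivalent, core concept in both communities
    title: str
    extra: dict = field(default_factory=dict)

def search_core(descriptions, originator):
    """Pool-wide retrieval that relies only on the shared core field."""
    wanted = originator.casefold()
    return [d for d in descriptions if d.originator.casefold() == wanted]

pool = [
    # A museum describes one object in item-level detail...
    Description("Museum A", "item", "Artist X", "Untitled book",
                extra={"medium": "artist book", "pages": "24"}),
    # ...while an archive describes whole collections more generally.
    Description("Archive B", "collection", "Artist X", "Papers of Artist X"),
    Description("Archive B", "collection", "Artist Y", "Papers of Artist Y"),
]

for hit in search_core(pool, "artist x"):
    print(hit.repository, hit.level, hit.title)
```

The search never consults `extra`, so each community can add as much detail as it likes without breaking pool-wide retrieval.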

Images and Complex Multimedia Objects

One interesting area of investigation in MOAC and CIAO is the multimedia representation of non-traditional or complex objects in a standardized online environment. Many museum objects are fairly simple, such as the flat, delimited plane of an easel painting. These objects require a fairly simple form of visual documentation: usually one image, at multiple resolutions, will do the trick. In an online environment, these are deployed as one thumbnail image that appears alongside one object record, creating an easy 1-to-1 relationship. The user can click on the thumbnail to access a higher-resolution version of the image. However, many museum, library, and archival collections contain works that are not sufficiently represented by the 1-to-1 model. These complex or compound objects require more than one image to be viewed in any detail. This small change radically alters the methods needed for managing and navigating these images online.

For instance, when one object record describes an artist book, one image would depict only the cover of the book: a teaser that is not useful for research even at high resolution. Imaging the entire book is much more desirable. Even then, how does one relate the individual page images and present them on-screen in a standardized way so that the user can navigate them effectively? In another instance, many Asian scroll paintings have long, thin proportions that are not compatible with the "thumbnail" rectangle of most collections access systems. Picturing the entire scroll in one thumbnail image would present a thin light bar across the center of the image. So much detail would be lost that all such scrolls would look the same. When accessing a larger version of such a work, a single image of the entire scroll at a sufficiently high resolution would be far larger than most current bandwidth allows. Instead, one needs a way of presenting several high-resolution sections of the scroll that can be 'stitched' together on the fly for viewing as a complete whole. Site-installation artworks often exist in environments that cannot be accurately represented by one image. Angle and side views of sculpture also require many images to be somehow related to one object record and navigated in a way that makes sense.
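One simple way to 'stitch' scroll sections on the fly is to divide the full-resolution scroll into fixed-width tiles. The viewer then fetches only the tiles that overlap the visible window. The sketch below assumes that tiling scheme and uses hypothetical pixel sizes. It is not the method used by MOAC or CIAO, only an illustration of the idea.

```python
def tiles_for_viewport(scroll_width, tile_width, view_left, view_width):
    """Return (tile_index, x_offset_in_view) for every fixed-width tile that
    overlaps the visible window. Placing each tile at its offset stitches the
    sections together without downloading the whole scroll (assumed scheme)."""
    if tile_width <= 0 or view_width <= 0:
        raise ValueError("tile_width and view_width must be positive")
    left = max(0, min(view_left, scroll_width))
    right = min(scroll_width, left + view_width)
    if right <= left:
        return []
    first = left // tile_width
    last = -(-right // tile_width)  # ceiling division: one past the last tile
    return [(i, i * tile_width - left) for i in range(first, last)]

# Hypothetical scroll 20,000 px wide, cut into 2,048 px sections,
# viewed through a 1,024 px window starting at x = 3,500.
print(tiles_for_viewport(20000, 2048, 3500, 1024))  # → [(1, -1452), (2, 596)]
```

A negative offset means the tile starts to the left of the window and is partly clipped. Scrolling only changes `view_left`, so the viewer downloads new sections only as they come into view.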

To represent these latter works online, the museum needs not only a couple of resolutions of one image, but now also needs to manage several resolutions of several images - all representing one work in the collection. The museum needs metadata describing the relationship of one image to the next, and the structure of the complex object overall, so that it can be navigated - for instance, so that one can view an artist book in page-order, and even jump through the book to an exact page or chapter. In short, the museum needs a metadata system for describing digital representations which is nearly as detailed and flexible as the system for describing the collection itself.

MOAC and CIAO use the EAD as an international standard for describing the structure and detail of a collection (the actual physical collection). For describing the structure and detail of complex objects (and their digital representations) within a collection, MOAC has adopted the MOAII standard. MOAII is also an XML DTD, which describes, in similar hierarchical fashion, the structure and digital image metadata associated with any one object or work. The end result is a sort of mirror of the EAD: one text file that includes the structure of a complex object, including any number of individual images belonging to that object, along with the MOAII tags. The MOAII XML document is structured hierarchically: at the top you identify and describe the entire object (say, an artist book); as you move down, you describe sub-groups of images (say, those belonging to the same chapter) and list individual images (say, page images). MOAII shares some of the benefits of being a standard in the making. Several of the institutions that developed the MOAII DTD (the UC Berkeley Library, Cornell University, the New York Public Library, and others) are deploying the standard as well, creating a community of users that can leverage resources, experiences, and content.
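The kind of structural metadata described above can be sketched as follows. The XML uses hypothetical placeholder element names (`object`, `div`, `page`), not the actual MOAII DTD, and the image file names are invented. From this structure the Python derives two things: the page order needed to view a book in sequence, and an index for jumping directly to a chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure document for one artist book. Element and attribute
# names are placeholders for illustration, NOT the actual MOAII DTD.
STRUCTURE_XML = """<object label="Artist book">
  <div type="chapter" label="Chapter 1">
    <page label="p. 1" image="book_p001.jpg"/>
    <page label="p. 2" image="book_p002.jpg"/>
  </div>
  <div type="chapter" label="Chapter 2">
    <page label="p. 3" image="book_p003.jpg"/>
  </div>
</object>"""

def build_navigation(xml_text):
    """Return (pages in reading order, {division label: first page index}).

    root.iter() visits elements in document order, and each <div> is visited
    before its own pages, so len(pages) at that moment is the index of the
    division's first page. This also holds for nested divisions."""
    root = ET.fromstring(xml_text)
    pages, division_start = [], {}
    for element in root.iter():
        if element.tag == "div":
            division_start[element.get("label")] = len(pages)
        elif element.tag == "page":
            pages.append((element.get("label"), element.get("image")))
    return pages, division_start

pages, chapters = build_navigation(STRUCTURE_XML)
for label, image in pages:                     # view the book in page order
    print(label, image)
start = chapters["Chapter 2"]                  # jump straight to a chapter
print("Chapter 2 begins at", pages[start][0])  # → Chapter 2 begins at p. 3
```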

MOAII allows software, such as a server application or web browser, to act on the MOAII XML document to display the object. The Berkeley Art Museum has used MOAII to present 60 artist books from the Theresa Cha Conceptual Art Collection online as part of the OAC (5). These 60 works include hundreds of page images and transcriptions. MOAII is metadata: it allows MOAC to manage and display digital images in groups and sequences, but it does not spell out the requirements for the digital images themselves. Those technical specifications for images were agreed upon by the consortium and are documented in the "MOAC Technical Specifications". These define standards for individual image files as well as the other technical requirements for contributing content to MOAC.

At this point in time, MOAII deals well with 'book-like objects' such as diaries, albums, and of course artist books. The Berkeley Art Museum will be testing MOAII on a collection of Asian scroll paintings to see whether it can be successfully adapted or extended. MOAII appears to be a promising direction for managing and presenting complex objects composed of discrete, static images, such as books, rooms, sculptures, and scrolls. But what about time-based media, such as digital representations of film, video, and audio collections?

An intriguing direction to explore for time-based media would be the Synchronized Multimedia Integration Language (SMIL) (6). SMIL is a W3C standard for describing the structure of complex multimedia presentations and allowing their navigation. It acts on time-based media in much the same way that MOAII acts on static images. The SMIL standard is also XML-based. It allows one to position multimedia files for display, carefully time the playback of multiple audio and video streams at once, and, again like MOAII, include transcriptions for video or audio. Lastly, like MOAII, SMIL is a metadata standard that could greatly aid in the management and presentation of multimedia files, but it would not dictate the format or standards for the actual multimedia files. So, again, consortia will need to continue developing or adopting 'content standards' for implementing SMIL, as well as specifications for the actual media files.
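The SMIL approach can be illustrated with a short sketch using Python's standard library. It generates a SMIL 1.0-style document (no XML namespace) in which a video plays in one screen region. Meanwhile, transcript segments are shown one after another in a second region. The file names, region sizes and durations are hypothetical, and how a given player renders `<text>` media will vary.

```python
import xml.etree.ElementTree as ET

def smil_with_transcript(video_src, segments):
    """Build a SMIL 1.0-style document: one video plus a timed transcript.

    segments: list of (text_file, seconds) pairs shown one after another.
    Region sizes below are hypothetical.
    """
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="320", height="300")
    ET.SubElement(layout, "region", id="video", left="0", top="0",
                  width="320", height="240")
    ET.SubElement(layout, "region", id="transcript", left="0", top="240",
                  width="320", height="60")
    # <par> plays its children at the same time...
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "video", src=video_src, region="video")
    # ...while <seq> plays its children one after another.
    seq = ET.SubElement(par, "seq")
    for text_src, seconds in segments:
        ET.SubElement(seq, "text", src=text_src, region="transcript",
                      dur=f"{seconds}s")
    return ET.tostring(smil, encoding="unicode")

print(smil_with_transcript("interview.mpg",
                           [("part1.txt", 12), ("part2.txt", 20)]))
```

As with MOAII, the document only describes structure and timing. The video and text files themselves must still meet whatever content standards a consortium agrees on.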

Tools and a New Direction

Addressing issues of long-term scalability in consortia like MOAC or CIAO has given form to some concrete needs and has suggested possible new directions for future thinking about large-scale consortia projects in general. Currently in the arts and museum community, organizations have a few choices about how they might make available their collections information and images to researchers and the public in digital form.

The first method is to provide such information locally, say on the individual art museum's website. In this case, the organization can independently decide on methods, tools, and to a large extent the format, standards, and content of their information. This is an important method, since it allows organizations to tailor such information to their specific audiences. However, on a community, state, or national level, this method does not scale well because researchers are burdened with finding each individual organization website, repeating multiple searches, and collating results manually which may be in disparate formats.

The second method is for organizations to join together in consortia projects to pool their collections information into one resource or content portal. They must agree to submit information into this pool in a standardized format, so that users can then work with it in convenient, aggregate form. This content portal can be a centralized online database (such as the Online Archive of California) or can be distributed among different 'mirror' delivery sites (such as the Art Museum Image Consortium). In recent years several arts and museum consortia projects have successfully developed just such content portals to aggregated art collections information. These include "Museums and the Online Archive of California", "Art Museum Image Consortium" (7), "CIMI Dublin Core Testbed" (8), "Research Libraries Group Cultural Materials" (9), and others.

It is no coincidence that, for the most part, each of these consortial content portals contains a unique set of content and a unique list of member institutions. One reason for this is that the cost to an individual arts organization, in time and resources, of participating in any one of these consortia usually reaches that organization's limit. There are several reasons that this cost of participation is currently so high. First, each consortium often adopts a different set of standards than the others. Each standard is valid, but consortium member organizations are burdened with formatting their collections information into the agreed-upon format, and cannot readily leverage that work into other formats. Currently it is very labor-intensive to convert even one institution's collections information from its local format into a standardized format, because few simple, cheap tools exist to automate or assist this often highly technical process. Another reason for the high cost of participation is that many consortia have had to create or refine these standards at the same time as they built their content portals, requiring a high level of collaboration and time commitment from partners. It is unlikely that all consortia will converge on one "Holy Grail" standard soon, and perhaps they need not. Perhaps a limited set of different standards actually allows each resource to use the content and add value in different ways. In any case, both agreement on one universal standard and a practical method for dynamically linking all content portals are distant prospects.

This high cost of participation limits the size, comprehensiveness, and usefulness of each content portal. It also limits the ability of arts organizations (especially smaller, resource-scarce ones) to disseminate their collections information into as many venues as they might want. The current situation also restricts the potential benefits of standards for leveraging access to collections information to within each content portal instead of across them. Lastly, although the situation for the user of arts collections resources has been vastly improved by such consortia efforts, the user's efforts are still Balkanized into a series of discrete resources with unique content.

Experience suggests a new approach, one that is practical and institution-centric. It does not propose that each content portal become larger and encompass more; that will surely happen, but it is limited and slowed by the aforementioned high cost of participation. Nor does it invent yet another content portal. What the arts community needs at this time is a set of tools and guidelines that would enable even small arts organizations to convert their collections information and images not just to one standard format but to several. They could then significantly reduce the cost of participation and share their collections in several content portals in a practical manner.

Several of the larger, more successful content portals have well-defined, standards-based specifications for submitting data in digital form, thus allowing us at this time to develop tools which can format collections information into those targeted forms.

The Berkeley Art Museum has already developed a prototype of such a tool (10) in FileMaker Pro. It is starting to be used in the Museums and the Online Archive of California project. There it has enabled the Berkeley Art Museum, California Museum of Photography, Japanese American National Museum, and Grunwald Center for the Graphic Arts to export collections information into the standardized EAD XML format, and image data into MOAII format, for contribution to the Online Archive of California. The UC Berkeley Library has developed a similar tool in MS Access, which is also being adopted by other libraries to export collections and image data into standardized formats. Such tools, if designed to be relatively easy to use, can help an institution participate in multiple content portals by automating much of the data conversion. This reduces the level of technical and standards expertise needed at the source institution.
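The 'multi-lingual' export idea can be sketched in a few lines. This is not the Berkeley Art Museum's FileMaker Pro tool or the UC Berkeley Library's Access tool. It is a minimal Python illustration that converts one record into two targets: an item-level EAD component like the example earlier in this paper, and simple Dublin Core elements. The local record layout and the `record` wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# A record as it might come out of a local collections database
# (field names are hypothetical).
local_record = {
    "accession": "1992.4.485",
    "artist": "Theresa Hak Kyung Cha",
    "title": "Pomegranate Offering",
    "date": "1975",
}

def to_ead_component(rec):
    """Emit an item-level EAD component like the example earlier."""
    c01 = ET.Element("c01", level="item")
    did = ET.SubElement(c01, "did")
    origination = ET.SubElement(did, "origination")
    ET.SubElement(origination, "persname").text = rec["artist"]
    ET.SubElement(did, "unittitle").text = rec["title"]
    ET.SubElement(did, "unitdate").text = rec["date"]
    ET.SubElement(did, "unitid").text = rec["accession"]
    return ET.tostring(c01, encoding="unicode")

def to_dublin_core(rec):
    """Emit the same record as simple Dublin Core elements."""
    root = ET.Element("record")  # hypothetical wrapper element
    for dc_name, key in [("creator", "artist"), ("title", "title"),
                         ("date", "date"), ("identifier", "accession")]:
        ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = rec[key]
    return ET.tostring(root, encoding="unicode")

# One local record, two standardized "languages".
TARGETS = {"ead": to_ead_component, "dublin_core": to_dublin_core}
for target, export in TARGETS.items():
    print(target, "->", export(local_record))
```

Data entry happens once, locally. Each additional target format costs one more mapping function rather than another round of manual re-keying.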

Some very practical logistical questions have already come up with regard to these tools enabling the decentralized and increased sharing of cultural content. For instance, how would such tools be developed? Maintained? Distributed? Supported? It seems appropriate that such tools be developed outside the context of any one content portal project, or at least jointly among many of them. A content portal is appropriately and necessarily concerned with making it easier for institutions to contribute content to that specific portal. It may be philosophically amenable to tools that enable sharing content with other portals, but perhaps not compelled to spend serious resources on enabling this outside functionality. Consortia that develop and maintain content portals, however, are necessary partners in this endeavor, since the tools would of course be intended to allow institutions to interface with just these content portals. One way to think about this relation is that each content portal is, by necessity, 'mono-lingual', deciding on a set of standards to which contributed content must conform. Institutions that participate in multiple content portals, and the tools they use, must be 'multi-lingual'. Who better to help train the institutions and tools in each particular language than the consortia that deploy them?

So, development of such tools could take two simultaneous routes. The first is an 'open-source' methodology, in which a small core of institutions develops a modular toolset and makes it freely available for use and modification. Others who modify the source tool, for instance by adding modules that allow export into new standard formats, can upload the revised version back into the public space. Still others may integrate these versions for their own use and make the integrated versions available, and so on. The synergy and distributed workload of an open-source system have a definite appeal. Professional organizations (or just newsgroups and anonymous FTP sites) could provide the simple mechanisms for sharing information and versions, as they do for other open-source projects. Another track, which could be taken simultaneously, is to allow the commercial vendor community to develop and sell such tools; perhaps these multi-lingual conversion tools would be bundled with collection management systems. Vendor-developed systems would have the benefit of ongoing professional development, documentation, and support. So, for free, a museum might benefit from the "shareware" version, but for a fee it could get documentation, support, and extra features. The vendor community would of course benefit from developments in the open-source versions as well.
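A modular, open-source toolset of this kind might be organized around a registry of export modules. A contributor could then add a new target format without touching the core. The sketch below assumes that design. The decorator name, the record fields and the 'tab-delimited' module are all hypothetical.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]
Exporter = Callable[[Record], str]

# Core of a hypothetical modular toolset: each export module registers itself
# under the name of the standard it "speaks".
EXPORTERS: Dict[str, Exporter] = {}

def exporter(standard: str):
    """Decorator a module uses to add support for one target standard."""
    def register(func: Exporter) -> Exporter:
        if standard in EXPORTERS:
            raise ValueError(f"an exporter for {standard!r} is already registered")
        EXPORTERS[standard] = func
        return func
    return register

def export_all(record: Record, targets: Iterable[str]) -> Dict[str, str]:
    """Convert one record into every requested format; fail loudly if a
    requested module is not installed."""
    targets = list(targets)
    missing = [t for t in targets if t not in EXPORTERS]
    if missing:
        raise KeyError(f"no exporter module installed for: {', '.join(missing)}")
    return {t: EXPORTERS[t](record) for t in targets}

# A module contributed later by another institution (hypothetical format).
@exporter("tab-delimited")
def to_tab_delimited(record: Record) -> str:
    return "\t".join([record["id"], record["creator"], record["title"]])

print(export_all({"id": "1", "creator": "Artist X", "title": "Untitled"},
                 ["tab-delimited"]))
```

Because modules only register functions with a common signature, open-source contributors and vendors could both ship exporters that plug into the same core.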

It is also becoming apparent that it is beyond the scope of any one content portal to conduct the ongoing training and support necessary to enable an increasingly large number of institutions to participate in that portal. The organizations that by definition should take up the question of community-wide professional training, and are beginning to do so, are the professional organizations: ARLIS (11), MCN (12), SAA (13), ALA (14), AAM (15), MDA (16), etc. Another component that these organizations could develop together is a means of measuring institutional readiness: a metric of an institution's ability to launch into a technically complex content portal project. A self-administered questionnaire comes to mind, but it is hardly the only option available. Content portals could benefit greatly from such a metric by requiring a specific level of readiness for participation. This would mean less hand-holding and institution-specific follow-up, another tactic that does not scale well.
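A readiness metric based on a self-administered questionnaire could be as simple as a weighted score. The questions, the weights, the 0 to 3 answer scale and the threshold of 60 below are all hypothetical placeholders. A professional organization would need to define the real ones.

```python
# Hypothetical self-administered readiness questionnaire: each answer is
# scored 0-3 and weighted by how much a content portal depends on it.
QUESTIONS = {
    "collections_data_digitized": 3,      # weight
    "images_digitized": 2,
    "staff_familiar_with_standards": 2,
    "web_server_available": 1,
}

def readiness_score(answers):
    """Weighted average of 0-3 answers, scaled to 0-100 (unanswered = 0)."""
    for question, value in answers.items():
        if question not in QUESTIONS:
            raise KeyError(f"unknown question: {question}")
        if not 0 <= value <= 3:
            raise ValueError(f"answer for {question} must be between 0 and 3")
    total_weight = sum(QUESTIONS.values())
    weighted = sum(weight * answers.get(q, 0) for q, weight in QUESTIONS.items())
    return round(100 * weighted / (3 * total_weight))

def meets_threshold(answers, required=60):
    """A portal might require a minimum score before a project begins."""
    return readiness_score(answers) >= required

answers = {"collections_data_digitized": 3, "images_digitized": 2,
           "staff_familiar_with_standards": 1, "web_server_available": 3}
print(readiness_score(answers), meets_threshold(answers))  # → 75 True
```

Here the weighted sum is 3·3 + 2·2 + 2·1 + 1·3 = 18 out of a possible 24, which scales to 75.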

While it was certainly unexpected that trying to meet the simple needs of one consortium project by developing software tools would suggest a possible shift in the way institutions relate to consortia projects in general - it is exactly the kind of potentially high-impact conclusion that arises from actually sinking our collective shovels into a real-world project and uncovering monuments buried in the sand.

Conclusion

It seems apparent even at this early point in our collective development of standards and standards-based testbeds that there is no one silver bullet. No single content portal is appropriate for every institution to join, no single standard allows all possible uses of art collections information, and no single tool, even a multi-lingual one, will fit every community or institutional environment. This has been true in theory all along, and a few projects are attempting to address it, most notably standards crosswalks (17) and best practices documentation projects (18). However, in the real world, competition for resources and the aforementioned restriction on time have usually meant that each institution and many professionals have been able to participate in the growth of only one or two large-scale projects or standards. Institutions then become identified with these specific standards or consortia, and develop a loyalty. This is natural, as is the occasional fit of competition, and it has allowed individual projects to flourish. At the same time, it has a slight chilling effect on efforts that would allow each institution to wander from project to project, sowing content like seeds in different gardens and ultimately providing greater benefit to everyone. There is also often an implied resistance to the idea of 'redundancy', which of course springs from current technology's roots in office automation. This resistance may inhibit our desire to distribute content to more than one place, suggesting in fact that there is a right place for everything, including museum content and art collections information. However, to be valuable in their own right, different consortia projects need only be different enough to add significant and unique value to the content: to reach a different audience, to add new functionality, to present a new curatorial filter, or to test a new business model.
Some amount of redundancy coupled with massive decentralization is a proven method for building stable, large-scale systems; the U.S. Interstate system and the Internet itself are examples. For the time being, having arts and cultural content in as many places as is appropriate benefits both the institutions and the end users.

The idea to develop multi-lingual tools able to speak in several standards, thus allowing this broad sharing of content, is just one early conclusion that may be derived from consortia such as MOAC and CIAO. Perhaps this idea can take its place in the current development toward highly de-centralized models of content sharing. This additional step might enable greatly increased scalability of content resources for the end-user, and allow institutions to choose where to deploy content based on their missions and not their technical limitations.

References

  1. EAD, http://www.loc.gov/ead
  2. CIAO, http://www.bampfa.berkeley.edu/ciao
  3. MOAC, http://www.bampfa.berkeley.edu/moac
  4. OAC, http://www.oac.cdlib.org
  5. Cha Collection, http://www.oac.cdlib.org:80/dynaweb/ead/moac/cha/@Generic__BookView;cs=default;ts=default
  6. SMIL, http://www.w3.org/AudioVideo/
  7. AMICO, http://www.amico.org
  8. CIMI, Dublin Core Project, http://www.cimi.org/public_docs/meta_final_proj_desc.html
  9. RLG, Cultural Materials Initiative, http://www.rlg.org/culturalres/index.html
  10. "Produce, Publish and Preserve: A Holistic Approach to Digital Assets Management", http://www.bampfa.berkeley.edu/moac/imaging/index.html
  11. Art Libraries Society of North America, http://www.arlisna.org/ - Art Libraries Society UK and Ireland, http://arlis.nal.vam.ac.uk/
  12. Museum Computer Network, http://www.mcn.edu
  13. Society of American Archivists, http://www.archivists.org/
  14. American Library Association, http://www.ala.org/
  15. American Association of Museums, http://www.aam-us.org/
  16. MDA, http://www.mda.org.uk/
  17. Getty's "Introduction to Metadata", Crosswalks chapter, http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/
  18. NINCH's "Guide to Good Practice in Networking Cultural Heritage", http://www.ninch.org/PROJECTS/practice/


Richard Rinehart
Digital Media Director, Berkeley Art Museum/Pacific Film Archive and Art Practice Faculty
University of California, Berkeley
Berkeley, California, USA
rinehart@uclink.berkeley.edu