Produce, Publish and Preserve: A Holistic Approach to Digital Assets Management
By Guenter Waibel, Digital Media Developer, UC Berkeley Art Museum & Pacific Film Archive

Traditionally, museums produce and manage surrogate representations of their artifacts in the form of photography. Transparency film surrogates (4x5) have been established as the de-facto standard, with applications ranging from in-house documentation to public access via print publication. With the rise of digital imaging, the creation of digital resources initially built on the established workflow of creating photographic surrogates. Museums predominantly produced digital images as derivatives of pre-existing photography, a mere afterthought to analog capture enabled by cheap flatbed scanning technology.

Digital imaging has reached a watershed. Present-day digital camerabacks boast a resolution that effectively enables them to cut out the analog middleman. At the Berkeley Art Museum, a study by photographer Ben Blackwell concluded that files created at the maximum resolution (6000 x 8000 pixels) of a BetterLight Super 6K cameraback would yield images exceeding the information stored in a 4x5 transparency. The circle closes: digital images can now be used to print 4x5 transparencies, which demonstrates their ability to replace the analog medium.

At high enough resolution, it seems digital images can do everything film images can, and then some. Master image files can be re-purposed for print and web output, serving both promotional needs and educational goals. Museums can contribute images to networked access portals like CDL, AMICO or RLG, creating unparalleled public access to collections. Virtual access to collections may even become part of an institution's preservation strategy, since researchers can conduct most of their investigations with the aid of surrogate digital representations rather than with the original artworks.

However, the increased functionality gained in the transition from analog to digital comes at the cost of new technological challenges. The promotion of digital images from disposable, short-lived teasers on a web site to fully fledged, mission-critical institutional assets requires distinct strategies in the production, publication, and preservation of the new resource. In order to economically take advantage of the possibilities mentioned above, digital assets need to be managed in a manner resembling the way museums manage their collections of original art or documentation. Since Collections Management Systems control collection organization, it seems appropriate to suggest a database providing the same services to the rapidly increasing collections of digital surrogates. A digital assets management database supports the production, publication and preservation of digital files. It tracks a surrogate from its creation as a digital master file through its presentation as an application-specific derivative to the eventual data migration and refreshing required by changing file formats, storage media or computer platforms.

At the Berkeley Art Museum & Pacific Film Archive, a database with the functionality outlined above has been under development since the beginning of the year 2000. The museum’s growing involvement in networked access projects such as MOAC (Museums and the Online Archive of California) and CIAO (Conceptual and Intermedia Arts Online) presented us with the challenge of having to produce (1) digital images in compliance with project specifications, and (2) the exchange file formats the project had agreed upon. The exchange formats are EAD (Encoded Archival Description) mark-up for navigating collections, and MOA2 (Making of America 2) mark-up for exploring objects within a collection, e.g. the various images comprising an artist’s book. In the final implementation, EAD and MOA2 build on each other: the EAD outlines the hierarchical structure of collections and supports discovery down to the item-level, from where it links to a MOA2 Object. MOA2 outlines the hierarchical structure of the item itself, presenting a navigational recreation of complex objects such as diaries or books, which consist of various subobjects such as individual pages.

For the EAD mark-up, we already had most of the required descriptive metadata ready for export in our Collections Management System. The capture of the extensive metadata set of MOA2 objects, however, required a new data-architecture. Two distinct sets of metadata comprise a MOA2 Object. Structural Metadata facilitates the navigation and presentation of a digital object. In essence, it tells the viewing application what relationships exist between different parts of an object, and how the object should be assembled for the end user. Administrative Metadata facilitates the management and regulates the use of digital assets. It breaks down into technical metadata (detailing the specifications of the digital files), intellectual property rights information and metadata on the source for the digitization. All told, a minimal MOA2 object consists of 20 required metadata elements.

To visualize the interplay between EAD and MOA2 mark-up, let's consider the following examples. The first block of code (Example 1) comes from a (fictional) EAD finding-aid and represents the item-level of a container list. The <daoloc> (Digital Archival Object Location) tags specify the URL for both the in-line thumbnail representation (role="thumbnail") of the object, and the destination of the link from the thumbnail (role="hi-res"). However, instead of specifying a high resolution access file, the link actually calls on the objectviewer for the MOA2 xml Object.

Example 1: EAD Container List, Item Level
<c03 id="bampfa" level="item">

<daoloc href="
archobj?DOCCHOICE=1992.4.91.moa2.xml" role="hi-res"></daoloc>
<daoloc href="
bampfa_1992.4.91_136_3.jpg" role="thumbnail"></daoloc>


<origination><persname>Theresa Hak Kyung Cha<lb></persname></origination>
<unittitle>White Dust from Mongolia<lb></unittitle>
<repository>The Berkeley Art Museum / Pacific Film Archive<lb></repository>



The MOA2 mark-up for one object (presented with the aid of a Java Servlet) consists of three distinct sections: a file group (<FileGrp>; Example 2), a structural map (<StructMap>; Example 3) and a section on administrative metadata (<AdminMD>; Example 4). The structural map contains the blueprint for the object's navigation, and calls on the file group to resolve the actual location of the individual image or text files. For each file (or for each specific version of the file, i.e. all files tagged "reference"), the administrative metadata grouping holds metadata on source, file-format, resolution etc. The following examples of a (fictional) MOA2 object illustrate how the sections interrelate to establish the complete object.

Example 2: File Group

<FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG'>
<File ID='bampfa_1992.4.91_136_3' MIMETYPE='image/jpeg' SEQUENCE='1' CREATED='1998-09-01'>
<FLocat LOCTYPE='URL'></FLocat>

[repeat same structure for other versions of the same masterfile, e.g. ADMID='LOREZJPG']

Note that the mark-up has as many <FileGrp>s as there are versions, for example three filegroups for three representations of the same object by high resolution jpgs, low resolution jpgs and thumbnail gifs.

Example 3: Structural Map

<div N='1' TYPE='paper' LABEL='page 1'>
<fptr FILEID='bampfa_1992.4.91_136_3' MIMETYPE='image/jpeg' />
<fptr FILEID='bampfa_1992.4.91_136_4' MIMETYPE='image/jpeg' />


This fictional object, for brevity's sake, consists of only one image file; a complex object repeats the mark-up sections outlined above, and represents the hierarchy in the structural map through a nesting of <div> tags. The N attribute on the <div> numbers the subobjects on the current level of the hierarchy.

Example 4: Administrative Metadata
<Compression>JPEG</Compression>
<Dimensions X='1536' Y='1024' />
<BitDepth BITS='16' />
<Resolution>600 PPI</Resolution>


[repeat same structure for 'LOREZJPG' or any other versions, including transcriptions.]
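
The way the three sections work together can be sketched in a few lines of Python. This is only an illustration, not the MOAC objectviewer: the wrapper element, the example.org URL, the closing tags and the ID attribute on <AdminMD> are assumptions added to make the fragment self-contained. The sketch follows each <fptr> in the structural map to its <File> in the file group, and from the file group's ADMID to the administrative metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MOA2-style fragment. The <Object> wrapper, the
# example.org URL and the ID attribute on <AdminMD> are assumptions.
SAMPLE = """
<Object>
  <FileGrp VERSDATE='1998-09-01' ADMID='HIREZJPG'>
    <File ID='bampfa_1992.4.91_136_3' MIMETYPE='image/jpeg' SEQUENCE='1'>
      <FLocat LOCTYPE='URL'>http://example.org/images/bampfa_1992.4.91_136_3.jpg</FLocat>
    </File>
  </FileGrp>
  <StructMap>
    <div N='1' TYPE='paper' LABEL='page 1'>
      <fptr FILEID='bampfa_1992.4.91_136_3' MIMETYPE='image/jpeg' />
    </div>
  </StructMap>
  <AdminMD ID='HIREZJPG'>
    <Compression>JPEG</Compression>
    <Resolution>600 PPI</Resolution>
  </AdminMD>
</Object>
"""


def resolve_pages(xml_text):
    """Resolve every file pointer in the structural map to a location
    and to the administrative metadata of its file group."""
    root = ET.fromstring(xml_text)

    # Index files by ID, remembering their location and their group's ADMID.
    files = {}
    for group in root.iter("FileGrp"):
        for file_el in group.iter("File"):
            files[file_el.get("ID")] = {
                "url": file_el.findtext("FLocat", default="").strip(),
                "admid": group.get("ADMID"),
            }

    # Index administrative metadata sections by their (assumed) ID.
    admin = {section.get("ID"): section for section in root.iter("AdminMD")}

    pages = []
    for div in root.iter("div"):
        for fptr in div.findall("fptr"):
            entry = files[fptr.get("FILEID")]
            metadata = admin.get(entry["admid"])
            pages.append({
                "label": div.get("LABEL"),
                "url": entry["url"],
                "compression": (
                    metadata.findtext("Compression") if metadata is not None else None
                ),
            })
    return pages


for page in resolve_pages(SAMPLE):
    print(page["label"], page["url"], page["compression"])
```

A viewing application would perform the same look-ups before presenting a page: the structural map supplies the order and labels, the file group supplies the location, and the administrative metadata describes the file.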

In order to capture all the data for the file exchange formats, we envisioned a generic database that would track all archival digital assets of BAM/PFA and allow push-button export of both EAD and MOA2 xml mark-up. The database would harvest descriptive metadata from our existing Collection Management System, but could also accommodate digital objects without representation in our accessioned collection, such as digital video interviews with artists in residence. Fortunately, a precedent to our endeavor existed on the Berkeley campus in the form of a Microsoft Access database used by the Bancroft Library to manage digital projects, which kept us from having to entirely re-invent the wheel. The main difference between the BAM/PFA database and the existing library solution: we wanted to develop a platform-independent database we could give out to all project partners regardless of their denomination (Mac or PC), and we wanted a database which could manage all our digital assets beyond the boundaries of specific projects. Anticipating a rough ride to the realization of our vision, we named the FilemakerPro database under development DAMD, or Digital Assets Management Database. A beta-release of DAMD has been available to MOAC project partners since May 2000. (See Appendix A)

Functioning as a production tool, the database establishes communication between the project administrator and the digital imaging studio. The administrator describes the hierarchy of a collection and the physical structure of the objects (e.g. a sketchbook with 8 pages) in the collection, details the specifications for the master image and its derivatives, and requests transcription, if appropriate. In effect, the administrator creates a work-order while at the same time providing the bulk of the metadata associated with the object. From here on, the production chain may be split into separate sub-tasks. For example, a photographer produces the master image; an intern may produce the derivatives and the transcription, as well as back up the files on a server and on the specified permanent storage medium (e.g. CD-ROM). Password access reduces the risk of accidental data corruption; each member of the team has read-only access to the database as a whole, but can write to the layouts specific to their task. A system of sign-offs on the specific tasks keeps the administrator informed on the progress, and helps co-ordinate the efforts among the team.

Once digitization for a collection has been completed, the administrator exports the metadata into the appropriate mark-up format. If the database gathers sufficient metadata, export routines for different formats may enable the contribution of the same collection to various networked access projects (CDL, RLG, AMICO) in their respective native file exchange formats. So far, DAMD exports into EAD finding aids and MOA2 objects according to MOAC specifications. Push-button export into the EAD eliminates the tedious, time-consuming and error-prone cut-and-paste patchwork of the past. The FilemakerPro database gathers and orders the different levels of a collection, populates them with the appropriate objects at the item-level, and, with the aid of a free Filemaker plug-in, writes the end result to a text file.

A further benefit of managing digital assets in a database with web-publishing capabilities such as Filemaker consists in the option of turning the database itself into an access tool. While the creation of standardized exchange formats such as the EAD will continue to be a pre-requisite for participating in networked access projects, publishing such documents to the web requires sophisticated xml search-engine capabilities currently beyond the reach of most smaller institutions. In other words: it may be beyond your institution to provide access to collections via the exchange format demanded by consortia, but you may still contribute and then deploy your images independently by using the database itself as a local back-end.

While the production and publication of digital assets presents the community with interesting issues in the short run, another formidable challenge consists in learning how to properly care for digital files. Under less than ideal storage conditions, a 4x5 transparency has a lifespan of about 5-10 years, and since its introduction in the late 1930s, archivists all over the world have learned how to preserve film, extending its lifespan considerably. In the digital arena, such comforting knowledge does not yet exist. Since the lifespan of a digital file depends exclusively on the unpredictable, yet inevitable, evolution of the technical environment, clear-cut assumptions about its lifespan can hardly be made. The only certainty in a digital image's long-term future consists in its need for refreshing (moving the file to a different physical storage medium) and/or migration (moving the file to a different encoding format). However, the California Digital Library assumes that by following their guidelines an institution can conceivably "create a master that has a useful life of at least fifty years." If that claim holds true, digital files boast increased functionality while retaining a similar range of longevity in comparison to transparency film.

As Howard Besser states, technical metadata as captured in a database for digital assets management provides "the first line of defense" against losing access to files, and the financial investment they represent. The database tells you about the specific circumstances of a file’s creation, for example the hardware and software environment used or the compression scheme applied, and its current storage location. Moreover, it ensures consistency across a large number of files by enforcing in-house standards for creating digital media. In this way, the database facilitates the maintenance crucial to the life span of digital assets by identifying the files in need of refreshing or migration and by enabling a batch-processing approach to any impending data movement or conversion.
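
As a rough sketch of how technical metadata supports batch maintenance, the following Python fragment selects the files due for migration or refreshing. The field names (`format`, `medium`), the sample records and the list of formats and media considered at risk are all hypothetical; in DAMD the equivalent would be a find request on the corresponding FilemakerPro fields.

```python
def files_matching(records, field, values):
    """Return the filenames whose metadata field holds one of the given values."""
    return [r["filename"] for r in records if r.get(field) in values]


# Hypothetical technical metadata for three files.
records = [
    {"filename": "a_master.tif", "format": "TIFF", "medium": "CD-ROM"},
    {"filename": "a_ref.jpg", "format": "JPEG", "medium": "server"},
    {"filename": "b_master.pcd", "format": "PhotoCD", "medium": "CD-ROM"},
]

# Migration: move files to a different encoding format.
to_migrate = files_matching(records, "format", {"PhotoCD"})

# Refreshing: move files to a different physical storage medium.
to_refresh = files_matching(records, "medium", {"CD-ROM"})

print("migrate:", to_migrate)
print("refresh:", to_refresh)
```

Because every file is documented in the same way, either list can be handed to a batch process instead of being assembled by inspecting the files one at a time.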

The metadata fields in DAMD derive primarily from two sources. For descriptive metadata, the database utilizes REACH elements as a data structure, which MOAC partners have agreed to express in EAD mark-up for final output. As far as administrative and structural metadata goes, the database implements the necessary fields for MOA2 mark-up as a data structure. Since the self-proclaimed goal of the CDL’s "Digital Image Collection Standards" consists in creating a resource that will last half a century, the technical metadata (as part of the administrative metadata) and the guidelines for digital capture provide a strict standard for both the creation of digital images and their documentation. Apart from technical metadata, the set of elements referred to as "administrative metadata" includes information on rights and the physical source of the digital file. A set of elements comprising the structural metadata describes all the information necessary to view and navigate the different versions of a surrogate digital object. In the case of an artist’s book, structural metadata tells the MOA2 objectviewer how to create a navigational table of contents, and which page to display next.

To produce fully functional EAD and MOA2 mark-up, the database has to capture an impressive 35 fields per object, and an additional 20 fields for each subobject (like the individual pages of an artist’s book). For all those shaking their heads at the prospect of such massive data entry, let me hasten to add: if the descriptive metadata can be imported from an existing CMS, only a maximum of four fields have to be genuinely hand-entered per object, with one additional element for each subobject iteration. Another eight elements almost pick themselves from pull-down menus (no additional element for subobject iteration), and the remaining elements either stay static enough to qualify as global fields, or the database itself works the appropriate magic behind the scenes.

Let’s take a closer look at some of the key layouts in the database we developed at the Berkeley Art Museum. At first run, the database prompts the Administrator to set up global defaults concerning baseline information about the institution, on and off-line file storage and copyright statements. As I have pointed out before, the defaults reduce the amount of data-entry; however, all of them may be overwritten on a case-by-case basis during object set-up.

The first regular layout of the database gathers descriptive metadata. In the case of the Berkeley Art Museum, the administrator simply types in the accession number for the object to be digitized, and the collection management system, another FilemakerPro database, instantly populates the appropriate fields in DAMD. Furthermore, the ObjectInfo layout prompts the administrator to place the item within the hierarchy of a collection, or, to speak EAD for a moment, to define its exact destination in the container list. Collections and their levels (for example, Collection: Conceptual Art; Series: Artist's Books) evolve almost as a by-product of setting up objects. If the appropriate collection or level within a collection hierarchy does not yet exist for an object, a quick visit to the Collections layout facilitates the definition of new levels or an entirely new collection. The new level or collection will be available from pull-down menus within ObjectInfo for all further items, saving the trip to the Collections area the next time around.

In the next layout, the administrator defines the versions for all surrogates from this object. A typical work order will include a high-end uncompressed archival file created by direct capture, two compressed derivatives for online access (reference and thumbnail), and a transcript. The order also specifies which hardware and software should be used in creating the files, essentially a request for a specific in-house workstation. File-size for all versions is defined by pixels per longest dimension, a strategy which ensures that the amount of data captured per item always stays within the same range. If file-size were defined by resolution, a scan of a very small object would yield a significantly smaller file than a scan of a rather large object. For example, a 600 ppi scan of a 1x1 inch object (stamp) produces a file of about 1MB, while a 600 ppi scan of a 4x5 inch object (transparency) produces a file of about 21MB. Defined by longest dimension, capture of a small object yields a very similar amount of data as capture of a very large object, deviating only by the variance in height-width proportion, not by the difference in absolute size. The file size for the stamp and the transparency captured at 3000 pixels longest dimension would amount to about 26MB and 21MB respectively.
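
The arithmetic behind these figures can be checked with a short Python sketch. It assumes uncompressed 24-bit RGB (3 bytes per pixel) and counts a megabyte as 2**20 bytes; both are assumptions that happen to reproduce the numbers quoted above, not specifications taken from DAMD.

```python
def uncompressed_size_mb(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size in MB (1 MB = 2**20 bytes), assuming 24-bit RGB."""
    return width_px * height_px * bytes_per_pixel / 2**20


def pixels_at_resolution(width_in, height_in, ppi):
    """Pixel dimensions when capture is specified by resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def pixels_at_longest_dimension(width_in, height_in, longest_px):
    """Pixel dimensions when capture is specified by the longest side."""
    scale = longest_px / max(width_in, height_in)
    return round(width_in * scale), round(height_in * scale)


# Object sizes in inches.
objects = {"stamp": (1, 1), "transparency": (4, 5)}

for name, (w, h) in objects.items():
    by_ppi = uncompressed_size_mb(*pixels_at_resolution(w, h, 600))
    by_longest = uncompressed_size_mb(*pixels_at_longest_dimension(w, h, 3000))
    print(f"{name}: {by_ppi:.1f} MB at 600 ppi, "
          f"{by_longest:.1f} MB at 3000 px longest dimension")
```

Specified by resolution, the two files differ by a factor of twenty; specified by longest dimension, they differ only because the stamp is square and the transparency is not.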

Skipping both the Rights and the Access layout, which usually only require a review of the pre-defined defaults, the next crucial layout allows the administrator to describe the individual parts constituting an object. In "Describe Structure," a tree-like hierarchy defines the parts of an object and their relationship to each other as parents and/or children. The break-down of the object follows both physical and intellectual entities. An artist’s book, for example, breaks down into the physical entity of pages. Each individual page yields a digital file. However, the artist’s book might also be organized into various chapters, which do not yield files, as they are intellectual concepts grouping a number of physical entities together. Since the intellectual structure provides an indispensable navigational function, it needs to be captured along with the physical entities. In the database itself, both the physical and intellectual entities define the object’s structure within the same hierarchy, to be viewed in its entirety in the "Sequence" layout. The differentiation between a physical entity and an intellectual entity only requires a mouse-click: for all physical entities, the administrator requests imaging and transcription if appropriate, while intellectual concepts have no image equivalent.
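
A minimal sketch of such a hierarchy, written in Python rather than FilemakerPro, may help to picture the data structure. The class and function names are hypothetical, and treating the book itself as an intellectual entity is an assumption made for the example. Physical entities carry a flag indicating that they yield a file; intellectual entities only group their children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    label: str
    physical: bool = True  # physical entities yield a digital file
    children: List["Entity"] = field(default_factory=list)


def table_of_contents(entity, depth=0):
    """Return one line per entity, children indented below their parents."""
    lines = ["  " * depth + entity.label]
    for child in entity.children:
        lines.extend(table_of_contents(child, depth + 1))
    return lines


def files_requested(entity):
    """Labels of all physical entities, i.e. those that need imaging."""
    own = [entity.label] if entity.physical else []
    return own + [label for c in entity.children for label in files_requested(c)]


book = Entity("Artist's book", physical=False, children=[
    Entity("Chapter 1", physical=False,
           children=[Entity("Page 1"), Entity("Page 2")]),
    Entity("Chapter 2", physical=False, children=[Entity("Page 3")]),
])

print("\n".join(table_of_contents(book)))
print(files_requested(book))
```

The same tree serves both purposes described above: walking all nodes produces the navigational table of contents, while walking only the physical nodes produces the list of files to be captured.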

As already pointed out above, the "Sequence" layout displays the fruit of our labor: a hierarchical structure of the object and its subobjects as a table of contents, differentiating "children" from their "parents" by indentation. At this point, the database also analyses the object and classifies it as "simple" (consisting of only one surrogate, like the capture of a traditional oil painting) or "complex" (consisting of more than one surrogate, like the multiple files making up an artist's book). Only complex objects have a need for the navigational functionality of MOA2, and accordingly, the database sets the EAD entity resolving the access URL for the object to either a MOA2-link for complex objects, or an image link for simple objects. Once the Administrator has verified the correct set-up for an object, they trigger a script generating entries into a relational database. So far, we have only told the database that we eventually wish to have a specific set of versions for each subobject. The script publishes the request to the imaging and transcription database, creating entries of the individual version for each subobject. The set-up for the object has been completed.
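
The logic of this step can be sketched as follows. The Python below is an illustration under assumptions, not the FilemakerPro script itself: the function names are hypothetical, and the ".moa2.xml" naming pattern is merely modeled on Example 1.

```python
def classify(surrogate_count):
    """'simple' for a single surrogate, 'complex' for more than one."""
    if surrogate_count < 1:
        raise ValueError("an object needs at least one surrogate")
    return "simple" if surrogate_count == 1 else "complex"


def access_target(accession_number, image_files):
    """Choose what the EAD access link points to for this object."""
    if classify(len(image_files)) == "complex":
        # Hypothetical naming pattern, modeled on Example 1.
        return f"{accession_number}.moa2.xml"
    return image_files[0]


def work_entries(subobjects, versions):
    """One entry per subobject and requested version."""
    return [(sub, version) for sub in subobjects for version in versions]


painting = access_target("1995.1.1", ["bampfa_1995.1.1_1.jpg"])
artists_book = access_target(
    "1992.4.91", ["bampfa_1992.4.91_136_3.jpg", "bampfa_1992.4.91_136_4.jpg"]
)
entries = work_entries(["page 1", "page 2"], ["master", "reference", "thumbnail"])

print(painting, artists_book, len(entries))
```

The last function mirrors what the set-up script does when it publishes the request: two pages with three requested versions each result in six entries for the imaging studio.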

The layout "All Objects Overview" now informs the photographer that digitization for this object has been requested, and from within "SubObjects Overview," the photographer can navigate the individual settings for each subobject and its versions. The "Visuals" layout informs the photographer of the exact nature of the request. The strategy of setting up the data before the actual digitization virtually eliminates data entry during capture - the photographer has to name the created file according to the filename generated by the database, and, at the most, type in the size of the shorter dimension. However, in our specific case, we hope to import the shorter dimension plus a host of other technical metadata directly from the capture device. BetterLight camerabacks allow export of metadata for each capture in a tab-delimited text file, which in turn can be imported into DAMD using the filename as a key. Once the master file has been created, the photographer’s assistant may produce the requested derivatives by re-sizing and compression, transcribe the object if necessary, upload the files to the server location specified in the layout "Storage," and finally, create a hardmedia back-up.
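
The import of capture metadata with the filename as a key can be sketched in Python. The column names (`filename`, `width_px`, `height_px`) and the sample values are hypothetical; the actual layout of the BetterLight export file is not described here and may differ.

```python
import csv
import io


def merge_capture_metadata(records, tsv_text, key="filename"):
    """Merge rows of a tab-delimited export into existing records.

    `records` maps filenames to metadata dicts set up before capture.
    Returns the filenames found in the export but not in the database.
    """
    unmatched = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        name = row.pop(key)
        if name in records:
            records[name].update(row)
        else:
            unmatched.append(name)
    return unmatched


# Record created by the administrator before digitization.
records = {"bampfa_1992.4.91_136_3": {"version": "master"}}

# Hypothetical export from the capture device.
export = (
    "filename\twidth_px\theight_px\n"
    "bampfa_1992.4.91_136_3\t6000\t8000\n"
    "misnamed_file\t6000\t8000\n"
)

unmatched = merge_capture_metadata(records, export)
print(records)
print("not in database:", unmatched)
```

Reporting unmatched filenames is a design choice of this sketch: since the filename is the only link between a capture and its record, a file that was not named according to the database would otherwise be skipped silently.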

The database now knows everything it ever wanted to know about the object and its place within a collection. After the entire collection has been populated with the appropriate objects, the remaining feat consists in having DAMD regurgitate the information in the structured fashion of an EAD finding-aid and MOA2 mark-up. On the EAD export screen, the administrator picks a collection from a pull-down menu, chooses a filename and (optionally) provides a preferred citation (<prefercite>) and a general copyright note (<userestrict>) for the frontmatter of the finding aid. The export script does the rest. It works its way down the hierarchy of collection-levels, populating them with the associated objects. Once the script has identified the record for a collection-level or item, it picks up the mark-up for the section as a complete parcel from a calculation field within the record, and adds the chunk of code to an external text file. (See Appendix B)
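
The traversal performed by the export script can be pictured with a short Python sketch. The record layout (`open`, `close`, `items`, `sublevels`) is hypothetical and stands in for the calculation fields holding the prepared mark-up; the tags are abbreviated and do not amount to a complete EAD finding aid.

```python
def export_levels(level, out):
    """Walk the hierarchy depth-first, appending each prepared chunk of mark-up."""
    out.append(level["open"])
    for item in level.get("items", []):
        out.append(item["markup"])
    for sublevel in level.get("sublevels", []):
        export_levels(sublevel, out)
    out.append(level["close"])
    return out


# Hypothetical collection with one series holding one item.
collection = {
    "open": "<c01 level='collection'><did><unittitle>Conceptual Art</unittitle></did>",
    "close": "</c01>",
    "sublevels": [{
        "open": "<c02 level='series'><did><unittitle>Artist's Books</unittitle></did>",
        "close": "</c02>",
        "items": [{
            "markup": "<c03 level='item'><did><unittitle>White Dust from Mongolia"
                      "</unittitle></did></c03>",
        }],
    }],
}

chunks = export_levels(collection, [])
finding_aid_body = "\n".join(chunks)
print(finding_aid_body)
```

Because every record already holds its mark-up as a finished parcel, the script only has to visit the records in the right order; writing `finding_aid_body` to the chosen filename completes the export.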

In this way, the database assembles a complete finding-aid parsing at 100% EAD compliance ready for contribution to a union database without anybody even having to look at a single tag. Using a database such as the one I have described above, institutions with very little or no in-house expertise in the EAD could still contribute to EAD-based consortia. DAMD also exports the MOA2 objects at the push of a button, and soon will also feature that functionality for associated textuals. Since all the data resides within the same set of relational FilemakerPro databases, a single export routine per collection could conceivably create the EAD finding-aid, the individual files of MOA2 mark-up for each object called on by the finding-aid, and the associated Text Encoding Initiative (TEI) transcription for each object.

While DAMD has not reached full maturity yet, the project already demonstrates the power of a database solution for digital asset management. Various strategies, such as importing descriptive metadata from the CMS, importing technical metadata from the capture device, and the extensive use of global defaults and pull-down menus keep the actual number of fields for data entry at a very reasonable level. Furthermore, the database can compress the production and the publishing of digital resources economically into one single effort of data entry. Establishing communication between the various parties involved in production, and keeping track of progress, the database not only manages data, but also facilitates the management of the entire workflow. Once the metadata and the images have been captured, any number of export routines may provide access to the resource in any desired mark-up format. The database itself may be used as a back-end for web-access. And last but not least, the thorough documentation of your digital assets moves them into the future - it ensures that you will not have to start over again next year.

