The following is a brief discussion of the findings of the Digital POWRR Project's recent white paper, bringing out themes that the white paper's length restriction obliged its authors to downplay.
The
information science and cultural heritage communities have developed a variety
of guidelines and standards that can enable an organization to achieve high
levels of digital preservation.[1]
Many practitioners have struggled to fashion the technical infrastructure
necessary to realize them, however. Information professionals serving
medium-sized and smaller institutions lacking large financial resources have
especially found it difficult to study and comprehend the complex
challenges presented by the need to curate and preserve digital materials in a
programmatic manner. Unsure that they understand the issue well, many have
hesitated to address it.
Many
larger institutions have also not yet devised the means to curate and preserve
their digital materials in a programmatic manner, but in one instance a state
university system has given rise to a coherent and practicable way forward.
Representatives of the California Digital Library have addressed the University
of California’s extensive and diverse digital curation and preservation needs
by devising and coordinating the operation of a set of free-standing but
interoperable applications, each performing a single or limited number of tasks
in the larger curation and preservation process. They described their method as
a micro-services approach. While they conceived this arrangement with an eye
toward producing an infrastructure able to function in a large institution’s
wide variety of work environments in a context of rapid and pervasive
technological change, a micro-services approach can also help medium-sized and
smaller institutions to identify and meet a very different set of digital
curation and preservation goals.
A Micro-services Approach
In 2009 California Digital Library
staff members began development of a new approach to the curation and
preservation of digital objects produced by the ten-campus, $25 billion,
238,000 student, nearly 20,000 faculty member University of California system.[2]
Seeking to dispense with the assumption that the curation and preservation of
digital objects required the installation and operation of a single, long-lived
application combining the necessary functions behind one user interface, they
proposed to employ a set of twelve independent but compatible applications, or
micro-services, each responsible for performing a single function within the
digital curation and preservation process.
They described utilities devoted to data preservation as identity, storage,
fixity, replication, inventory, and characterization. Utilities devoted to data
curation included ingest, index, search, transformation, notification and
annotation. The developers of the micro-services approach to digital curation
and preservation made a number of compelling arguments for it. Small,
relatively simple utilities would pose fewer challenges in their development,
deployment, maintenance and enhancement than a large, integrated system,
especially in the context of constant technological change. In addition, users
could easily adapt a set of distributed services to local conditions in
different divisions and departments of the university, and easily replace each
of them upon their obsolescence.[3]
The
authors introducing a micro-services approach to the information science
community seemed to worry that their colleagues and users would look askance at
a technical arrangement that they viewed as somehow incomplete or piecemeal. At
the same time that they praised its simplicity and flexibility, the authors
emphasized that librarians and developers could put a set of interoperable
utilities together to provide a very large university community producing
digital materials in increasing numbers, size and type with the ability to
curate and preserve them in a fully comprehensive manner.[4] In
2010 the California Digital Library introduced Merritt, a new repository
developed using the micro-services approach.[5]
Available to the entire University of California community today, Merritt
provides long-term preservation of digital materials and also enables users to
share data. It makes its functions available via a user interface and an
Application Programming Interface enabling machine-to-machine communication. As
of March 2015 Merritt contained approximately 1.5 million digital objects, some
containing over 10,000 individual files or occupying over 100 GB of storage
capacity.[6]
The Challenge: Digital Curation and Preservation at Medium-Sized and Smaller Institutions
In a
2011 report Merritt’s developers recommended a micro-services approach to
digital curation and preservation to other members of the Association of
Research Libraries, emphasizing how a strategic combination of individual
services could produce “the complex global function needed for effective curation”
at large institutions.[7] At
about the same time, the Digital POWRR (Preserving
Digital Resources with Restricted Resources) Project
(http://digitalpowrr.niu.edu/), a team of librarians, archivists and other
stakeholders from medium-sized and smaller Illinois institutions lacking large
financial resources, used financial support provided by an Institute of Museum
and Library Services National Leadership Grant (LG-05-11-0156-11) to begin a
study of the problems that the curation and preservation of digital objects
presented to their organizations, and how they might address them.
The POWRR institutions were, and
remain, members of CARLI, the Consortium of Academic and
Research Libraries in Illinois, but its leaders informed the study team that
while they recognized the digital curation and preservation challenges their
members faced, they also lacked the financial resources necessary to begin to
address them.
Early in the three-year
investigation period, POWRR team members prepared a case study of each of the
participating institutions, describing it in broad terms while paying
particular attention to the composition of its digital collections and the
unique challenges they presented; the details of its pertinent technical
infrastructure including content management systems and repository software in
use; and a self-assessment summary and review of current digital curation and
preservation activities, if any. At each of the five campuses, team members
identified digital content known to be vulnerable to loss, but realized that
they had not developed programmatic solutions to mitigate that risk. Upon
gathering and reviewing the relevant information, team members identified the
practices and policies that they believed could most help their institutions to
improve their curation and preservation of digital objects. They then performed
a gap analysis by identifying the obstacles that had prevented their
organizations from achieving the desired outcomes. Common factors impeding
digital preservation efforts emerged from the gap analyses, including a lack of
available financial resources; limited or nonexistent staff time dedicated to
digital preservation activities; and insufficient levels of appropriate
technical expertise.[8]
The
case studies allowed the research team to perceive a state of affairs more
complex than the raw data described, however. Lacking the time necessary to
stay abreast of frequent developments in the field of digital preservation, the
expertise or technical infrastructure necessary to install and maintain complex
software solutions, and/or the funds to pay for ready-to-use products that may
exist, librarians and archivists at POWRR institutions often felt overwhelmed.
Faced with what seemed to be an enormous undertaking, and hesitant to commit
scarce resources to the adoption of an integrated digital preservation utility,
many found themselves unable to take even the first steps toward curating and
preserving their digital materials.[9]
Micro-services and Smaller Institutions
Soon
after beginning their investigation, members of the Digital POWRR team discovered
a basic misunderstanding thwarting their progress towards the development of
more effective digital curation and preservation programs. They had understood
digital curation and preservation as a black and white issue: either an
institution had implemented successful digital curation and preservation
measures or it had not. Behind this assumption lay another: that successful
digital preservation activities required the use of a single application
integrating a number of necessary tasks behind a single user interface. Their
research soon led them to realize that recent scholarship has emphasized that
practitioners should rather understand digital curation and preservation
activities as an ongoing, iterative set of actions, reactions, workflows, and
policies.[10]
Such an approach means that practitioners do not have to begin digital curation
activities by creating or selecting a comprehensive technical solution
appropriate for large sets of digital materials over a long period of time.
Instead, they can start by taking small first steps to triage digital
collections and identify means by which they might cumulatively enhance the
curation and preservation of those found to require immediate attention, while
working to bring the issue to the attention of colleagues and administrators
and, ultimately, advocate for resources necessary for the implementation of
additional capacity. Put another way, information professionals at institutions
lacking effective digital curation and preservation measures can benefit by
focusing their efforts on the activities by which they can curate and preserve digital content in the next six to
twenty-four months rather than waiting a decade to devise an ideal solution.
Professionals in the cultural heritage sector accustomed to thinking in terms
of centuries may find this to be a curious approach, but to wait is to risk the
loss of unique materials.
The National Digital Stewardship
Alliance’s (NDSA) Levels of Digital Preservation can provide practitioners
seeking to employ this approach with a yardstick by which they might measure
their progress toward improved digital curation and preservation capacity. The
NDSA describes the Levels as a work in progress, intended to provide a readily usable
set of guidelines useful for professionals and/or institutions in various
situations, ranging from those just beginning to think
about how to curate and preserve their digital assets more effectively to those
planning the next steps in enhancing existing systems and workflows. The
guidelines speak to five functional areas that represent the core of digital
preservation systems: storage and geographic location, file fixity and data
integrity, information security, metadata, and file formats. The Levels do not
help to assess the efficacy of digital preservation programs as a whole since
they do not consider important matters such as policies, staffing, or
organizational support.[11]
In addition to providing information
professionals at smaller institutions with an understanding of their present
digital curation and preservation capacities, reference to the NDSA Levels of
Digital Preservation can help them to recognize discrete steps by which they
can improve their curation and preservation of digital objects. These may
include activities such as improving storage practices from reliance on stand-alone
media (e.g. CDs and portable hard drives) to the use of networked or
geographically distributed servers, which the Levels mark as moving a single
square to the right in one of their functional categories. Researchers have
made resources discussing fundamental activities like these freely available.[12]
Institutions
can begin to move closer to a goal of stabilizing and preserving digital
materials by taking a number of the small steps forward as described in the
Levels. They should not hesitate to take Level 1 actions that they could
readily achieve today while they devote their energies to deliberations
considering how to move from Level 2 to Level 3. Thinking of digital curation
and preservation activities as an ongoing process, and the NDSA Levels of
Digital Preservation as a measure of progress, resonates with two fundamental
understandings at the heart of a micro-services approach: first, that digital
curation and preservation is an uncertain process in which continuous, rapid
technological change often renders monolithic, integrated applications
cumbersome and outdated; and second, that simple tools focused on a specific
aspect or aspects of the process, often available at no charge, can prove more
helpful.
Practitioners can only make use of
individual micro-services tools if they understand which roles they play in the
larger digital curation and preservation process, and how they might contribute
to progress through the Levels. To this end, the Digital POWRR Project depicted
the several stages of a prospective workflow as a pathway to digital
preservation.[13]
See below.
POWRR’s
Overview of the Path to Digital Preservation
Micro-services tools typically perform only a
single or limited number of the tasks or functions that make up each stage of
this process. For example, functions
within the ingest portion of a digital preservation workflow employing
micro-services might include file copying; fixity checking; virus scanning;
file de-duplication; and unique identifier generation.[14] The figure below provides a list of micro-services functions making up the more general
pathway to digital preservation.
Functionality within POWRR's Path to Digital Preservation
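To make the ingest functions listed above more concrete, the following is a minimal Python sketch, not a description of any tool reviewed by the POWRR team. It copies files from a transfer directory into a storage directory, checks fixity with a SHA-256 checksum, skips exact duplicates, and assigns each stored file a unique identifier. The function names, the record fields, and the choice of SHA-256 and UUIDs are all assumptions made for illustration. Virus scanning is left out because it depends on locally installed software.

```python
import hashlib
import shutil
import uuid
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source_dir: Path, storage_dir: Path) -> list:
    """Copy files into storage, skipping duplicates and verifying each copy.

    Returns one record (a dict) per stored file. The record layout is a
    hypothetical example, not a standard.
    """
    storage_dir.mkdir(parents=True, exist_ok=True)
    seen_checksums = set()
    records = []
    # List the source files before copying anything, in a stable order.
    sources = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in sources:
        checksum = sha256_of(source)
        if checksum in seen_checksums:
            continue  # de-duplication: identical content is already stored
        seen_checksums.add(checksum)

        identifier = str(uuid.uuid4())  # unique identifier generation
        target = storage_dir / f"{identifier}{source.suffix}"
        shutil.copy2(source, target)  # file copying, keeping timestamps

        # Fixity check: the stored copy must match the original.
        if sha256_of(target) != checksum:
            raise IOError(f"Checksum mismatch after copying {source}")

        records.append({
            "identifier": identifier,
            "source": str(source),
            "stored_as": str(target),
            "sha256": checksum,
        })
    return records
```

Calling `ingest(Path("transfer"), Path("storage"))` on two hypothetical directories would return a list of records that a practitioner could save alongside existing accession files. Each step here corresponds to a function that a separate, dedicated tool could perform instead, which is the point of a micro-services arrangement.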
An understanding of the discrete
tasks and functions that comprise each stage of a digital curation and
preservation workflow can provide practitioners at medium-sized and smaller
institutions lacking large financial resources with an opportunity to assess
how adding specific functions to their existing practices can benefit them.
Likewise, a reliable registry of available digital curation and preservation
tools, including those providing only a single or limited number of the
functions depicted in Figure 3, can help information professionals to determine
how they can perform those functions most readily in local circumstances. The COPTR (Community Owned Digital
Preservation Tool Registry) web site (coptr.digitpres.org) provides information
about hundreds of tools and services that address some aspect of digital
curation and preservation, ranging from single-function applications to more
robust and complex utilities. The costs of the tools/services noted in COPTR
vary from those that are freely available via open-source communities to those
that are cost-prohibitive for many smaller institutions. Practitioners should
realize that open-source applications may require programming expertise.
COPTR’s description of an individual tool includes a brief description of its
general usability and the technical expertise it requires, as well as a record
of recent development activity. The Digital POWRR Project study found that in
many cases active user groups supporting a particular tool already exist
online. Many tool developers also make themselves available to individuals and
groups wishing to learn to use their application.[15]
Understanding digital curation and
preservation activities as incremental, and using tools lending themselves to
this approach, information professionals at medium-sized and smaller
institutions lacking large financial resources can make the small first steps
needed to begin to make progress through the NDSA Levels of Digital
Preservation. For example, practitioners may accession and inventory a
collection of digital materials using a free, simple ingest tool called Data
Accessioner and a common spreadsheet application. To learn more about how to do
this, visit the POWRR website.
Meg Miner of Illinois Wesleyan
University, a member of the Digital POWRR Project team, described Data
Accessioner’s usefulness in performing the functions described above in Figure
3 as Auto Metadata Harvest and Package Metadata thus:
I use Data Accessioner (DA) to
capture technical metadata as I move files from transfer media to my as-yet
non-bit-level storage device. I use DA-MT (Data Accessioner Metadata
Transformer) to aggregate the file information from xml to something I can
understand: file types, quantities and sizes by type. I store the aggregate
information in my regular accession files (currently a spreadsheet). My
accession information and an Access copy are in a different hard drive from the
Master copy and XML. Someday I will move the accessions with content I think is
most at-risk (due to format or other unique attribute) into a bit-checking
storage environment…. this workflow costs me no money, no technical expertise
(beyond downloading Java and two processing files via ZIP) and very little extra
time. With DA, I am capturing all the recommended technical information for use
by a back-end preservation system. With DA-MT I can track growth rate of
digital content overall, make a case for purchasing better storage, and keep an
eye on where all the at-risk file types are in the interim.[16]
In this case, use of
a single, open-source tool requiring no programming skills or other technical know-how
has helped a single archivist, working alone, to gather important information
about her institution’s digital collections which, as she notes, will prove
indispensable in the longer-term construction of a larger digital curation and
preservation system. The discrete, specialized tools that characterize a
micro-services approach also can ultimately help this practitioner and others
like her to build such a system, adding new functions and capacity to an
emerging digital curation and preservation workflow customized to address local
needs and idiosyncrasies, including the particular characteristics of
individual collections.
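The kind of summary Miner describes (file types, quantities and sizes by type) can be approximated with a few lines of Python. The sketch below is not Data Accessioner or DA-MT and does not read their XML output. It simply walks a directory, groups files by extension, and writes the totals to a CSV file that opens in a common spreadsheet application. The function names and column headings are assumptions made for illustration, and a file extension is a much cruder indicator of type than the format identification that dedicated tools perform.

```python
import csv
from collections import defaultdict
from pathlib import Path


def summarize_by_type(root: Path) -> dict:
    """Return {extension: {"files": count, "bytes": total size}} for a tree."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in root.rglob("*"):
        if path.is_file():
            # Group by lower-cased extension; files without one share a label.
            file_type = path.suffix.lower() or "(none)"
            summary[file_type]["files"] += 1
            summary[file_type]["bytes"] += path.stat().st_size
    return dict(summary)


def write_inventory(summary: dict, csv_path: Path) -> None:
    """Write the summary to a CSV file, one row per file type."""
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_type", "files", "bytes"])
        for file_type in sorted(summary):
            totals = summary[file_type]
            writer.writerow([file_type, totals["files"], totals["bytes"]])
```

Running `write_inventory(summarize_by_type(Path("accession_2014_017")), Path("inventory.csv"))` on a hypothetical accession folder at regular intervals would give a rough record of growth by file type, the sort of evidence Miner mentions using to make a case for better storage.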
Micro-services
tools can also help practitioners adopting more robust tools. Practitioners should not
assume that an application bundling many digital curation and preservation
functions together with a single user interface will necessarily provide an
entirely comprehensive and worry-free experience. In many cases developers of
more extensive applications have assumed that their users have already
performed NDSA Level 1 triage activities (e.g. moving data from disparate
storage media to more secure locations and performing minimal inventory and/or
simple metadata creation). For example, POWRR researchers found that Curator’s
Workbench provided mechanisms for importing MODS records that its developers
assumed already existed, but not an intuitive interface for creating MODS
records from scratch. This presents information professionals at many
institutions with a significant gap between their digital materials’ present
condition and that state necessary even to begin curation and preservation
activities with their new utility. While individual micro-services tools can
enable practitioners to build a local digital curation and preservation system
in an incremental fashion, some of the most basic tools can also prepare
digital objects for ingestion into and storage within more elaborate
applications.
Research
libraries and archives presently face the difficult prospect of devising technical
means by which they may curate and preserve the large volume of digital objects
that their stakeholders and staff members have created, and will create.
Guidelines and standards describing effective practices presently exist, but
many information professionals have encountered great difficulty in providing
the infrastructure necessary to follow them and provide enhanced levels of curation
and preservation. The California Digital Library (CDL) has devised and
implemented what it describes as a micro-services approach to the problem.
Dismissing the familiar means by which information professionals have addressed
other large challenges in their field, that of selecting and installing a
large, integrated application, CDL staff members have advocated the development
and strategic deployment of independent but interoperable utilities devoted to
performing specific, finite functions contributing to the larger curation and
preservation process. They have argued that administrators and developers
responsible for digital curation and preservation functions may devise, deploy,
maintain, upgrade, adapt and replace discrete, simple utilities far more easily
than a single application combining many functions behind a single user
interface. In addition, a distributed system allows units of a large university
system, varying widely in their practice, to adapt it to their use. Functioning
together, twelve micro-services utilities have provided the university’s ten
campuses with a comprehensive digital curation and preservation infrastructure
known as Merritt.
A
review of digital curation and preservation challenges facing smaller
institutions lacking large financial resources suggests that the micro-services
approach can benefit them as well. The same benefits that recommended it to the
University of California apply to smaller institutions. Information
professionals serving them may deploy, maintain, upgrade, adapt and replace
simple applications performing specific functions more readily than they might
install, manage, improve and, in a worst case, abandon and replace, a
comprehensive digital curation and preservation system. Due to its modular
nature and the number of open-source applications available for use in a
micro-services arrangement, units within these institutions stand a better
chance of adapting one to their needs than an integrated, more comprehensive
system. Information professionals employed at medium-sized and smaller
institutions lacking large financial resources can also benefit from a micro-services
approach to digital curation and preservation for another reason. Many have
held the same misconception that the designers of the micro-services
approach originally noted: that an all-or-nothing approach, taking years to
select and implement, represented the only approach to digital curation and
preservation services. Believing that they lacked knowledge of the available
applications, information professionals at smaller institutions most often declined
to seek the significant financial resources necessary to make a selection, and
took no other constructive steps toward enhancing their digital curation and
preservation practices. A micro-services approach provides them with an
opportunity to break this pattern of inaction and begin curation and
preservation activities.
In
this context the Digital POWRR Project study encouraged information
professionals at medium-sized and smaller institutions to take basic, simple measures
mitigating their materials’ risk of loss, no matter how limited their scope and
effectiveness might seem. Measuring their progress by reference to the National
Digital Stewardship Alliance’s Levels of Digital Preservation, smaller
institutions might begin to improve their curation and preservation of digital
collections through a series of small steps. The use of individual tools
performing discrete functions can allow practitioners beginning curation and
preservation activities where none have previously existed to perform initial triage
on their digital collections and improve their practice over time, in part by expanding
and customizing their technical infrastructure incrementally, building a set of
applications best suited to local needs. Tools performing an individual or
limited number of specific functions contributing to enhanced levels of digital
curation and preservation can prove especially useful in this context. Aware
that many practitioners who might benefit from these tools have little or no
awareness of their existence or functions, the Digital POWRR Project made
information describing the several stages of a digital curation and preservation
workflow, the discrete activities making up each, and tools performing these functions
freely available online via the COPTR web site.
Notes
[1] See, for example, the Digital
Curation Centre’s Curation Lifecycle
Model, which features checklists for practitioners, at www.dcc.ac.uk/resources/curation-lifecycle-model.
Also see Reference Model for an
Open Archival Information System (OAIS): Recommended Practice CCSDS 650.0-M-2
(2012) at http://public.ccsds.org/publications/archive/650x0m2.pdf and Space Data and Information Transfer Systems – Audit and Certification
of Trustworthy Digital Repositories (ISO 16363:2012), accessed July 9,
2015, www.iso.org/obp/ui/#iso:std:iso:16363:ed-1:v1:en.
[2] “The University of California at
a Glance,” The University of California, 2015, accessed July 9, 2015, www.universityofcalifornia.edu/sites/default/files/uc_at_a_glance_011615.pdf.
[3] Stephen Abrams, Patricia Cruse, and
John Kunze, “Preservation is Not a Place,” International
Journal of Digital Curation 4, no. 1 (2009): 8-21; Stephen Abrams, John
Kunze, and David Loy, “An Emergent Micro-Services Approach to Digital Curation
Infrastructure,” International Journal of
Digital Curation 5, no. 1 (2010): 173-174.
[4] Abrams, Kunze, Loy, “An Emergent
Micro-services Approach,” 184.
[5] www.cdlib.org/cdlinfo/2010/09/16/deposit-save-share-find-that-content-and-data-new-uc3-services-launch/, accessed May 6, 2015.
[6] https://merritt.cdlib.org/docs/merritt_handout.pdf; Merritt Service Update: March
2015, accessed May 6, 2015, www.cdlib.org/cdlinfo/2015/04/24/merritt-service-update-march-2015/.
[7] Tyler Walters and Katherine
Skinner, New Roles for New Times: Digital Curation for Preservation (Washington,
DC: Association of Research Libraries, 2011), 53.
[8] Jaime Schumacher, The Digital POWRR Project: A Final Report to
the Institute of Museum and Library Services, accessed July 9, 2015, http://hdl.handle.net/10843/13678.
[9] A. K. Rinehart,
P-A. Prud’homme, and A. R. Huot, “Overwhelmed to Action: Digital Preservation Challenges at the Under-Resourced
Institution,” OCLC Systems &
Services, 30, no. 1 (2014):28-42. www.emeraldinsight.com/journals.htm?issn=1065-075x&volume=30&issue=1&articleid=17106334&show=html;
M. Proffitt, “Something’s
Got to Give: What Can We Stop Doing in a
Time of Reduced Resources?” RBM: A Journal of Rare Books, Manuscripts, &
Cultural Heritage, Fall 2011, no. 12: 89-91, accessed July 9, 2015, http://rbm.acrl.org/content/12/2/89.full.pdf+html
[10] J. Gordon Daines
III, “Module 2: Processing Digital Records and Manuscripts,” Archival Arrangement
and Description, ed. Christopher J. Prom
(Chicago: Society of American Archivists, 2013), 87-143.
[11] National Digital Stewardship
Alliance. The NDSA Levels of Digital Preservation, accessed July 9, 2015, www.digitalpreservation.gov/ndsa/activities/levels.html.
[12] See, for example, Julianna
Barrera-Gomez and Ricky Erway, Walk This
Way: Detailed Steps for Transferring Born-Digital Content from Media You Can
Read In-house (Dublin, OH: OCLC Research, 2013), www.oclc.org/content/dam/research/publications/library/2013/2013-02.pdf.
Also see the DP 101 page on the Digital POWRR
Project website.
[13] Jaime Schumacher, Lynne Thomas and Drew VandeCreek, From Theory to Action: Good Enough Digital
Preservation Solutions for Under-Resourced Cultural Heritage Institutions
(Washington, DC: Institute of Museum and Library Services, 2014), 6. http://commons.lib.niu.edu/bitstream/10843/13610/1/FromTheoryToAction_POWRR_WhitePaper.pdf
[14] Schumacher, Thomas and VandeCreek, From Theory to Action, 8-13.
[15] Ibid., 11.
[16] Meg Miner, “Preservation
Processing Update.” Digital POWRR Project (blog), October 25, 2014, http://digitalpowrr.niu.edu/processing-update/