On Digital History: 2021

Friday, August 6, 2021

What is the Digital POWRR Project?

In my experience, a large majority of the grant-funded work that humanists, librarians and archivists produce online comes to exist as a project based at a single institution. The fact that a practitioner or practitioners at an individual organization usually administer the awarded funds tends to reduce inter-institutional resources to those of a grantee and subcontractors. The Digital POWRR Project certainly began that way at Northern Illinois University, but it is now starting to become something different.

POWRR started with a grant from the Institute of Museum and Library Services that produced a white paper study of digital preservation challenges and potential solutions for librarians and archivists, especially at smaller institutions lacking large financial resources. Two subsequent grants produced a number of in-person professional development events offering practitioners practical knowledge about how to improve their institution's level of digital preservation capacity.

This year, the project has entered a new phase: it is officially a multi-institutional entity. Stacey Erdman, a member of the original Digital POWRR team at Northern Illinois University who moved to Arizona State University, recently learned that she had been awarded funds from the Institute of Museum and Library Services' Laura Bush Twenty-First Century Librarian program for a new project advancing digital preservation practices at under-resourced institutions. Project partners include the Sustainable Heritage Network, the Black Metropolis Research Consortium, the Association of Hawai’i Archivists, Northwest Archivists, Inc., and Amigos Library Services. Stacey's project includes an innovative new component: training practitioners to provide peer assessments of other institutions' digital preservation capacities using the National Digital Stewardship Alliance's Levels of Digital Preservation.

Stacey remains a part of the POWRR team, helping to lead professional development events organized by Northern Illinois University, as well as by other entities that invite POWRR to provide instruction.

Other POWRR instructors include Danielle Spalenka, who moved from Northern Illinois University Libraries to the Filson Club Historical Society; Lynne Thomas, who moved from Northern Illinois University Libraries to the University of Illinois at Urbana-Champaign; Martin Kong of Chicago State University (a partner organization in the study that produced the original POWRR white paper); Dorothea Salo, Faculty Associate at the University of Wisconsin iSchool; Sarah Cain, Curator of Rare Books and Manuscripts at Northern Illinois University Libraries; and Aaisha Haykal, who moved from Chicago State University to the College of Charleston (S.C.).

Since its inception, the Digital POWRR Project has drawn on the expertise of a number of practitioners situated at different institutions. Now it works through the sponsored project and grant administration organizations at multiple institutions as well.

I believe that this is possible because the Digital POWRR Project is largely organized around a common approach to digital preservation work, which its 2013 white paper originally articulated. At that time a number of large and/or wealthy institutions had made progress toward providing better preservation of digital materials in their collections, often by taking part in intensive (and expensive) professional development activities provided by well-regarded organizations in the emerging field. Realizing that these programs' large scope and cost often prevented many practitioners from benefiting from them, POWRR proposed a more flexible, "good-enough" approach to digital preservation. It emphasized making incremental progress toward better practices as measured by the NDSA's Levels of Digital Preservation, in part by introducing librarians and archivists to Open Source tools and encouraging them to assemble a set of applications best suited to their local workflow.

Although a perception that larger and wealthier organizations generally did a better job of preserving digital materials informed the original white paper, subsequent experience has shown that many representatives of R1 universities and well-funded private institutions remained quite confused about how to proceed toward digital preservation, and POWRR addressed their needs as well. More recently, a number of integrated, cloud-based digital preservation services have found an increasing foothold in the marketplace, and organizations that can afford their subscription fees have used them to good effect. This moves the POWRR Project back to its original emphasis on under-resourced organizations, as Stacey Erdman's recent award shows.

In the future, the Digital POWRR Project looks forward to the opportunity to work through multiple organizations in order to provide this service. I would love to hear about other projects in the libraries and archives field, as well as digital humanities, that operate in a manner similar to that which I have described above.

Friday, April 9, 2021

ITHAKA Constellate: Text-Mining Product in Development

I have been invited to evaluate a beta version of ITHAKA's text-mining product, which is tentatively titled Constellate. I'm thankful for the opportunity.

I have some knowledge of other text-mining products made available by library materials vendors like ProQuest and Gale. In my experience they work well, but they only offer the use of text materials found in those portions of individual vendors' available collections to which your particular institution has a subscription. If you want access to more materials for your data set, your institution needs to subscribe to more collections.

This type of product in general would be very helpful in teaching text data analysis at scale to non-programmers. I believe that humanities students can benefit from activities helping them to learn how to formulate hypotheses and evaluate evidence found in very large data sets. As individuals already receiving training in the critical evaluation of materials, they could make a valuable contribution to data-driven organizational activities in a number of fields. Put another way, employers of course need programmers able to build and adjust text-mining applications or sets of applications. But they also need critical thinkers to evaluate results.

Access to a relatively limited number of text data sets is not a problem for this type of experiential learning, but it does present a large obstacle to original scholarly research. A paper making an argument based on the analysis of a data set that only contains those nineteenth-century text materials appearing in a ProQuest or Gale data set will very likely overlook a large part of the available historical record. Researchers need to be able to upload their own data sets into online text-mining services.

It is also my impression that the code and algorithms that do the data analysis for vendor-served text-mining projects remain proprietary, which means that researchers and collaborating programmers would be unable to download the code and customize it for their own use. Since in my experience effective text-mining often requires a great deal of adjustment and customization, this presents another problem.

Sales representatives for the above companies have made general statements about how their programmers were and are working on a function that would allow subscribers to upload their own data, but to my knowledge that has not happened. If any representatives of ProQuest, Gale, or other library vendors making similar products available have information to the contrary, please contact me and I will be happy to evaluate your product.

I am very interested in Constellate because the ITHAKA representative with whom I spoke emphasized that their organization plans to present the service as A) able to analyze outside data sets, and B) willing to allow outside programmers to access its Python code for the purpose of customization. They hope to build a collection or set of open-source code applications that various Constellate users have constructed.

This would be a very promising situation for researchers, teachers and learners situated at R2 and smaller institutions lacking large financial resources.

I will spend the next few months working with Constellate and report on what I discover.

"Some Assembly Required: Low-Cost Digitization of Materials from Magnetic Tape Formats for Preservation and Access"

Earlier this year three colleagues and I published an article discussing the digitization of sound materials from magnetic tape formats.

Please find the abstract and a link to the journal below. It is my understanding that the individual article will be embargoed until March, 2022, so the link to the individual article itself probably will not work until then.

"Some Assembly Required: Low-Cost Digitization of Materials from Magnetic Tape Formats for Preservation and Access"

Preservation, Digital Technology, and Culture 49 (3) October, 2020, 89-98

https://www.degruyter.com/journal/key/PDTC/html

Sarah Cain, Brandon Welch, Annie Oelschlager, Drew VandeCreek February 10, 2021

Abstract

Recent work discussing the digitization and preservation of magnetic tape materials has maintained that it should be left to expert practitioners and that the resulting digital materials should be stored in digital repositories. This article suggests that librarians and archivists lacking extensive technical skills or access to expertise can digitize these materials themselves. It provides a detailed account, including challenges faced, of how a team of practitioners without prior training or experience digitized historical audio recordings on cassette and open reel tape at Northern Illinois University Libraries. The discussion reviews the assembly of equipment and software that the team used for digitization work, discussing each element’s significance and how they came together as a functioning workflow. The authors also emphasize the fact that while the digitization of fragile and/or degraded magnetic tape materials may contribute to the preservation of their contents, this action also creates a new set of materials with their own preservation needs. Realizing that many practitioners serving medium-sized and smaller institutions lacking large financial resources may not have access to a full-fledged digital repository, they suggest the use of the National Digital Stewardship Alliance’s Levels of Digital Preservation rubric as a means by which practitioners may incrementally increase the probability that digital materials made from magnetic tapes will remain accessible.