On Digital History

Tuesday, November 28, 2023

Great lengths: a review of website preservation activities at three American Universities with digital humanities centers

I recently published an article discussing the preservation or sustainability of legacy digital humanities websites. These are resources that were created with grant funds which remain online after the end of a grant period.

The abstract reads as follows:

Sustaining grant funded digital humanities websites has become a major challenge in the field. Three American universities with digital humanities centers kept eight of nine websites funded by the United States National Endowment for the Humanities (1996–2003) online to 2022. Center personnel made website preservation a part of everyday operations without additional funds devoted to the task. Web software developed rapidly in this period, however, and center staff members’ efforts often did not succeed in providing necessary updates. Funded materials became increasingly obsolete. The extent of center personnel’s efforts, compared with their results, suggests that their approach itself will in many cases prove unsustainable. In one case, a university shifted responsibility for a popular website to its library. The library completely rebuilt it, only to find that the resource had again become obsolete less than 10 years later. Reconstruction should therefore be understood as an ongoing process, and its cost and complexity suggest that many online resources will not benefit from it. A new approach converting websites to a static state can facilitate sustainability at lower cost, but it also requires resources for implementation. Two American funding agencies have recently made grants available for website preservation and reconstruction. Similar organizations in other parts of the world have not followed suit and should consider doing so. In the absence of a comprehensive effort to identify and evaluate legacy websites for preservation, the competitive process of securing grant awards can begin to determine which legacy websites will survive.

The full article appears in Digital Scholarship in the Humanities and is available at https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqad080/7416357?utm_source=advanceaccess&utm_campaign=dsh&utm_medium=email

Tuesday, February 1, 2022

New Digital POWRR Project grant

The Digital POWRR Project, of which I am co-director, has received a new grant from the National Endowment for the Humanities. Totaling $348,900, it will support the planning and presentation of five in-person professional development events, each lasting 2.5 days. These events will provide those in attendance with training in the preservation of digital materials held in library, archive, and museum collections. The program will especially seek to recruit practitioners responsible for the collection and preservation of materials pertaining to the Native American, African American, and Latinx communities. To make it easier for practitioners to attend, the grant will make funds available to reimburse their travel and accommodations.

Two project events will take place in suburban Chicago at Northern Illinois University's Naperville campus, two at Arizona State University in Tempe, Arizona, and one at Oklahoma State University in Stillwater, Oklahoma.

Please see the Northern Illinois University press release for additional information about this program:

https://niutoday.info/2022/01/20/neh-grant-to-niu-will-help-preserve-history-at-institutions-nationwide/

Friday, August 6, 2021

What is the Digital POWRR Project?

In my experience, a large majority of the grant-funded work that humanists, librarians and archivists produce online comes to exist as a project based at a single institution. The fact that a practitioner or practitioners at an individual organization usually administer the awarded funds tends to reduce inter-institutional resources to those of a grantee and subcontractors. The Digital POWRR Project certainly began that way at Northern Illinois University, but it is now starting to become something different.

POWRR started with a grant from the Institute of Museum and Library Services that produced a white paper study of digital preservation challenges and potential solutions for librarians and archivists, especially at smaller institutions lacking large financial resources. Two subsequent grants produced a number of in-person professional development events offering practitioners practical knowledge about how to improve their institution's level of digital preservation capacity.

This year, the project has entered a new phase: it is officially a multi-institutional entity. Stacey Erdman, a member of the original Digital POWRR team at Northern Illinois University who moved to work at Arizona State University, recently learned that she had received funds from the Institute of Museum and Library Services' Laura Bush Twenty-First Century Librarian program for a new project advancing digital preservation practices at under-resourced institutions. Project partners include the Sustainable Heritage Network, the Black Metropolis Research Consortium, the Association of Hawai’i Archivists, Northwest Archivists, Inc., and Amigos Library Services. Stacey's project includes an innovative new component: training practitioners to provide peer assessments of other institutions' digital preservation capacities using the National Digital Stewardship Alliance's Levels of Digital Preservation.

Stacey remains a part of the POWRR team, helping to lead professional development events organized by Northern Illinois University, as well as by other entities that invite POWRR to provide instruction.

Other POWRR instructors include Danielle Spalenka, who moved from Northern Illinois University Libraries to the Filson Club Historical Society; Lynne Thomas, who moved from Northern Illinois University Libraries to the University of Illinois at Urbana-Champaign; Martin Kong of Chicago State University (a partner organization in the study that produced the original POWRR white paper); Dorothea Salo, Faculty Associate at the University of Wisconsin iSchool; Sarah Cain, Curator of Rare Books and Manuscripts at Northern Illinois University Libraries; and Aaisha Haykal, who moved from Chicago State University to the College of Charleston (S.C.).

Since its inception, the Digital POWRR Project has drawn on the expertise of a number of practitioners situated at different institutions. Now it works through the sponsored project and grant administration organizations at multiple institutions as well.

I believe that this is possible because the Digital POWRR Project is largely organized around a common approach to digital preservation work, which its 2013 white paper originally articulated. At that time a number of large and/or wealthy institutions had made progress toward providing better preservation of digital materials in their collections, often by taking part in intensive (and expensive) professional development activities provided by well-regarded organizations in the emerging field. Realizing that these programs' large scope and cost often prevented many practitioners from benefiting from them, POWRR proposed a more flexible, "good-enough" approach to digital preservation. It emphasized making incremental progress toward better practices as measured by the NDSA's Levels of Digital Preservation, in part by introducing librarians and archivists to Open Source tools and encouraging them to assemble a set of applications best suited to their local workflow.

Although a perception that larger and wealthier organizations generally did a better job of preserving digital materials informed the original white paper, subsequent experience has shown that many representatives of R1 universities and well-funded private institutions remained quite confused about how to proceed toward digital preservation, and POWRR addressed their needs as well. More recently, a number of integrated, cloud-based digital preservation services have found an increasing foothold in the marketplace, and organizations that can afford their subscription fees have used them to good effect. This moves the POWRR Project back to its original emphasis on under-resourced organizations, as Stacey Erdman's recent award shows.

In the future, the Digital POWRR Project looks forward to the opportunity to work through multiple organizations in order to provide this service. I would love to hear about other projects in the libraries and archives field, as well as digital humanities, that operate in a manner similar to that which I have described above.

Friday, April 9, 2021

ITHAKA Constellate: Text-Mining Product in Development

I have been invited to evaluate a beta version of ITHAKA's text-mining product, which is tentatively titled Constellate. I'm thankful for the opportunity.

I have some knowledge of other text-mining products made available by library materials vendors like ProQuest and Gale. In my experience they work well, but they only offer the use of text materials found in those portions of individual vendors' available collections to which your particular institution has a subscription. If you want access to more materials for your data set, your institution needs to subscribe to more collections.

This type of product in general would be very helpful in teaching text data analysis at scale to non-programmers. I believe that humanities students can benefit from activities helping them to learn how to formulate hypotheses and evaluate evidence found in very large data sets. As individuals already receiving training in the critical evaluation of materials, they could make a valuable contribution to data-driven organizational activities in a number of fields. Put another way, employers of course need programmers able to build and adjust text-mining applications or sets of applications. But they also need critical thinkers to evaluate results.

Access to a relatively limited number of text data sets is not a problem for this type of experiential learning, but it does present a large obstacle to original scholarly research. A paper making an argument based on the analysis of a data set that only contains those nineteenth-century text materials appearing in a ProQuest or Gale data set will very likely overlook a large part of the available historical record. Researchers need to be able to upload their own data sets into online text-mining services.

It is also my impression that the code and algorithms that do the data analysis for vendor-served text-mining projects remain proprietary, which means that researchers and collaborating programmers would be unable to download the code and customize it for their own use. Since in my experience effective text-mining often requires a great deal of adjustment and customization, this presents another problem.

Sales representatives for the above companies have made general statements about how their programmers were and are working on a function that would allow subscribers to upload their own data, but to my knowledge that has not happened. If any representatives of ProQuest, Gale, or other library vendors making similar products available have information to the contrary, please contact me and I will be happy to evaluate your product.

I am very interested in Constellate because the ITHAKA representative with whom I spoke emphasized that their organization plans to A) let the service analyze outside data sets, and B) allow outside programmers to access its Python code for the purpose of customization. They hope to build a collection or set of open-source code applications that various Constellate users have constructed.

This would be a very promising situation for researchers, teachers and learners situated at R2 and smaller institutions lacking large financial resources.

I will spend the next few months working with Constellate and report on what I discover.

"Some Assembly Required: Low-Cost Digitization of Materials from Magnetic Tape Formats for Preservation and Access"

Earlier this year three colleagues and I published an article discussing the digitization of sound materials from magnetic tape formats.

Please find the abstract and a link to the journal below. It is my understanding that the individual article will be embargoed until March, 2022, so the link to the individual article itself probably will not work until then.

"Some Assembly Required: Low-Cost Digitization of Materials from Magnetic Tape Formats for Preservation and Access"

Preservation, Digital Technology, and Culture 49 (3) October, 2020, 89-98

https://www.degruyter.com/journal/key/PDTC/html

Sarah Cain, Brandon Welch, Annie Oelschlager, Drew VandeCreek February 10, 2021

Abstract

Recent work discussing the digitization and preservation of magnetic tape materials has maintained that it should be left to expert practitioners and that the resulting digital materials should be stored in digital repositories. This article suggests that librarians and archivists lacking extensive technical skills or access to expertise can digitize these materials themselves. It provides a detailed account, including challenges faced, of how a team of practitioners without prior training or experience digitized historical audio recordings on cassette and open reel tape at Northern Illinois University Libraries. The discussion reviews the assembly of equipment and software that the team used for digitization work, discussing each element’s significance and how they came together as a functioning workflow. The authors also emphasize the fact that while the digitization of fragile and/or degraded magnetic tape materials may contribute to the preservation of their contents, this action also creates a new set of materials with their own preservation needs. Realizing that many practitioners serving medium-sized and smaller institutions lacking large financial resources may not have access to a full-fledged digital repository, they suggest the use of the National Digital Stewardship Alliance’s Levels of Digital Preservation rubric as a means by which practitioners may incrementally increase the probability that digital materials made from magnetic tapes will remain accessible.

Friday, January 10, 2020

Where Are They Now?: Curation and Preservation of Early Online Digital Humanities Materials

I am currently doing research, in collaboration with my colleague Jaime Schumacher, on the present status of sixty-five online digital humanities projects funded by the National Endowment for the Humanities' Division of Education Programs Development and Demonstration competition in the period 1993-2005. This was a major source of early funding for projects in this field, providing support to the University of Virginia's Valley of the Shadow Project, the Perseus Digital Library at Tufts University, the Women and Social Movements Project at SUNY Binghamton, and many others.

I should note that I did not receive any funding from this program during this period, nor did any of my colleagues at Northern Illinois University.

Having experience in the creation of grant-funded, online digital humanities materials during this period as well as in the investigation of digital preservation issues in libraries and archives, I became aware that many of these online resources were likely at risk of loss. I know that we struggled mightily to devise a way to keep our online projects (Lincoln/Net, Mark Twain's Mississippi, Southeast Asia Digital Library) functioning and available, so it stood to reason that other practitioners and institutions in similar situations would do so as well.

Based on our own experience, I identified major threats to the preservation and online presentation of these projects as

1) the lack of long-term funding inherent in grant-funded work. Unlike discrete research projects commonly funded and performed in colleges, universities and cultural heritage institutions, which typically produce results and publish findings as part of a mutually agreed upon timeline, these projects proposed to make materials available to the public for an indefinite period of time. Who was to pay for their support after the grant period?

2) the demands of online presentation in light of technical infrastructure's limited lifespan in a rapidly changing technical environment. It became clear rather early in this period that computers used to serve websites must be replaced every three or four years in order to provide acceptable levels of online availability. Also, software companies continued to push new versions of software (and eventually stopped supporting old versions), and produced new products very quickly, leading to rapid obsolescence of software arrangements.

3) the widespread assumption among practitioners (including myself) that digital materials were in fact more durable, and hence less subject to loss than analog materials. This turned out to be false, as research has often shown that for reasons including those listed above, digital materials are very likely to be lost in the absence of detailed preservation policies and sustained attention to their curation. Thus the support of digital projects included much more than paying for their ongoing online availability. It also included organizing, maintaining, and securing the archive of digitized or born-digital materials that the project had created. The fact that practitioners responsible for the creation of digital projects often did not realize this considerable risk of loss exposed their materials to additional risk.

Our research will show how many of these online resources are still available online today; what technical platform (hardware and software) they employ, as well as the institutional arrangements behind this platform (i.e. is the project still presented online by the institution that received the original grant? if so, what part of the institution has assumed responsibility for it? and, is this the unit of the institution which originally received the grant?); and how many and which of these institutions have discussed and/or implemented a binding plan for their continued preservation and online presentation.

In addition, we hope to speak to representatives of the National Endowment for the Humanities to determine what their original expectations for projects' preservation and ongoing availability might have been, and if these expectations evolved or changed over the time period in question.

In the end, we hope to determine which factors have affected online availability of early digital humanities projects. At this point we believe that the dynamics mentioned above will very likely appear in this list of factors, but only the research itself will tell.

I'm sure that other research questions will occur to us as we review the data we have collected. We may or may not be able to integrate them into this discrete study.

We hope to publish the results of the study in approximately eighteen to twenty-four months, perhaps addressing the Library Science and Digital Humanities communities in separate articles that discuss our data and findings from their respective viewpoints.

Wednesday, February 21, 2018

Another Text-Mining Project: CIA Materials

This semester I am working with a team of four Northern Illinois University student interns to explore a large collection of text materials brought to us by Dr. Eric Jones of our university's Center for Southeast Asian Studies. The materials consist of the Central Intelligence Agency's President's Daily Briefings for the period 1961-1977, or roughly the period of the United States' military engagement in Vietnam, including the several years leading up to and following the war itself. These materials have been declassified and are available in the CIA's online reading room.

Two Northern Illinois University graduate students have expressed interest in working with President's Daily Briefing materials from this period in their dissertation research, but are unable to devote the time necessary to read this very large collection of documents without some knowledge of its contents.

To date, the student team has used a script to download the 5,292 individual daily briefings, and Optical Character Recognition to convert the documents, available in PDF format, into machine-readable text.
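As a rough illustration of the download step, the sketch below collects PDF links from a listing page using only Python's standard library. The sample HTML, the example.org address, and the names `PdfLinkCollector` and `collect_pdf_links` are assumptions made for illustration. They do not describe the team's actual script or the real structure of the reading room's pages. The Optical Character Recognition step is not shown because it depends on external software.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the targets of <a> tags that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the address of the listing page.
            self.links.append(urljoin(self.base_url, href))


def collect_pdf_links(html_text, base_url):
    """Return absolute URLs for every PDF link found in html_text."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical listing page; a real script would fetch pages over the network.
SAMPLE_PAGE = """
<ul>
  <li><a href="/docs/briefing_1961-06-17.pdf">17 June 1961</a></li>
  <li><a href="/docs/briefing_1961-06-19.PDF">19 June 1961</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

links = collect_pdf_links(SAMPLE_PAGE, "https://example.org/collection/")
print(links)
```

A script along these lines would then loop over the collected links, save each file, and pass the saved PDFs to an OCR tool.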

Text mining technology will allow the student interns to provide Dr. Jones and his students with an overview of the materials, including topics (combinations of words that frequently appear together) therein.
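To make the idea of "words that frequently appear together" concrete, here is a minimal sketch that counts how often pairs of words occur in the same document. It is a simple co-occurrence count, not a full topic model, and it is not the software the team uses. The sample sentences, the short stopword list, and the function names are all invented for illustration and are not drawn from the briefings.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny illustrative stopword list; real projects use much longer ones.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurring_pairs(documents):
    """Count, for each pair of words, how many documents contain both."""
    pair_counts = Counter()
    for doc in documents:
        vocabulary = sorted(set(tokenize(doc)))
        pair_counts.update(combinations(vocabulary, 2))
    return pair_counts


# Made-up sample documents standing in for OCR output.
SAMPLE_DOCS = [
    "Talks in Saigon on the delta offensive",
    "Saigon reports delta flooding",
    "Hanoi radio comments on talks",
]

pairs = cooccurring_pairs(SAMPLE_DOCS)
print(pairs.most_common(3))
```

Pairs that recur across many documents hint at a shared subject. Topic-modeling software builds on the same intuition with statistical models rather than raw counts.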

As the request for this information has come from students of twentieth-century Southeast Asian history and politics, we will especially focus on topics including the names of Southeast Asian nations, cities, geographical features, and public figures.

We will also run sentiment analysis on the Daily Briefings (scoring their positive or negative character) and attempt to group them into sets (or clusters) of like documents, based on the words contained therein.
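The two ideas in this paragraph can be sketched in a few lines of Python. The example scores each document by counting words from small positive and negative lists, then groups documents whose vocabularies overlap. Everything here is an assumption for illustration: the word lists, the sample sentences, the 0.3 threshold, and the greedy grouping by Jaccard overlap, which stands in for a proper clustering algorithm. It is not the method the team will use.

```python
import re

# Tiny illustrative word lists; a real study would use an established lexicon.
POSITIVE = {"progress", "stable", "agreement", "calm"}
NEGATIVE = {"attack", "crisis", "threat", "unrest"}


def words(text):
    """Return the lowercase alphabetic words in the text."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """Positive word count minus negative word count."""
    tokens = words(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def jaccard(text_a, text_b):
    """Share of distinct words that the two texts have in common."""
    a, b = set(words(text_a)), set(words(text_b))
    return len(a & b) / len(a | b) if a | b else 0.0


def group_similar(docs, threshold=0.3):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for i, doc in enumerate(docs):
        for group in groups:
            if jaccard(docs[group[0]], doc) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Made-up sample documents standing in for OCR output.
SAMPLE = [
    "border attack raises crisis fears",
    "border attack deepens crisis",
    "trade agreement signals progress",
]

scores = [sentiment_score(d) for d in SAMPLE]
groups = group_similar(SAMPLE)
print(scores, groups)
```

Word-list scoring of this kind ignores negation and context, which is one reason results from any sentiment tool need critical review by someone who knows the material.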

Upon its completion, this work will provide NIU researchers with a new data set heretofore unavailable to them: the machine readable text of the President’s Daily Briefings for the period under consideration. The University Libraries may choose to make this data set available for future research via its digital repository.

The work will also provide NIU researchers with a detailed report of

A) topics appearing in the collection, showing how individual topics may become more or less prominent within the larger collection at different periods in time;

B) how the reports expressed positive or negative sentiments regarding national security concerns; and

C) how the individual briefings relate to each other in terms of words used in common.

The project team will also share the machine-readable text data set and report of findings with other researchers by submitting it to an open-access Digital Humanities publication and/or data repository for the humanities and/or social sciences.

Tuesday, November 28, 2017

"Bleeding Kansas"

"Kansas" - from The United States Illustrated, 1855 | Mark Twain’s Mississippi |NIU Digital Library

This 1855 illustration depicts an idyllic scene in Kansas, most likely along the Kansas River. In this period Kansas was anything but idyllic, however. Kansas became a territory with the implementation of the Kansas-Nebraska Act on May 30, 1854. The act, which left occupants of the Kansas and Nebraska territories seeking statehood to decide if human slavery would exist in their jurisdiction, set aside the Missouri Compromise of 1820, by which Congress had sought to maintain a rough balance between slave and free states in the Union by restricting the former to territory south of the 36°30′ parallel, excluding Missouri itself. Pro- and anti-slavery settlers poured into Kansas, many with the explicit goal of establishing their preferred policy there. Political controversy ensued.

Pro-slavery settlers dominated the initial territorial legislature elected on March 30, 1855. This body would determine if Kansas would enter the Union as a slave or free state. Opponents of slavery around the Union argued that widespread voter fraud made the election’s results illegitimate, and the territorial governor invalidated results in several districts. New elections gave anti-slavery settlers greater representation, but they remained in a decided minority.

The United States Congress sent a special committee to Kansas, which concluded that the territorial legislature was an illegally constituted body without authority. The territorial legislature convened in spite of the finding, rejected the credentials of those who had won the new elections, and passed laws paving the way for Kansas to enter the Union as a slave state. Anti-slavery Kansans rejected this government and formed their own, which in January, 1856, President Franklin Pierce declared illegal. Violence broke out between pro- and anti-slavery settlers, resulting in the shooting death of a free stater near Lawrence in December of 1855.

Political controversy produced physical violence. On May 21, 1856, pro-slavery forces stormed Lawrence, destroying a hotel and two newspaper offices, and sacking homes and businesses. On May 19 and 20, Republican Senator Charles Sumner of Massachusetts had delivered a speech on the Senate floor depicting pro-slavery views and actions as akin to the rape of a virgin. Sumner’s speech especially singled out the South Carolina Senator Andrew Butler for criticism. On May 22 Butler’s cousin, the South Carolina Congressman Preston Brooks, attacked Sumner on the Senate floor with a cane, inflicting grave injuries.

In Kansas, the anti-slavery activist John Brown led his sons and other followers to murder five pro-slavery settlers at Pottawatomie Creek on May 24, 1856. On the Fourth of July President Pierce sent U.S. Army troops to remove the Free State Legislature at Topeka. In August pro-slavery forces burned the Free State town of Osawatomie, Kansas after driving off defenders led by Brown. The last major outbreak of violence occurred in the Marais des Cygnes massacre of 1858, in which pro-slavery forces killed five Free State men. In all, approximately 56 people died in “Bleeding Kansas” in the years before the Civil War began.