Tuesday, January 24, 2017

The Fragility of Digital History (at least as I practiced it)

My recent turn to projects emphasizing the curation and preservation of digital data, like that contained in the historically oriented web sites we have developed at Northern Illinois University Libraries, has led me to recognize the many ways in which these materials can become compromised or otherwise lost to use. Backing up materials is of course very important, but is not a cure-all. If we back up materials in formats that eventually become so obsolete that no available software can open them, the data is still lost.

I have also become aware of other, less obvious, factors that have compromised Lincoln/Net, Mark Twain's Mississippi, and several of our other web sites. We built these in the early 2000s with available open-source technology (Linux/Apache/MySQL/PHP, commonly called a LAMP set-up). This set-up allowed our sites to combine searchable archives of primary sources (mostly text and images, but also latter-day versions of primary source materials in different media) with original interpretive materials in an effective manner. We simply laid out the web sites on two perpendicular axes, with a bar presenting links to primary source materials running horizontally near the top of the page, and a bar presenting links to interpretive materials running vertically along the left edge.

Of course this approach had its drawbacks. As we built a series of web sites, we found we had no way to manage them systematically, together. If we wanted to make changes to our sites, which generally had a similar look and feel, someone had to edit their code individually, by hand. We also had no way of monitoring our data to verify its continuing viability, nor did we have a way of pushing our data, as a block or a series of blocks, to a backup device. All of this became increasingly time-consuming.

We decided to migrate all of our data to a new platform made up of Fedora Commons repository software, a Drupal web interface, and Islandora, a Drupal module that allowed them to interact. This proved very difficult in our context of limited resources, and today we run that combination of applications on a shoestring thanks to the efforts of one talented and dedicated librarian.

The move to the new platform made data management and curation much easier, but it also cost us something.

The Fedora/Drupal/Islandora stack functions on the assumption that those implementing it intend to make digital objects available online by search or browsing, and manage their collections in a coordinated manner. It allows the users and providers of data to do these things very well, much better than a LAMP set-up would. But it leaves little room for interpretive materials. Put another way, it reduces our interpretive materials from a place of considerable importance on our websites, presented as equally important as the primary sources, to a side light. Links to them appear in the tool bar running horizontally near the top of the page, but from my perspective they become just another type of available data. Uninitiated users have little reason to perceive that buttons labeled "Essays" or "Videos" lead to interpretive materials. The "Lesson Plans" button is certainly effective, however.

Why could we not adapt the technology to preserve our two-axis presentation? To be brief, because a more sophisticated and manipulable search interface occupies the entire left edge (approximately one-quarter of the width) of the page. This is in many ways a good thing, as it provides increasingly knowledgeable and experienced user groups in educational institutions with the features they have come to expect. It also makes it impossible to put anything else along the left edge of the page.

Why could we not simply invert our approach and present primary sources on the vertical axis and interpretive materials on the horizontal? Again, to be brief, because the more powerful search apparatus that we now use places a preliminary ("faceted") level of access to the data it retrieves along that left edge, and fills the remainder of the page (below the horizontal toolbar) with access to individual resources. All other functions, including "browse," "home," and "about," reside on the horizontal bar, along with access to essays, videos, maps, and lesson plans.

To be clear, I am not complaining that my library forced me and my colleagues to use software that we don't like. I originally led the push to move to a new software stack. If we had retained our original interface, now nearly twenty years old, our web sites would have taken on the appearance of obsolescence. Despite the apparent superficiality, almost triviality, of this concern, I believe that experienced web users immediately assess a site's usefulness and legitimacy by its appearance: the first impression it makes. I know I certainly do. Had we kept the old interface, we would not only have continued to limp along with web sites that remained difficult to administer and data unsuited to modern curation techniques, but also have produced a first impression marked by obviously outdated technology.

So what am I saying? This: the technical platforms necessary to present a web site including searchable access to primary source materials in a sophisticated and credible manner today reflect the assumptions and priorities of the library and archives community. They emphasize providing access to data, period. Technological developments like those we have employed (faceted search, for example) make that increasingly easy and powerful.

These assumptions and priorities give little attention to matters of outreach and interpretive assistance. They are aimed at users who want to search data in order to reach their own interpretations, in an essay, a research paper, or a book. These users presumably already have access to information that helps them understand the primary source materials, via classroom instruction or interpretive works available elsewhere (other web sites, books, articles, etc.). Such users are found especially in schools and on college and university campuses.

Our LAMP-based web sites tried to give a user group that we presumed did not have ready access to these forms of interpretive material, namely members of the general public, a chance to build an interpretive framework to inform their searches. Perhaps these individuals could have gone to a library and read interpretive works, but we attempted to use the web's immense reach and flexibility to make interpretive materials more readily accessible: online, right next to the primary sources, in text and video formats.

We still do this, but the recent improvements in online indexing and search technology have made it increasingly difficult and, I suspect, ineffective.

My colleagues and I developed our sites in the web's early days, before information professionals had had a chance to assess it and refine it for their purposes through the development of progressively more effective search and retrieval technology. They have given us a great deal. But something has been lost, too.

I do not blame librarians and archivists for this loss. They are not being shortsighted. They are simply doing their jobs, as defined by the conventions of their profession. These conventions are worthwhile and to be applauded. I suspect that any attempt to devise an interface accommodating my preferred two-axis approach would compromise the efficacy of the available search and retrieval technology in some way. Even if such an interface could be designed without negative trade-offs, it would probably require considerable financial resources and technical expertise, which are seldom available in the present political climate.

Our present web sites do make interpretive materials available for use, albeit not in the precise manner I originally envisioned.

As a historian, I have come to understand the versions of Lincoln/Net, Mark Twain's Mississippi, and other web sites that we developed with LAMP technology as artifacts, expressions of their time, especially the available technology. You can still see them on the Wayback Machine.