Monday, November 3, 2014

Text mining for beginners

I am now Director of Digital Scholarship at Northern Illinois University Libraries. This means that it is now my job to work with faculty members seeking to employ technologies like Geographic Information Systems, text-mining and data visualization - helping those with little experience in such work find a way to put the technology to work.

As I really do not know very much at all about these activities, my job is now an exercise in learning something new. To this end, I have sought some help.

This fall I am working with a team of three Northern Illinois University students and their faculty coach, who will provide me with an evaluation of several open-source text-mining utilities, as well as a more general review of resources available for a scholar or other practitioner who might want to take up text-mining but lacks any experience in the work.

I spent last summer trying to identify and prepare text materials for use in evaluating the utilities, and found very little available information explaining how to begin a text-mining project: that is, how to find digitized texts, select texts for research, and work them into a format suitable for use with the software.

I am looking forward to the students' report, and will try to bring their findings to the attention of historians and other humanities scholars who might be interested in text-mining.

Thursday, January 23, 2014

Digital/Online Materials and their Place in Historical Scholarship

At the recent meeting of the American Historical Association in Washington, D.C., I made a presentation as part of a discussion session (i.e., not a regular panel - we sat in a circle and talked after very short presentations made by people sitting as part of the circle) exploring digital materials, ranging from blogs and web sites to social media, and the questions that they raise as scholars begin to make use of them as primary sources. Other presenters talked about the future of MOOCs and crowd-sourcing the search for elusive information about a relatively obscure historical figure. I discussed the work of the Digital POWRR project and the challenges presented by the fact that digital objects are generally subject to loss in the relatively short term due to a number of reasons, including hardware and software incompatibility and the degradation of storage media.

One major question that emerged in the discussion was the status of social media materials and other online, digital sources in light of the fact that they are so prone to loss. One presenter at the preceding panel (our discussion group was part of a linked set of two events) described how she had based her work on Pakistani women in part on a web site that no longer existed, apparently because of hacking activities undertaken by parties believing that Pakistani women should not express themselves in this format. The presenter said that she had printed out the site's pages for her own record and thus could document her use of the source. But this made me wonder about the future practice of history.

So, what of digital sources like blogs, web sites, and social media objects like tweets? Digital objects' intrinsic frailty and the complex, easily disrupted nature of the internet used to present them make them fundamentally unreliable as primary sources, at least by the standards developed for the use of analog/paper media materials.  

It seems to me that although history is certainly not a science in any way, historians are similar to scientists in at least one regard. Much as a scientific discovery can be accepted and confirmed only when other practitioners are able to repeat the experiment and obtain the same result, historians are accustomed to being able to lay their hands on a paper source cited in a footnote. Manuscripts are usually unique items, but if one travels to the archive and looks in the box and folder number cited, the item will be there. There may be a very small number of copies of a book, but if one is willing to make the trip to the right library, the book will be there. Historians will of course debate a scholar's reading of a source, but the existence of the source itself is fundamental to the discipline. If the item is not there, practitioners may rightly begin to ask questions about the legitimacy of a work citing it.

Many of the participants in the AHA discussion emphasized the need to preserve online digital materials as fully as possible. I certainly concur. But a whole host of problems, not the least of which is the considerable expense involved in the curation/preservation of digital materials, make this impossible. We will have to face the fact that a considerable number of online digital objects that future historians may want to use as evidence will simply disappear.

In this situation, several questions occur to me: How will we evaluate work citing online materials that are no longer existent? What if scholars relying on such missing evidence can produce a print-out or other facsimile of the materials? Can we distinguish cases of vanished evidence in which legitimate facsimiles exist from cases of academic fraud?

Wednesday, August 21, 2013

Learning About Digital Scholarship

After fifteen years devoted to the digitization and presentation of historical materials, I have been asked to investigate the emerging field of digital scholarship with an eye toward supporting at least several of its constituent activities on the Northern Illinois University campus. As a historian, I will begin with the digital humanities and attempt to work my way toward understanding digital scholarship in other disciplines.

This will certainly involve familiarizing myself with the considerable literature discussing digital scholarship in general, as well as the bodies of work discussing major subdivisions in it, like digital publishing, data/text mining, Geographic Information Systems and other forms of data visualization, and the retrieval of digital data via Application Programming Interfaces.

My task presents a challenge very much like that confronted by our present IMLS-funded study of how medium-sized and smaller institutions lacking large financial resources might achieve increasingly high levels of preservation for digital objects. From my present perspective, without the benefit of great familiarity with the field, it appears that successful digital scholarship and/or digital humanities programs at universities and colleges require significant amounts of resources. Proprietary software requires the payment of purchase/subscription fees. Open-source software requires the contributions of skilled programmers and developers. Both require the contributions of other skilled professionals familiar with their use in the different specialties making up digital scholarship and/or digital humanities, as well as their relevance to existing, more traditional scholarly discourses. These are luxuries that I have reason to believe my university, dependent for funding upon the worst-governed state in the nation, cannot presently afford.

Thus I will undertake my new work with an eye toward discovering ways in which members of the university community might produce digital scholarship with the least possible outlay of financial and other institutional resources. In these early days, I am planning to attempt to discover those faculty members on the NIU campus already doing digital scholarship of one type or another, in the hope that I might learn from them and enable them to learn from and support each other.

Monday, June 24, 2013

Overview of digital preservation tools

A chart containing brief descriptions of fifty-five digital preservation tools, including ingest and storage functions, can be found at http://digitalpowrr.niu.edu/tool-grid/

Monday, May 13, 2013

Information about Digital Preservation Tools

This week the Digital POWRR project staff has posted a large amount of information describing fifty-seven tools used in digital preservation activities. See http://digitalpowrr.niu.edu/tool-grid/. They include back-end storage providers and ingest/processing ("front end") utilities.

While a relatively small number of general, integrated front end applications like Archivematica and Curator's Workbench are currently available, individuals and institutions pondering a digital preservation initiative can also bring a number of ingest/processing tools together to assemble an ingest workflow suited to their specific needs.

As we found in the considerable amount of time required to review each of these tools and its capabilities, accumulating the knowledge necessary to make informed decisions in this matter can be quite a challenge. Hopefully, our list of available tools can help to shorten the amount of time and effort required.

In the coming year the project collaborators will test and review two back-end solutions, DuraSpace and Meta-Archive, and front-end utilities Archivematica and Curator's Workbench. 

Friday, September 21, 2012

Web site/app review: Historypin

Historypin is an online resource (www.historypin.com) presenting a wide array of user-submitted photographs, videos, audio clips, and stories in a geo-spatial format using Google Maps. Founded in 2010, the Historypin web site and mobile phone apps present materials principally organized by their relationship to specific locations. For example, photographs of an event taking place in my home town of DeKalb, Illinois would be available via a "pin" (link) appearing at DeKalb's location on a Google Map.

Photographs appear to be much more common than other types of materials.

A variety of individuals and institutions have submitted materials to Historypin. These include individuals posting items relating to weddings, birthday parties, reunions, etc., as well as museums and archives submitting materials from their collections.

The Historypin interface allows users an opportunity to search for materials by date and keyword via the main map interface. For example, a search using the keyword "soccer" revealed a photograph collection from St. Louis University, dated 1965, relating to that institution's soccer team, as well as 1975 photographs of Los Angeles Mayor Tom Bradley and the Brazilian soccer player Pele. 

Many materials in Historypin may be of interest to historians, but the site will likely prove frustrating. All of the resources that I found in my limited review offered only sparse metadata/descriptive information. Materials submitted by archives, museums, and professionally-staffed institutions generally included (in my review of available materials) more information, but generally less than one would hope to find in a visit to an archive, museum, or library itself. Resources submitted by individuals or non-professional groups generally included very little or no metadata, aside from a title and perhaps a date.


The archive/museum/library materials that I examined often were accompanied by contact information for their home institution, which could enable an interested scholar to track down additional information via email. But my review of individual submissions revealed no such contact information for individual contributors. These are then materials presented in geographical and (usually) temporal context, but lacking virtually all other types of helpful information.

I also wondered about the long-term status of materials appearing on Historypin. Might an institution or individual adding a single item or collection to Historypin withdraw those materials at some future date? If a scholar or other researcher wishes to refer to those materials in a publication, blog post, or in-person presentation, how can s/he be sure that they will still be available in the future? What if Historypin goes out of business or experiences a catastrophic technical failure?

These questions affect a scholar's willingness to use/cite materials found via Historypin in a publication or other type of presentation requiring formal documentation. 

In the end Historypin is a positive development in that it brings a wealth of historical materials to the attention of the vast public using the web. In this regard it is sure to stimulate historical thinking and discussion on a number of levels. But its emergence raises a number of problems and questions for users interested in more than casual browsing of available resources.


Tuesday, June 12, 2012

Explaining Ourselves

The present political and economic climate for higher education, including history and the humanities, is very ominous. State-supported institutions, like the one at which I work, have been absorbing significant budget cuts from hard-pressed state governments for years.

The federal government's ongoing reckoning with a public apparently unwilling to authorize the collection of additional tax revenues, while demanding increasing funds for entitlement programs, stands to make this situation even worse. We have witnessed this at Northern Illinois University Libraries, where the March 2011 budget deal hammered out by the president and the Congress, which reduced funds available to the U.S. Department of Education's Title VI program by 40%, led to the loss of our Southeast Asia Digital Library project. Eleven other institutions supported in similar international education digitization projects also suffered a complete loss of funds. I would imagine that amidst current discussions of the federal deficit and debt ceiling, agencies like the National Endowment for the Humanities and the Institute of Museum and Library Services are bracing themselves for additional cuts.

This is not only about state and federal agencies' increasing inability to support history, the broader humanities, and higher education in general. It is also about the cost of higher education, which is presently increasing at an unsustainable rate. PayPal founder and billionaire Peter Thiel has argued that a college education is a bubble and a bad investment. A recent publication in The Atlantic has described higher education as "an industry that has largely ignored cost efficiency and scalability."

The purpose of this post is to argue that in this climate, historians and humanists in higher education need to provide policymakers and the public with a clear explanation of why we are deserving of support. One's first instinct may be to recoil from such a direct request. Isn't the value of history, the humanities, and higher education in general self-evident? As scholar-professionals, aren't we above the dirty work of currying public favor, especially in a cultural climate marked by considerable hostility to science, the historical record, and higher education in general? Well, apparently not. In a climate marked by shrinking resources and increasing demands upon them, we need to explain ourselves. Everyone hoping to receive support from any government does. Those who ignore this debate will risk being left behind in one way or another.

Historians, humanists, and other representatives of higher education should use the web to reach the public in new ways. State-supported institutions of higher education are generally vigilant about communicating with legislators who provide funding, documenting what that money produces. But I would argue that individual teacher/scholars need to appeal to the public directly. Right now, to my knowledge, historians and other humanists do a bad job of this. If anyone actually reads this blog, I may well receive messages saying "I give public lectures!" or "What about public television?" or "What about museums?" These are certainly not inconsequential efforts and institutions, but they represent an outreach mechanism that has changed little in at least 30 years. They do not seem to have been effective in building broad-based popular support for history and the humanities.

Many historians, humanists, and other academics use the web for research - accessing journals and collections, and communicating with each other via services like H-Net. But there is at present little attempt to use the medium to reach the public with a discourse aimed at non-professionals.

This discourse can begin when historians and other scholars use the web to provide the public with brief discussions of their interests, findings, and methods. A short paragraph added to an individual faculty member's departmental web page is a start, but I believe short video clips of approximately 2-3 minutes can allow individual teacher/scholars to reach a wider audience. I can imagine that most practitioners might respond by saying "My work is so specialized that a lay audience could never understand it." To my mind, this was an assumption applied to the web more broadly in its early days. Few people thought that substantial communities would emerge online for the discussion of esoteric subjects like the history of eugenics or architectural stained glass. I would argue that if academics were to discuss their research online, making an effort to use accessible terminology, they might be surprised at the attention they attract.

I have of course touched on this general subject in previous posts to this blog, arguing that the projects that I've developed at Northern Illinois University Libraries are an attempt to bring historians' findings, and their methods, to a general online audience. Today, I argue that the tremendous pressure for universities to cut costs makes it incumbent upon every scholar and academic department to demonstrate their unique value.

To my mind, there are two related questions that need to be addressed here: the value of in-person university teaching and the value of faculty members' scholarly research.  I will first consider the issue of university teaching and learning.


The Atlantic has outlined a vision of centralized instruction in higher education, in which online courses taught by superstar academics become a part of the curriculum at many smaller, less-prestigious institutions. This initiative does not propose to do away with local instructors completely. Rather, on-campus relations would be managed by personnel acting in a capacity quite similar to that of teaching assistants in a lecture course at a large research university. Institutions partaking of this type of centralized instruction would reap cost savings by retaining these instructors at a lower rate of pay than that presently provided instructors and dramatically reducing the number of faculty with an opportunity to earn tenure.

I believe that, while this vision may seem utterly dystopian to most members of the academy, it needs to be taken seriously. This is not to say that I embrace it. Rather, I want to emphasize that dismissing it out of hand will not make it go away. Institutions of higher education, and particularly non-elite public institutions, already face growing pressures to cut costs and reduce the rate of tuition inflation. These pressures are likely to increase.


Historians and humanists, like most other professors, have long argued that knowledge is best transferred by face-to-face instruction. As a graduate of a small liberal-arts college, I certainly believe that this is true. But in practical terms, the rise of huge lecture courses in which instructors only lecture and hold limited office hours, leaving most direct student contact to teaching assistants, has already devalued it in many institutions. The rapid growth of online-only higher education like that provided by the University of Phoenix has also provided administrators and public officials with an example of low-cost teaching and learning. The pressures for what The Atlantic has called "efficiency and scalability" are immense and, in the context of non-elite institutions, may be irresistible. The challenge for historians and other academics is not to hold the fort and throw back the forces of distance learning and cost-efficiency. Rather, it is to gain the political traction necessary to exercise some influence over the shape that these developments may take. Historians, humanists, and other academics can attempt to gain this traction by discussing the broad themes, presumably manifest in the historical literature, that they present in individual classes.

So, what of research? As we all know, scholarship is a collaborative enterprise. In the historical profession, superstar scholars write syntheses from the raw material provided by countless monographs. If the number of teacher/scholar positions available were to decrease by, say, forty percent over the next twenty years, the work of producing these monographs would slow down dramatically. As a scholar, I would mourn this development. But I don't believe that we can expect much sympathy from the general public on this front. Rather, it is incumbent upon scholars to demonstrate how they do historical research, and how their individual projects touch the lives of non-historians. A simple emphasis on how historical understanding grows from an analysis of the existing historical record would make an outstanding contribution to a public discourse that often seems unaware of this fact. When pressed, many historians argue that our work contributes to an historical consciousness that in turn facilitates good citizenship. We could easily begin to engage the public by making this argument online, showing how specific pieces of evidence shed light on the past and our present circumstances.

I do not believe that the introduction of new priorities and arrangements in higher education will be a uniform process. Rather, it will be subject to many of the same political pressures that affect other public policy decisions. If scholars were to describe their research in an online format, as in the case of the short video segments I have proposed (and hopefully in other, far more creative ways), they could attract the interest of various members of the public. With further effort they could create constituencies for themselves and become a part of the process of interest-group politics. These allies could help to remind administrators and legislators of the real value of historians and other humanists, as well as that of their universities.

A recent piece in The Chronicle of Higher Education has reminded readers how academics seeking to bring their work to a public audience can face negative consequences in a tenure and promotion process often singularly focused on research. My proposal is clearly at odds with this widely-recognized aspect of academic life. A focus on research  may have been appropriate in a Cold War era marked by the American public's apparent willingness to support universities in the name of national competitiveness, prestige, and defense. But this approach is problematic in a political context marked by increasingly sharp criticisms of higher education. Many universities have become increasingly adept at documenting how they contribute to local and national economic development, but historians and other humanists have largely failed to make a case for their contributions to a broader common good. Individual scholars' online discussions of their teaching and research can begin to fill this gap, but I would argue that university administrators need to embrace and encourage faculty outreach as an activity contributing to an institution's future viability and survival.

This is not a sunny vision of the future in which, if everyone explains their research to the public, every university and every department gets to keep all of its budget lines. I have not discussed how academics, often lacking in technological skills, would produce videos discussing their work, or other such materials, much less present them on the web. It will of course be much easier for those at wealthy institutions to find the technical support necessary to do so. Also, it will likely be much easier for historians of the American Civil War to make online allies than historians of, say, medieval women. Many scholars' findings may arouse the ire of culture warriors who wish that the historical record, or fossil record, were otherwise. Nevertheless, I suggest that those who can use the web to make their work interesting and accessible to the public, and attach themselves to the types of interest communities that the web seems to spawn, may have a better chance to survive the present and future shake-out in higher education.