Tuesday, November 28, 2017

"Bleeding Kansas"

Kansas, from The United States Illustrated, 1855 | Mark Twain’s Mississippi | NIU Digital Library

This 1855 illustration depicts an idyllic scene in Kansas, most likely along the Kansas River. In this period Kansas was anything but idyllic, however. Kansas became a territory with the implementation of the Kansas-Nebraska Act on May 30, 1854. The act, which left occupants of the Kansas and Nebraska territories seeking statehood to decide if human slavery would exist in their jurisdiction, set aside the Missouri Compromise of 1820, by which Congress had sought to maintain a rough balance between slave and free states in the Union by restricting the former to territory south of the 36°30′ parallel, excluding Missouri itself. Pro- and anti-slavery settlers poured into Kansas, many with the explicit goal of establishing their preferred policy there. Political controversy ensued.

Pro-slavery settlers dominated the initial territorial legislature elected on March 30, 1855. This body would determine if Kansas would enter the Union as a slave or free state. Opponents of slavery around the Union argued that widespread voter fraud made the election’s results illegitimate, and the territorial governor invalidated results in several districts. New elections gave anti-slavery settlers greater representation, but they remained in a decided minority.

The United States Congress sent a special committee to Kansas, which concluded that the territorial legislature was an illegally constituted body without authority. The territorial legislature convened in spite of the finding, rejected the credentials of those who had won the new elections, and passed laws paving the way for Kansas to enter the Union as a slave state. Anti-slavery Kansans rejected this government and formed their own, which in January 1856 President Franklin Pierce declared illegal. Violence broke out between pro- and anti-slavery settlers, resulting in the shooting death of a free stater near Lawrence in December of 1855.

Political controversy produced physical violence. On May 21, 1856, pro-slavery forces stormed Lawrence, destroying a hotel and two newspaper offices, and sacking homes and businesses. Just before the attack, on May 19 and 20, Republican Senator Charles Sumner of Massachusetts had delivered a speech on the Senate floor depicting pro-slavery views and actions as akin to the rape of a virgin. Sumner’s speech especially singled out the South Carolina Senator Andrew Butler for criticism. On May 22, Butler’s cousin, the South Carolina Congressman Preston Brooks, attacked Sumner on the Senate floor with a cane, inflicting grave injuries.

In Kansas, the anti-slavery activist John Brown led his sons and other followers to murder five pro-slavery settlers at Pottawatomie Creek on May 24, 1856. On the Fourth of July, President Pierce sent U.S. Army troops to remove the Free State Legislature at Topeka. In August, pro-slavery forces burned the Free State town of Osawatomie, Kansas, after driving off defenders led by Brown. The last major outbreak of violence occurred in the Marais des Cygnes massacre of 1858, in which pro-slavery forces killed five Free State men. In all, approximately 56 people died in “Bleeding Kansas” in the years before the Civil War began.

Tuesday, November 7, 2017

Text Mining at an Institution with Lesser Financial Resources, Revisited

I am presently moving forward with a research program in text-mining at Northern Illinois University Libraries, but have encountered an unexpected obstacle.

About a year ago I ordered a copy of ProQuest's American Periodicals data set for local use. Our library subscribes to ProQuest's hosted version of this product, but the product's design/technical infrastructure does not allow text-mining activities and our license for its use prohibits the downloading of anything but the most insignificant amount of materials. When I contacted ProQuest about the matter, they informed me that I would need to pay an additional $1000 for the preparation and delivery of the entire data set (approximately five terabytes) to me. I could then use the data on my local infrastructure.

For the past two years I have worked with members of my university's Computer Science Department, principally providing graduate students in Data Science with access to relatively large humanities text data sets that I have created myself and questions that they may use to inform text mining activities. Prior to my purchase of the American Periodicals data set, I secured an agreement with that department whereby they would host the materials on their high-capacity computing cluster and make them available for ongoing Data Science research.  I would take delivery of the materials from ProQuest, then transfer them to the cluster for processing and future use.

I still do not have the data. The first six months or so of delays were the result of my Library's mistake in attempting to charge the expense for the materials to the wrong account. Once we resolved that, I struggled for several months to get ProQuest to review my university legal department's proposed (unremarkable) revisions to the contract. Once that was settled, I was able to forward payment to ProQuest in August, and looked forward to the delivery of the materials.

At this point I learned that ProQuest expected to deliver the full data set to a server of my choosing via the Internet. Since my library does not have 5 TB of extra capacity readily available, I asked for the data to be delivered on a hard drive or hard drives. ProQuest agreed.

A month passed, and I heard nothing from ProQuest. My contact with the company asked me to bear with him, as he had staff members absent from the office while on holiday. Another month passed, and after another inquiry I learned that the company reserves the right to deliver the materials on a hard drive at any time within six months after payment. I see no mention of this reservation in my contract with the company.

It seems likely to me that ProQuest is accustomed to working with institutions large enough, and possessed of enough material resources, to take delivery of such a large data set in this manner quite easily. My institution does not fit that description. After a period of two years without any state support, we recently began to receive payments from the State of Illinois again. Needless to say, our digital infrastructure is far from robust.

If I had known that the delivery of this data by hard drive would prove to be such a difficult matter, I would have made the necessary arrangements with my university's Department of Computer Science to have the data delivered directly to their cluster via the Internet. As this is an inter-divisional matter within the university, it will take some time. I initially intended to take delivery of the American Periodicals materials as quickly as possible, leaving time to work out these arrangements.

But, alas, ProQuest's representatives raised no caveats about hard-drive delivery until I actually started to inquire about the whereabouts of the materials my university had purchased.

Thus my warning: if you are attempting to do text mining research at an institution that doesn't have five terabytes of storage immediately at hand, and want to work with ProQuest data, be aware that they will take up to six months to deliver your data.