Tuesday, June 20, 2017

The Battle of Monterrey and the Mexican-American War


The Battle of Monterrey, an encounter in the Mexican-American War, took place September 21-24, 1846. Approximately 6,500 United States troops under the command of General Zachary Taylor attacked the fortified city, which was defended by 10,000 Mexican soldiers led by General Pedro de Ampudia. Taylor approached the city from the east, and sent another contingent on a flanking movement to the southwest to block the Mexican troops’ escape route. Pinned in, Ampudia on the evening of the 22nd directed his troops to take up defensive positions within the city of Monterrey. On the 23rd the two armies engaged in house-to-house combat there. As the U.S. troops prepared for a new assault the next morning, Ampudia moved to surrender, and Taylor allowed him and his troops to leave the city.

The United States’ armed conflict with Mexico largely emerged from Americans’ eagerness to expand their nation westward to the Pacific Ocean. As American trappers and settlers poured across the Great Plains, many began to resent the fact that lands to the south and west of the Louisiana Purchase tract remained territories of Mexico, which had freed itself from Spanish colonial control in 1821. Americans’ persistent attempts to settle these lands led to conflict with the Mexican government and, eventually, war.

The Mexican Republic had welcomed Americans to settle in their northern territory of Texas in the 1820s, but after a decade it became plain that the Americans disliked Mexican rule. In 1835 the American settlers revolted against Mexico and, in the following year, established their own Republic of Texas. Many Americans immediately began to demand that their nation make Texas a part of the United States. The Mexican government warned that this would mean war.

In 1844 Americans elected James K. Polk as the nation’s new president. Polk had campaigned on the issue of national expansion, calling for the annexation of Texas, Mexican California, and the Oregon Territory that the United States and Great Britain had occupied jointly since 1818. Just before leaving office in early 1845 President John Tyler, a Virginian seeking to provide a new area into which slavery might expand, secured a joint resolution from Congress annexing Texas to the United States. Mexico responded by breaking off diplomatic relations.

Upon taking office President Polk immediately turned to the acquisition of Mexico’s northern territories. He first instructed his minister to Mexico to negotiate for the purchase of the territories, but this proposal sparked a wave of indignation and nationalist fervor in Mexico, and the minister left Mexico after only a few months.

Angry that Mexico had rebuffed his offer, Polk sent U.S. troops under the command of General Zachary Taylor to the Rio Grande River in January of 1846. Mexican officials believed that the Texas-Mexico frontier stood one hundred miles to the north, at the Nueces River, and interpreted Polk’s move as a deliberate provocation. Mexican troops quickly arrived at the Rio Grande as well, and skirmishes broke out between the two forces. Polk leaped to argue that “Mexico… has invaded our territory and shed American blood upon American soil.” Congress quickly provided him with a declaration of war.

In 1845 an American editor wrote that the American annexation of Texas represented the “fulfillment of our manifest destiny to overspread the continent allotted by Providence for the free development of our yearly multiplying millions.” By 1846 newspapers across the country had appropriated the term “manifest destiny” in their attempts to show that God intended the American nation to stretch from Atlantic to Pacific.

The United States’ decisive victory in the Mexican War added some 500,000 square miles of new territory to the nation. These lands included Texas, as well as the Mexican territories of New Mexico and Upper California. Eventually they would become the American states of California, Arizona, and New Mexico, and comprise significant parts of Utah, Colorado, Nevada, and Wyoming. Their acquisition intensified debate over the question of slavery’s future in the West: would slaveholders be able to take slave property into the acquired territories and establish a slave economy there? Would the new states that emerged from the territory won from Mexico be slave or free?

Friday, June 16, 2017

James B. Weaver and Populism


James B. Weaver (June 12, 1833–February 6, 1912) was the People’s (or Populist) Party candidate for President of the United States in 1892. He was born in Dayton, Ohio and lived in Michigan as a child before his family settled near Bloomfield, Iowa in the early 1840s. Practicing law, he took an early interest in politics, and attended the 1860 Republican National Convention that nominated Abraham Lincoln. He served as an officer in the Civil War. Unsuccessful in Iowa politics and unhappy with the Republican Party’s stance on issues of concern to Iowa farmers, Weaver joined the new Greenback Party in 1877. In the following year he won election to Congress. In 1880 he became the party’s presidential nominee and won over 300,000 votes - about 3.3% of those cast.

When the Greenback Party dissolved, Weaver joined the Populist Party, an organization that took up the Greenbackers’ call for an increased money supply, and added a broader agenda emphasizing agrarian reform. In 1892 Weaver won the Populists’ presidential nomination, campaigning on a platform that called for unlimited coinage of silver, an income tax, an eight-hour work day, and government ownership of railroads. He campaigned nationwide, accompanied by his wife Clara, and often the Kansan Mary Elizabeth Lease. He gained over one million votes - 8.5 percent of the total. He won the states of Kansas, Colorado, Nevada and Idaho outright, and also collected electoral votes in North Dakota and Oregon. In 1896 the Populists merged with the Democratic Party and nominated William Jennings Bryan of Nebraska for the presidency. Weaver supported Bryan. In a campaign that revisited many of the issues considered in 1892, the Republican William McKinley won election by a decisive margin.

This image is available on the Illinois During the Gilded Age web site and the NIU Digital Library

See American Populism, 1876-1896 for a fuller discussion of the Populist movement and its politics. 

Thursday, June 15, 2017

Mary Elizabeth Lease - He Shall Be Rescued from Such a Fate


Created by B.M. Justice, this illustration from Mary E. Lease’s The Problem of Civilization Solved paints a grim picture of life in the Gilded-Age United States. It shows a single individual besieged by attacking dogs (or wolves) and a bird of prey. Lease was a vocal advocate of the Populist Party and critic of the period’s vogue of Social Darwinism and laissez-faire individualism, a sentiment that the above illustration captures vividly. Yet she was not an advocate of government provision of social welfare payments or benefits, which she derided as “class legislation.” Rather, she advocated a government program removing “the deserving poor, the honest men and women who are willing to work but to whom work has been denied,” from large cities and settling them in the countryside. “They can be rescued from their poverty,” she concluded. (373)

She wrote: “Obese satiety elbows starvation at every turn along our streets. The tide of pauperism is steadily rising and we are rapidly approaching the condition of Europe in the last century. Class legislation has done much to swell the list of America’s paupers, but Europe’s system of dumping its pauperized class upon our shores has done more. An ever-increasing swarm of dependents are with us. The cause can be traced to class legislation and militarism. The one the curse of our free institutions and the other the bane of European civilization. The remedy lies, not in doling out alms to humanity until the recipients of charity become chronic beggars, but in first removing the cause of extreme poverty by giving every toiler access to the soil, making the ballot the key to unlock the garner where his birthright lies.” (5)

Lease attacked laissez-faire economics from a humanitarian perspective, and linked it to the Populist agenda: “Then let all who love mankind more than millionaires unite for the common welfare. We will introduce the initiative and referendum, nationalize our railroads and labor saving machinery, issue paper money redeemable by taxation and remonetize silver.” (374) Yet she ultimately addressed the issue of urban poverty from the point of view of utility, efficiency and administration, and even raised the issue of eugenic measures: “Love and goodness, backed by the strong force of the state, must go down into the dens where the human wild beasts of society hide from the light of day, and empowered by that wise legislation that removes the leper or prevents the smallpox patient from contaminating his fellow beings remove the social Huns of the cities to lands set aside and purchased by the government for their use, subjecting them to such medical inspection and treatment as will check the reckless propagation of criminals and devitalized humanity. The pauperized class should be given an opportunity to work out their own fortunes under favoring conditions. Our first care should be to send them out under supervision of agents who could supervise large plantations, the tillage of which could be overseen and made profitable for them; having all their work planned for them by the agent, they would in time learn thrift and business capacity. Eventually they would become proprietors, reaping the incentive of all labor, just remuneration. The purchase of lands, medical inspection and government agencies would cost the state less than the never-ending expense now entailed for inadequate police protection and the erection and equipment of buildings that are constantly over-filled by a constantly increased army of criminals.
Stem the current of corrupt humanity by removing the fount from which it flows, make the vicious and idle dependent upon their own efforts with the incentive of compensation, all the compensation that life holds if they succeed and the alternative of annihilation if they fail to put forth honest effort when the helping hand is extended, for while God was severe in his denunciations of those who oppress the laborer he was none the less severe in his denunciation of the idler. ‘If a man shall not work neither shall he eat.’” (371-2)

Lease provided a fascinating vision of a powerful, administrative state in America, yet it was one informed by the classical liberal tradition that she sought to critique.

All quotations are from Mary Elizabeth Lease The Problem of Civilization Solved (Chicago: Laird and Lee, 1895)




Wednesday, June 14, 2017

Granger at the Plow, 1873


This image is an illustration in Stephen Smith’s Grains for the Grangers, Discussing All Points Bearing Upon the Farmers’ Movement for the Emancipation of White Slaves from the Slave-Power of Monopoly (Philadelphia: John E. Potter, 1873), a political tract written in support of the Granger Movement.  Oliver H. Kelley of Minnesota founded the Order of Patrons of Husbandry, also known as the Grange, in 1867. The organization admitted men and women, an unusual practice in that time, and sought to provide often-isolated farm families with opportunities for social interaction. It also encouraged more productive farming through the distribution of scientific information. The organization grew rapidly  beginning in 1873 due to an economic depression. Many farmers attributed their struggles to the workings of the railroads, on which they relied to deliver their crops to market, as well as merchants and other middlemen. Many Grangers promoted state regulation of railroad shipping rates. Their success led to a well-known case heard by the United States Supreme Court, Munn v. Illinois (1877), which upheld the State of Illinois’ regulation of rates. Grangers often experimented with cooperative marketing organizations in an attempt to circumvent middlemen in the marketplace as well. These efforts proved ineffective, and many states soon repealed laws regulating railroad rates. Although the Grange did not prove to be a wholly successful political movement, it established a precedent for cooperative organizations in rural America and served as a predecessor for the Populist Movement of the 1880s and 90s.

The above information is drawn in part from Thomas Burnell Colbert’s essay on the Grange Movement in the Encyclopedia of the Great Plains.

Tuesday, June 13, 2017

Guarding the Cornfields, 1854, by Seth Eastman

Seth Eastman served in the US Army, including two tours at Fort Snelling in the Minnesota Territory (near present-day St. Paul). During his second tour there, he was the fort’s commanding officer. While stationed at the fort, he painted a number of scenes of Native American life in the region. In the above scene, Native Americans use noise-making devices to frighten crows and other birds away from their corn fields. Eastman contributed hundreds of illustrations, including this one, to Henry Rowe Schoolcraft's six-volume study, History of Indian Tribes of the United States (1851–1857).

This image appears on the Lincoln/Net web site.


Friday, May 12, 2017

Stephen Douglas, the Little Giant






The Little Giant in the character of a Gladiator | Abraham Lincoln Historical Digitization Project | NIU Digital Library


Image appears in Lincoln/Net, courtesy of the Chicago History Museum

This cartoon, dating from the late 1850s or 1860, depicts Illinois Senator Stephen Douglas, popularly known as the Little Giant, as a Roman gladiator armed with his doctrine of Popular Sovereignty. In Douglas’ usage, Popular Sovereignty suggested that citizens of territories seeking to become states should determine for themselves if slavery would be permitted there. This proved very controversial in the northern states because the Missouri Compromise of 1820 had forbidden slavery in territory acquired in the Louisiana Purchase located north of the 36°30′ parallel (with the exception of Missouri). Douglas’ proposal potentially threw the entire West open for slavery, and served to intensify the sectional crisis that led to the Civil War.

See in NIU Digital Library

The Haymarket Riot


“The Haymarket Riot. The Explosion and the Conflict” by W. Ottman, 1889 | Illinois During the Gilded Age | NIU Digital Library

The above image is a contemporary artist’s imagining of the moment of the bomb’s explosion, found in Anarchy and Anarchists: A History of the Red Terror in America and Europe by Michael Schaack (Chicago: F.J. Schulte and Co., 1889). It appears in Illinois During the Gilded Age.

On the evening of May 4, 1886, an unknown individual lobbed a dynamite bomb into a formation of Chicago police officers sent to disperse an anarchist meeting in Chicago’s Haymarket Square. The panicked police responded with a hail of gunfire directed into the crowd attending the meeting. When order once again prevailed, seven police officers and at least that many private citizens lay dead, with many more wounded. These events touched off a wave of civic upheaval as Americans discussed the Haymarket bomb in light of the period’s rapidly changing economic and social conditions. It also led to a celebrated trial of eight avowed anarchists, the execution or death in prison of five of them, and Illinois Governor John Peter Altgeld’s bold pardon of the remaining three.

See in NIU Digital Library 

Owen Lovejoy



Owen Lovejoy | Abraham Lincoln Historical Digitization Project | NIU Digital Library

Image: Northern Illinois University Libraries

Owen Lovejoy (January 6, 1811 – March 25, 1864) was a Congregationalist minister and abolitionist who won election to the United States Congress in 1856. In an 1859 speech to the House of Representatives, he declared his opposition to the Fugitive Slave Act, a federal law that required all Americans to assist in the capture of escaped bondsmen, in the following terms:
“Proclaim it upon the house-tops! Write it upon every leaf that trembles in the forest! Make it blaze from the sun at high noon and shine forth in the radiance of every star that bedecks the firmament of God. Let it echo through all the arches of heaven, and reverberate and bellow through all the deep gorges of hell, where slave catchers will be very likely to hear it. Owen Lovejoy lives at Princeton, Illinois, three-quarters of a mile east of the village, and he aids every fugitive that comes to his door and asks it. Thou invisible demon of slavery! Dost thou think to cross my humble threshold, and forbid me to give bread to the hungry and shelter to the houseless? I bid you defiance in the name of my God.”

See NIU Digital Library

"The Last Refuge" by Thomas Cole




Image from Lincoln/Net, courtesy of Newberry Library

This 1855 engraving of Thomas Cole’s “The Last Refuge” depicts a Native American man pursued to the top of a single pillar of rock in the wilderness, his “last refuge” from the encroachment of American settlement. Although all American citizens contributed to this dynamic to some degree, a significant number, especially Whigs in the urban North, regretted its impact on Native Americans. Hoping that their country would devote its energies to the more intensive development of territory east of the Mississippi River, or even east of the Appalachian Mountains, they associated rapid western settlement with the spread of cotton agriculture, slavery, and an American future as an agricultural nation dependent upon industrial Britain to buy its raw materials. They also feared it would undermine Christianity’s influence on Americans’ lives, especially those living on the frontier.


See NIU Digital Library










US Gunboat Cairo






U.S. Gunboat Cairo - Courtesy of Tulane University Libraries Robert M. Jones Steamboat Collection | Mark Twain’s Mississippi Project | NIU Digital Library



Image appears in NIU's Mark Twain's Mississippi Project, courtesy of Tulane University Libraries' Robert M. Jones Steamboat Collection


 “Cairo, an ironclad river gunboat, was built in 1861 by James Eads and Co., Mound City, Ill., under an Army contract; and commissioned as an Army ship 25 January 1862, naval Lieutenant James M. Prichett in command.”
“Cairo served with the Army’s Western Gunboat Fleet, commanded by Flag Officer A. H. Foote, on the Mississippi and Ohio Rivers and their tributaries until transferred to the Navy 1 October 1862 with the other river gunboats. Active in the occupation of Clarksville, Tenn., 17 February 1862, and of Nashville, Tenn., 25 February, Cairo stood down the river 12 April escorting mortar boats to begin the lengthy operations against Fort Pillow, Tenn. An engagement with Confederate gunboats at Plum Point Bend on 11 May marked a series of blockading and bombardment activities which culminated in the abandonment of the Fort by its defenders on 4 June.”
“Two days later, 6 June 1862, Cairo joined in the triumph of seven Union ships and a tug over eight Confederate gunboats off Memphis, Tenn., an action in which five of the opposing gunboats were sunk or run ashore, two seriously damaged, and only one managed to escape. That night Union forces occupied the city. Cairo returned to patrol on the Mississippi until 21 November when she joined the Yazoo Expedition. On 12 December 1862, while clearing mines from the river preparatory to the attack on Haines Bluff, Miss., Cairo struck a torpedo and sank.” – Dictionary of American Naval Fighting Ships.



See NIU Digital Library

A New Type of Post

This spring I have begun posting individual images from Northern Illinois University Libraries' Digital Library, principally from digital projects exploring American history and culture that I have developed, to the Digital Library's Tumblr account. I typically offer a few words of explanation or analysis to accompany the image.

I post materials for one week out of every month, every day of that week. My colleagues and I have agreed to re-post materials via our own blogs, so here goes...


Tuesday, January 24, 2017

The Fragility of Digital History (at least as I practiced it)

My recent turn to projects emphasizing the curation and preservation of digital data, like that contained in the historically oriented web sites we have developed at Northern Illinois University Libraries, has led me to recognize the many ways in which these materials can become compromised or otherwise lost to use. Backing up materials is of course very important, but is not a cure-all. If we back up materials in formats that eventually become so obsolete that no available software can open them, the data is still lost.

I have also become aware of other, less obvious, factors that have compromised Lincoln/Net, Mark Twain's Mississippi, and several of our other web sites. We built these in the early 2000s, with available open-source technology (Linux/Apache/MySQL/PHP - commonly called a LAMP set-up). It allowed our sites to combine searchable archives of primary sources (mostly text and images, but also latter-day versions of primary source materials in different media) with original interpretive materials in an effective manner. We simply laid out the web sites on two perpendicular axes, with a bar presenting links to primary source materials running horizontally near the top of the page, and a bar presenting links to interpretive materials running vertically along the left edge.

Of course this approach had its drawbacks. As we built a series of websites, we found we had no way to manage them systematically, together. If we wanted to make changes to our sites, which generally had a similar look and feel, someone had to edit their code, individually, by hand. We also had no way of monitoring our data in order to verify its continuing viability, nor did we have a way of pushing our data, as a block, or a series of blocks, into a backup device. This became increasingly time consuming.

We decided to migrate all of our data to a new platform made up of Fedora Commons repository software, a Drupal web interface, and Islandora, a Drupal module that allowed them to interact. This proved very difficult in our context of limited resources, and today we run that combination of applications on a shoestring thanks to the efforts of one talented and dedicated librarian.

The move to the new platform made data management and curation much easier, but it also cost us something.

The Fedora/Drupal/Islandora stack functions on the assumption that those implementing it intend to make digital objects available online by search or browsing, and manage their collections in a coordinated manner. It allows the users and providers of data to do these things very well, much better than a LAMP set-up would. But it leaves little room for interpretive materials. Put another way, it reduces our interpretive materials from a place of considerable importance on our websites, presented as equally important as the primary sources, to a side light. Links to them appear in the tool bar running horizontally near the top of the page, but from my perspective they become just another type of available data. Uninitiated users have little reason to perceive that buttons labeled "Essays" or "Videos" lead to interpretive materials. The "Lesson Plans" button is certainly effective, however.

Why could we not adapt the technology to preserve our two-axis presentation? To be brief, because a more sophisticated and manipulable search interface occupies the entire left edge (approximately one-quarter of the width) of the page. This is in many ways a good thing, as we provide increasingly knowledgeable and experienced user groups in educational institutions with the features they have come to expect. It also makes it impossible to put anything else along the left edge of the page.

Why could we not simply invert our approach and present primary sources on the vertical axis and interpretive materials on the horizontal? Again, to be brief, because the more powerful search apparatus that we now use provides a preliminary ("faceted") level of access to the data it retrieves there, occupying the remainder of the page (below the horizontal tool bar) with access to individual resources. All other functions, including "browse," "home," and "about," reside on the horizontal bar, along with access to essays, videos, maps, and lesson plans.

To be clear, I am not complaining that my library forced me and my colleagues to use software that we don't like. I originally led the push to make the change to a new software stack. If we had retained our original interface, now nearly twenty years old, our web sites would have taken on the appearance of obsolescence. Despite the apparent superficiality, almost triviality, of this concern, I believe that experienced web users immediately assess a site's usefulness and legitimacy by its  appearance - the first impression it makes. I know I certainly do. If we had retained our original interface, we not only would have continued to limp along with web sites that remained difficult to administer and data unsuited to modern curation techniques, we also would have produced a first impression marked by obviously outdated technology.

So what am I saying? This: the technical platforms necessary to present a web site including searchable access to primary source materials in a sophisticated and credible manner today reflect the assumptions and priorities of the library and archives community. They emphasize providing access to data, period. Technological developments like those we have employed (faceted search, for example) make that increasingly easy and powerful.

These assumptions and priorities give little notice to matters of outreach and interpretative assistance. They are aimed at users who want to search data in order to reach their own interpretations, in an essay, a research paper, or a book.  These users presumably already have access to information helping them to understand the primary source materials via classroom instruction or interpretive works available elsewhere (other web sites, books, articles, etc.). These users especially exist in schools and on college/university campuses.

Our LAMP-based web sites tried to provide a user group that we presumed did not have ready access to these forms of interpretive material - members of the general public - with a chance to build an interpretive framework to inform their searches. Perhaps these individuals could have gone to a library and read interpretive works, but we attempted to use the web's immense reach and flexibility to make interpretive materials more readily accessible - online, right next to the primary sources, in text and video formats.

We still do this, but the recent improvements in online indexing and search technology have made it increasingly difficult and, I suspect, ineffective.

My colleagues and I developed our sites in the web's early days, before information professionals had had a chance to assess it and refine it for their purposes by the development of progressively more effective search and retrieval technology. They have given us a great deal. But something has been lost, too.

I do not blame librarians and archivists for this loss. They are not being shortsighted. They are simply doing their jobs, as they are defined by the conventions of their profession. These conventions are  worthwhile and to be applauded. I suspect that attempts to devise an interface accommodating my preferred two-axis approach would likely compromise the efficacy of the available search and retrieval technology in some way. Were it possible to design such an interface without negative trade-offs, I suspect that it would require a considerable amount of financial resources and technical expertise, which are seldom available in the present political climate.

Our present web sites do make interpretive materials available for use, albeit not in the precise manner I originally envisioned.

As a historian, I have come to understand the versions of Lincoln/Net, Mark Twain's Mississippi, and other web sites that we developed with LAMP technology as artifacts, expressions of their time, especially the available technology. You can still see them on the Wayback Machine.




Thursday, May 26, 2016

Text Mining and Library Cataloging

During the spring semester of 2016 I supervised a team of students (Marcos Quezada, a graduate student in Operations Management and Information Systems; Fredrik Stark, a PhD candidate in NIU's English Department; and Mitchell Zaretsky, a junior Computer Science major) as they explored text mining in the context of Northern Illinois University Libraries' large online collection of late nineteenth and early twentieth century dime novels (http://dimenovels.lib.niu.edu). We worked in the format of an experiential learning activity, meaning that we addressed a problem brought to us by a client. In this case Matthew Short, NIU Libraries Metadata Librarian and Cataloger, served as the client.

In the experiential learning format, the client presents the student team with a set of goals. Mr. Short asked the team to develop a text classification application or tool to help library catalogers determine the genre of the approximately 1,900 digitized texts in the collection. In traditional cataloging activities, the cataloger inspects a work manually in order to derive basic information necessary to catalog it accurately. This can be a lengthy process. Perhaps text-mining technology could help catalogers to improve the speed and efficiency with which they cataloged a very large collection.

Mr. Short's goals also included compiling a list of genres and related subject terms for possible use in reclassifying online digitized collections, and investigating text-mining tools for the future development of the prototype classifier application and for future studies of the collections.

The team began work by using Weka, an open-source data and text-mining application. Mr. Short selected it because it enables users to acquaint themselves with the separate activities that make up text mining and construct original applications using blocks of existing Java code.

Mr. Short introduced the students to a typical text-mining work flow. He had been working to achieve his goals prior to engaging with this group, and for all intents and purposes led the team's activities. As the team's official coach, I attempted to facilitate discussion, scheduled activities, and completed paperwork.

The students began by gathering text files of digitized dime novels cataloged as belonging to the collection's better-represented genres. These genres included detective and mystery stories; western stories; sea stories; historical fiction; adventure stories; and bildungsromans (coming-of-age stories).

The team next engaged in pre-processing activities in order to produce the most accurate text possible. NIU Libraries staff members originally produced the digital texts in the digital dime novel collection by the use of Optical Character Recognition software and did not attempt to correct any mistakes within them. Pre-processing began with the removal of stop words (such as the, an, and, etc.) and also included tokenization (identifying groups of characters as words) and stemming (reducing different inflections of a word to their root form) of words.  We also used Weka to render the text materials as a bag of words (i.e., set aside grammar and word order) and transform words into vectors, or numerical representations.
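
The pre-processing steps described above can be sketched in a few lines of code. The team did this work in Weka, so what follows is not their code; it is a minimal illustration in plain Python, and the stop-word list, the crude suffix-stripping stemmer, and all function names are assumptions made up for the example. The sketch tokenizes first and then filters stop words, simply because the filter needs individual words to work on.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list the team used).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Count stemmed, non-stop-word tokens; grammar and word order are discarded."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(crude_stem(t) for t in tokens)


def to_vector(bag, vocabulary):
    """Turn a bag of words into a list of counts, one slot per vocabulary term."""
    return [bag.get(term, 0) for term in vocabulary]


sample = "The detective trailed the robbers and the robbers trailed the detective."
bag = bag_of_words(sample)
vocabulary = sorted(bag)
vector = to_vector(bag, vocabulary)
print(vocabulary, vector)
```

In a real collection the vocabulary would be built from every text, so that each novel is represented by a vector of the same length.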

The team then moved on to text classification. They began by using a set of already-cataloged works to train Weka to identify specific words or sets of words with the individual genres mentioned above. Of the algorithms available in Weka, Naive Bayes proved most effective. They found that in 65% of works examined Weka's classification agreed with that of a human cataloger. Investigating this discrepancy, the team found that additional filtering techniques improved accuracy (that is, the rate at which Weka agreed with a human cataloger's genre classification) to 75%. These techniques included the use of TF-IDF (a process to determine how important a word is to a document in a collection or corpus); a better stemmer (the open-source product Snowball); a list of nineteenth-century stop words composed by Matthew Jockers, a scholar of the period's literature; rendering all letters in lower case; and setting the number of words in each text to be analyzed to 500. They also discovered that a number of texts in the training set had been cataloged as belonging in two different genres. Removal of these works improved accuracy to 83%.
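
For readers unfamiliar with TF-IDF, here is a small worked sketch. It uses one common textbook formula (term frequency times the logarithm of the inverse document frequency); Weka's own filter may weight terms somewhat differently, and the function name and toy documents are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = count of the term / length of the document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy "documents", already tokenized (hypothetical data).
docs = [
    ["sheriff", "rode", "trail"],
    ["captain", "sailed", "ship"],
    ["sheriff", "sailed", "trail"],
]
weights = tf_idf(docs)
print(weights[1])
```

A word that appears in only one text ("captain") receives a higher weight than one shared by several texts ("sailed"), and a word that appears in every text receives a weight of zero. That is why the measure helps a classifier focus on vocabulary that distinguishes one genre from another.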

With the information above, Mitchell Zaretsky used Weka's Java API to construct an original classifier application. It reported the probability of a work fitting in one of the several genres. Working with a new test corpus of 214 digitized dime novels, the team found that their classifier agreed with human catalogers 71% of the time.
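
The team's application was built on Weka's Java API, and its code is not reproduced here. Purely as an illustration of the idea, the sketch below implements a small multinomial Naive Bayes classifier in plain Python that reports a probability for each genre, along with a helper that measures agreement with a human cataloger. The class name, function names, and the four-document training set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesGenreClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, documents, genres):
        self.genre_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        for tokens, genre in zip(documents, genres):
            self.word_counts[genre].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, tokens):
        """Return a dict mapping each genre to its posterior probability."""
        total_docs = sum(self.genre_counts.values())
        log_scores = {}
        for genre, n_docs in self.genre_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # prior for the genre
            for token in tokens:
                if token in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            log_scores[genre] = score
        # Normalize in a numerically stable way so the probabilities sum to 1.
        peak = max(log_scores.values())
        exp_scores = {g: math.exp(s - peak) for g, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {g: v / norm for g, v in exp_scores.items()}

    def predict(self, tokens):
        probabilities = self.predict_proba(tokens)
        return max(probabilities, key=probabilities.get)


def agreement_rate(predicted, cataloged):
    """Share of works where the classifier matches the human cataloger."""
    matches = sum(p == c for p, c in zip(predicted, cataloged))
    return matches / len(cataloged)


# Hypothetical training data: tokenized texts with cataloger-assigned genres.
train_docs = [
    ["sheriff", "ranch", "outlaw"],
    ["outlaw", "saddle", "ranch"],
    ["detective", "clue", "murder"],
    ["clue", "detective", "alibi"],
]
train_genres = ["western", "western", "detective", "detective"]
model = NaiveBayesGenreClassifier().fit(train_docs, train_genres)

test_docs = [["outlaw", "ranch"], ["clue", "alibi"]]
predictions = [model.predict(doc) for doc in test_docs]
rate = agreement_rate(predictions, ["western", "detective"])
print(model.predict_proba(test_docs[0]), rate)
```

Reporting a probability for every genre, rather than a single label, lets a cataloger see when the classifier is unsure and a closer manual inspection is warranted.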

On the basis of this test, the team determined that their application can help catalogers to determine a dime novel's genre. It can also serve as an effective tool for evaluating the genre determinations of catalogers not using the application in their work. They also suggested that their text-mining activities had uncovered details about the form and content of works in NIU's digitized dime novel collection that invite further research.