Collections of Distinction

Anti-Slavery Manuscripts: How We’re Dividing the Data

by tblake

A note from IMLS Postdoctoral Fellow Samantha Blickhan on how we decided to make a fairly large collection of manuscripts more manageable in our transcription project:

Crowdsourcing can be an effective tool for classifying very large datasets, and Anti-Slavery Manuscripts (ASM) is no exception. The dataset is made up of about 12,000 letters written during the 19th century. They are included in this collection either because they were written by members of the Abolitionist movement or because their contents relate to the anti-slavery cause.

When starting a huge undertaking like this, it’s often helpful to sort the data before uploading it into a project. How we sort the data depends on how much information we already have about the collection. In the case of ASM, we’re very lucky, because there is already robust metadata for each subject, including the author, the recipient, and the date a letter was written (you can see the metadata for a letter by clicking on the Subject Info button). We chose to sort by date, and our dataset is broken down as follows:

1800 – 1839: 2,174 letters
1840 – 1849: 2,621 letters
1850 – 1859: 1,881 letters
1860 – 1869: 1,953 letters
1870 – 1900: 1,099 letters
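
For readers curious about how a date-based split like this could be done, here is a minimal sketch in Python. It is an illustration only, not the code used for the project: the record layout (a dictionary with an "id" and a "year") and the function names are assumptions, and the sample letters are made up. Only the year ranges come from the list above.

```python
from collections import Counter

# Year ranges taken from the list above, as (first_year, last_year) pairs.
DATE_GROUPS = [
    (1800, 1839),
    (1840, 1849),
    (1850, 1859),
    (1860, 1869),
    (1870, 1900),
]


def group_for_year(year):
    """Return a 'start-end' label for the group containing `year`.

    Returns None when the year falls outside every range.
    """
    for start, end in DATE_GROUPS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def count_by_group(letters):
    """Count letters per date group; `letters` is a list of dicts with a 'year' key."""
    return Counter(group_for_year(letter["year"]) for letter in letters)


# Hypothetical sample records, not real items from the collection.
sample_letters = [
    {"id": "a", "year": 1835},
    {"id": "b", "year": 1839},
    {"id": "c", "year": 1840},
    {"id": "d", "year": 1863},
    {"id": "e", "year": 1905},  # outside every range, so it is grouped under None
]

counts = count_by_group(sample_letters)
for label, total in counts.items():
    print(label, total)
```

Both ends of each range are inclusive, so a letter dated 1839 lands in the first group and one dated 1840 in the second. Letters with a year outside every range are collected under None so they can be reviewed by hand rather than silently dropped.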

We launched the project using only one group of data: letters written between 1800 and 1839. We wanted to start with a single group so that, if there were problems with the site, any transcription errors caused by bugs in the user interface would be confined to a single group of letters. Rolling out a dataset group by group can also help to re-energize a project and boost participation. We’ve learned from previous projects like AnnoTate and Shakespeare’s World that transcription projects frequently last for years, so it’s helpful to break down a dataset into manageable chunks. This allows us to let our transcribers know when a group is finished, so long-term participants can see the direct effects of their hard work. For example, as of writing this post, 490 letters have already been retired from the project, less than a month after launching.

Out of the groups above, we’ve also pulled out letters from specific individuals:

Charles Sumner Letters: 12 letters
William Lloyd Garrison Letters: 1,351 letters
Ziba B. Oakes Papers: 651 letters

The three datasets above were pulled out for specific reasons. In the case of Sumner and Garrison, we’ve pulled them out because the text of the letters has already been published in Memoir and Letters of Charles Sumner (Edward L. Pierce, 4 v., Boston: Roberts Brothers, 1893) and The Letters of William Lloyd Garrison (ed. by Walter M. Merrill and Louis Ruchames, 6 v., Harvard University Press, 1971-81). Once the non-published datasets have been completed, we’ll upload the Sumner and Garrison letters and give transcribers the option of working on them, with the knowledge that they’re already published. However, this transcription effort will mean that the contents of the letters will be available for free.

The Oakes papers were pulled out because they have been partially published in Broke by the War: Letters of a Slave Trader, edited by Edmund L. Drago (University of South Carolina Press, 1991), but also because they contain a different type of subject matter than the rest of the collection. Ziba B. Oakes was a slave broker who lived in Charleston, South Carolina. The papers contain his correspondence, and are directly related to the buying and selling of humans. As with the Sumner and Garrison letters, we want to offer volunteers the option of transcribing with the caveat that some of the material has been published, but we also want to acknowledge that many people may find the content of these letters deeply disturbing and upsetting. We decided that it would be best to give transcribers the choice to engage with this material, rather than lumping it in with the rest of the dataset.

We’ll keep adding more data as the project moves along. We’ll also email our volunteers every time we add more data to the project, and we’ll post about it on Talk. So if you’re particularly interested in a certain time period, be on the lookout!