Strength in numbers? Some thoughts from a crowdsourced transcriber.

Transcribing handwritten documents is one of the most common types of project in humanities crowdsourcing (Dunn and Hedges 2013, 160). In the case of the Letters of 1916 project, the aim is not only to transcribe but also to collect material relating to a turbulent year in Irish history. Members of the public are invited to transcribe letters from that year that have been digitised and uploaded to the site. They are also invited to contribute relevant materials from their own family archives. Letters of 1916 describes itself as “the first public humanities project in Ireland” (http://dh.tcd.ie/letters1916/) and aims to bring the year 1916 to life through the letters of the people of that time.

I recently transcribed two letters from the Letters of 1916 project as part of an exercise for a 5-credit module. The first letter was handwritten and, since the writing was not very tidy, it was often difficult to read. I began by trying to decipher it word by word, which was not necessarily a good idea. After a while I found that it was much easier to read once I had skimmed through the entire letter to establish context. This made me think about the advantages and disadvantages of technology as I tried to decipher the letter. The ability to zoom in closely on difficult words in a digital document is a useful tool, but the document was only fully comprehensible to me after I had “zoomed out” and read the entire document on a very general level. This gave the context that helped later when I was trying to pinpoint the exact words and punctuation used. It may be that it would have been easier to decipher this letter if I had quickly scanned it as a page in hand, rather than on the screen.

The second letter was an official, typed document. This was a relief! It was much easier (but not necessarily more interesting) to decipher and to mark up a document like this, clearly typed and written to a standard official formula.

Part of the process of transcribing the Letters of 1916 involves adding TEI XML markup to the transcribed letters, using a transcription tool where TEI tags are selected from a limited set of buttons on a toolbar that runs across the top of the transcription box.

The transcription toolbar used by the Letters of 1916 project is based on JBTEI, a MediaWiki extension developed for the Transcribe Bentham project. (Transcribe Bentham is a manuscript transcription project in which crowdsourced volunteers transcribe the unpublished manuscripts of the utilitarian philosopher Jeremy Bentham; see Causer et al. 2012, 120; Causer and Terras 2014, 61–63.) JBTEI was designed to facilitate the addition of TEI markup to a document without making it necessary for volunteers to “learn the minutiae of markup” (Causer and Terras 2014, 63). The Letters of 1916 project has made small changes to the toolbar: instead of a button for a Heading (under “A” in Transcribe Bentham) there is a button for an Address (under “A” in Letters of 1916). Two additional buttons have also been added: “S” to add markup for <salute> and “D” to add markup for <date>.
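To give a sense of what these buttons produce, here is a minimal sketch of the kind of TEI fragment that might result from marking up the opening of a letter. Only <address>, <salute> and <date> come from the toolbar described above; the enclosing <opener>, the <addrLine> children, the when attribute and the address itself are standard TEI conventions added here for illustration, and may not match the project’s exact encoding:

  <opener>
    <!-- “A” button: the address at the head of the letter -->
    <address>
      <addrLine>Sackville Street</addrLine>
      <addrLine>Dublin</addrLine>
    </address>
    <!-- “D” button: the date; the when attribute is an illustrative assumption -->
    <date when="1916-04-24">24th April 1916</date>
    <!-- “S” button: the salutation -->
    <salute>Dear Sir,</salute>
  </opener>

Wrapping stretches of the transcription in tags like these is all the volunteer is asked to do; the surrounding TEI document structure is, presumably, generated by the platform itself.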

Early feedback from participants in Transcribe Bentham suggested that the main difficulty for volunteer transcribers was reading and deciphering the manuscripts, and that this was a factor that dissuaded people from continuing to contribute to the project.

“Indeed, over half of respondents found that deciphering Bentham’s hand took longer than encoding. Though text-encoding was an additional complication to the process, encouragingly few survey respondents found it prohibitively difficult.” (Causer and Wallace 2012, paragraph 61).

My experience of crowdsourced transcription projects (Letters of 1916 and Transcribe Bentham) tallies with these findings, although I was already reasonably familiar with XML and TEI. The technological aspects of the tasks did not test me as much as the puzzle of messy handwriting. When I participated in Transcribe Bentham, for example, I found Bentham’s handwriting so difficult to decipher that I decided it was pointless to continue contributing.

This idea of volunteer “drop out” is a fairly common theme in the crowdsourcing literature, which is not surprising, since falling levels of participation could cause a project to stall. Transcribe Bentham has used the results of volunteer surveys to make changes to its transcription desk (Causer and Terras 2014, 68), and Ridge (2014, 7) notes that such attention to user experience “is vital for creating interfaces that are both productive and engaging”.

Studies of volunteer motivation are relatively common in the academic literature on humanities crowdsourcing. Dunn and Hedges (2013, 152–155) found that most volunteers participate for personal and altruistic reasons, but that the dominant factor behind contribution was usually interest in the subject area.

Other concerns in the academic literature about crowdsourcing tend to revolve around a limited set of themes:

  • Why public institutions should become involved (it’s all about engagement)
  • What some people who have built crowdsourcing projects have discovered along the way (pitfalls to look out for, guidelines for success)
  • Occasionally there is some discussion of the ethical issues involved, such as the use of free labour, which could deny a job to a heritage professional – although such projects are unlikely to attract funding unless they include a crowdsourcing element (Causer et al. 2012, 131). Dunn and Hedges (2013, 165) also note that the expense of developing a crowdsourcing project is so large that the saving in labour costs yields only a narrow cost benefit (a point that explodes the idea that crowdsourcing is about saving money, but does not address the use of free/voluntary labour).
  • There is also some discussion of the correct application of the term “crowdsourcing” since many projects find that they rely heavily on a small group of “super-contributors”, rather than a crowd. Transcribe Bentham’s heavy reliance on ten or so star volunteers has led the researchers involved to suggest that it should be called a “crowd-sifting” project (Causer and Terras 2014, 73–74).
  • Ridge (2014, 7) also notes “community management” as a key issue, listing items such as moderating content, managing communication and providing volunteers with regular updates on progress.

Overwhelmingly, then, the literature on crowdsourcing is about the management of projects and research. While it frequently invokes the idea of the crowd, scholarly research into crowdsourcing is perhaps a more accurate reflection of the sourcing part of the term: the focus is not so much on the people doing the work as on maintaining a group of people who will continue to do it, in other words, sourcing a continuous flow of labour. Dunn and Hedges (2013, 150) point out that humanities crowdsourcing has developed along the lines of citizen science, within hierarchically structured projects (as opposed to community science, which focuses instead on peer production). The open dissemination of outputs notwithstanding, the structure of these projects is essentially hierarchical, since it is academics who decide what is to be transcribed. Letters of 1916 makes some attempt at a less hierarchical decision-making process, in terms of content, by inviting the public to contribute their own letters. Of necessity, however, this content is still moderated, and it is difficult to see how these projects could be structured in a non-hierarchical fashion while still retaining a valid academic/scholarly output.

This, however, is where contact with the crowd can change the outlook of a research project. “Contact with participant communities…seems to change a project in more fundamental ways, including the development of new research questions” (Ridge 2014, 7). In addition, some long-term, well-established crowdsourcing projects have had time and volunteers enough “to demonstrate ways in which project participants can develop new skills and knowledge as a result of their growing interest in the project source material, or can graduate to more complex tasks or bigger responsibilities” (Ridge 2013, 443). An outcome that both opens new avenues of research for scholars and allows participants to truly develop new skills is a real demonstration of strength in numbers.*

*“Ní neart go cur le chéile” (translated as “no strength without unity”, or “strength in numbers”) is an Irish seanfhocal (proverb) used to advertise a new crowdsourced transcription project run by the National Folklore Collection. (See the poster here.)


References

Causer, T., Tonra, J., & Wallace, V. (2012). Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham. Literary and Linguistic Computing, 27(2), 119–137. http://doi.org/10.1093/llc/fqs004

Causer, T., & Wallace, V. (2012). Building A Volunteer Community: Results and Findings from Transcribe Bentham. Digital Humanities Quarterly, 6(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html

Causer, T., & Terras, M. (2014). “Many hands make light work. Many hands together make merry work”: Transcribe Bentham and crowdsourcing manuscript collections. In M. Ridge (Ed.), Crowdsourcing our Cultural Heritage (pp. 57–88). Farnham: Ashgate.

Dunn, S., & Hedges, M. (2013). Crowd-sourcing as a Component of Humanities Research Infrastructures. International Journal of Humanities and Arts Computing, 7(1–2), 147–169.

Ridge, M. (2013). From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing. Curator: The Museum Journal, 56(4), 435–450. http://doi.org/10.1111/cura.12046

Ridge, M. (Ed.). (2014). Crowdsourcing our Cultural Heritage. Farnham: Ashgate.