Considering TEI and oral history transcripts

This is Part 2 of an ongoing series of blog posts about transcription in oral history. Part 1 dealt with Transcription and oral history in general.

Transcribing audio: benefits and obligations
While the primary source within oral history is the audio recording, there are still many valid reasons for transcribing oral history interviews. Not least among these is the issue of accessibility: a transcript provides a written account that makes the oral testimony available to people with hearing loss. It also gives other users a choice in how they access content, and it helps with search engine optimization.i

TEI markup of transcripts
For these reasons, it is desirable to associate a transcript with an audio recording. Transcripts can also be used as a means of exploring the content of oral history collections, particularly using established conventions of digital textual studies such as TEI (Text Encoding Initiative). This is an internationally recognised, interdisciplinary and standardised encoding system for marking up text. One of the text types covered by the TEI P5 Guidelines is the transcription of speech and oral testimony.ii

There are three main reasons why TEI encoding is used to mark up text (after Bauman):iii

  • to encourage digital preservation
  • to facilitate research using the text (context and content)
  • to promote sharing (to encourage reading and to promote further research with the text).

TEI has developed over more than twenty-five years (Burnard 2013) and its development has been closely linked to that of the discipline (or quasi-discipline) now known as digital humanities (see Kirschenbaum 2009). It has been used as a method of enhancing granularity in digital texts, allowing information and metadata to be added to the text.

Examples of oral history projects that have been encoded using TEI
TEI has been applied to oral history transcripts. Most of the projects that explicitly note their use of TEI come from the United States.

Projects rarely explain why this work has been carried out. Nevertheless, the TEI P5 Guidelines allow great detail to be included in a transcription (for example, the encoding of variations in rhythm, tempo, intonation, pitch, perceived loudness and voice quality). They also offer different ways to describe pauses and non-lexical backchannels (vocal sounds such as “mm” that have no meaning, but nevertheless have an important function and role within a conversation).iv
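To make this concrete, here is a minimal sketch in Python of what such markup might look like and how it can be read back. The transcript fragment is invented for illustration (the speaker ids, wording and pause length are all assumptions). The elements used (u, pause, vocal, desc, shift) are those described in the Guidelines chapter on transcriptions of speech, and the shift values ("l" for a slow tempo, "p" for quiet) follow the suggested codes as I understand them.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented sample: speaker ids, wording and durations are illustrative only.
SAMPLE = f"""
<div xmlns="{TEI_NS}">
  <u who="#interviewer">Where did you grow up?</u>
  <u who="#narrator">
    <shift feature="tempo" new="l"/>Well <pause dur="PT2S"/> it was a small farm
    <shift feature="loud" new="p"/>near the river.
  </u>
  <u who="#interviewer"><vocal><desc>mm</desc></vocal></u>
</div>
"""


def summarise(xml_text):
    """Count utterances and list the pauses, backchannels and shifts found."""
    root = ET.fromstring(xml_text)
    return {
        "utterances": len(root.findall(".//tei:u", NS)),
        "pauses": [p.get("dur") for p in root.findall(".//tei:pause", NS)],
        "vocals": [d.text for d in root.findall(".//tei:vocal/tei:desc", NS)],
        "shifts": [
            (s.get("feature"), s.get("new"))
            for s in root.findall(".//tei:shift", NS)
        ],
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

Even in a fragment this small, the encoder has had to decide that the tempo was slow and the voice quiet, which is the kind of judgement discussed in the next section.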

Subjectivity in markup (and in transcription)
Assigning many of the TEI elements and attributes requires the encoder to make subjective judgements about the audio recording: whether the tempo is slow or fast, whether the voice is soft or loud, how to describe the range of the pitch and the quality of the voice, and so on. However, all transcription is a subjective process. Just as individuals marking up the same recording and transcription in TEI are unlikely to produce identical results, more traditional transcripts are likely to vary depending on the transcriber. In its subjectivity, then, a TEI markup of an oral history transcript is no different from an un-encoded transcript. Yet the TEI-encoded document has the potential to include substantial amounts of additional information (depending on the extent of the markup that has been carried out), and this can then be used to extract specific information from large quantities of transcribed text (for example, XSL can be used to extract all the names of people mentioned in the interviews, or all the place names). Descriptions of intonation, slurring, rhythm and so on in speech recordings (all of which can be included in TEI markup) also make it possible to describe the recording in fine detail within the transcript.
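The kind of extraction mentioned above can be sketched briefly. The post refers to XSL for this job; the following is a rough Python equivalent rather than the XSL itself, and it assumes that names have been tagged with the TEI persName and placeName elements. The sample utterances and the helper name collect_names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: the people and places are illustrative only.
SAMPLE = f"""
<div xmlns="{TEI_NS}">
  <u who="#narrator">My aunt <persName>Mary Byrne</persName> moved to
    <placeName>Galway</placeName> before the war.</u>
  <u who="#narrator"><persName>Mary Byrne</persName> wrote home from
    <placeName>Boston</placeName> every month.</u>
</div>
"""


def collect_names(xml_text, tag):
    """Return a Counter of the text found in every TEI element named `tag`."""
    root = ET.fromstring(xml_text)
    names = (
        # Join nested text and collapse runs of whitespace to single spaces.
        " ".join("".join(element.itertext()).split())
        for element in root.iter(f"{{{TEI_NS}}}{tag}")
    )
    return Counter(names)


if __name__ == "__main__":
    print(collect_names(SAMPLE, "persName"))
    print(collect_names(SAMPLE, "placeName"))
```

Run over a whole collection of encoded interviews, the same function would give a simple index of who and where is mentioned, and how often.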

Potential benefits of using TEI and XSL when dealing with sensitive information
It is perhaps in its role as a tool for presentation that TEI can be most useful when dealing with oral history transcripts. This is because topics frequently arise during the course of interviews that are not suitable for publication afterwards (the intimate nature of an oral history interview creating an atmosphere unlike that of a publicly performed speech). This can sometimes necessitate the removal or redaction of some passages from the transcription of an oral history interview, rendering it necessary to prepare more than one copy of a transcript: one suitable for the archive (including reproduction, where possible, of most of the material in the audio recording in a written format) and a second copy that is suitable for public dissemination or publication, where sensitive details can be hidden.

When archival transcripts are being marked up, redactions encoded in TEI could be given the attribute “privacy”, for example. While it is possible to mark up text for redaction using specially assigned redaction elements, using markup to output both archival and dissemination copies of a transcript is not possible without writing some XSL, e.g. using conditional processing (<xsl:if>) and testing negative conditions (not()).v The benefits of using standardised systems such as TEI are therefore to be weighed against bespoke requirements.
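The two-copy workflow can be illustrated with a short sketch. The post describes doing this in XSL; what follows is a Python analogue of the same idea, not the XSL itself. It assumes, hypothetically, that sensitive passages are wrapped in a seg element carrying type="privacy" (one possible reading of the “privacy” attribute mentioned above) and that the public copy replaces each such passage with a TEI gap element. The sample sentence and the function name dissemination_copy are invented.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace

# Invented sample; <seg type="privacy"> is a hypothetical project convention.
SAMPLE = f"""
<div xmlns="{TEI_NS}">
  <u who="#narrator">We lived beside <seg type="privacy">the Dolan
    family</seg> for years.</u>
</div>
"""


def dissemination_copy(root):
    """Return a copy of `root` with privacy-marked passages replaced by <gap/>."""
    public = copy.deepcopy(root)  # the archival tree is left untouched
    for parent in list(public.iter()):
        for index, child in enumerate(list(parent)):
            is_seg = child.tag == f"{{{TEI_NS}}}seg"
            if is_seg and child.get("type") == "privacy":
                gap = ET.Element(f"{{{TEI_NS}}}gap", {"reason": "privacy"})
                gap.tail = child.tail  # keep the text that follows the passage
                parent[index] = gap
    return public


if __name__ == "__main__":
    archival = ET.fromstring(SAMPLE)
    public = dissemination_copy(archival)
    print(ET.tostring(archival, encoding="unicode"))
    print(ET.tostring(public, encoding="unicode"))
```

The point of the design is that only one encoded transcript is maintained: the archival version keeps everything, and the public version is generated from it on demand.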

Is TEI worth the cost (in time and labour)?
Embedding TEI markup is a time-consuming task, one that adds greatly to the already considerable investment of time and labour required to produce a plain text transcript of an oral history interview.vi Some humanists have begun to question TEI as a research tool, particularly since customisation is commonly required in order to meet the needs of individual researchers and research projects.vii Within the context of an oral history project, however, it is imperative to attend to each interview transcript in great detail. Whether marking up the transcription text is worth the effort nevertheless remains open to question, since there is ongoing discussion about TEI as a tool for presentation (rather than analysis), which has emerged separately from techniques of analysing texts.viii

Update (18 November 2014): I’ve added a “Part 3” to this series about transcription in oral history (see Alternatives to transcription for the oral historian?)


i Lembree, D. (2011). 25 Ways To Make Your Website Accessible. Webhosting search. Make your website accessible. Retrieved February 27, 2014, from (see point 22 “Provide Transcriptions”).

ii TEI Consortium, eds. “8 Transcriptions of Speech.” TEI P5: Guidelines for Electronic Text Encoding and Interchange. [Version 2.6.0]. [20th January 2014]. TEI Consortium. ([4th March 2014]).

iii Bauman, S. (2011). Interchange vs. Interoperability. Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2–5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7. DOI:10.4242/BalisageVol7.Bauman01.

iv Lambertz, K. (2011). Back-channelling: The use of yeah and mm to portray engaged listenership. Griffith Working Papers in Pragmatics and Intercultural Communication 4, 11–18. Retrieved 28 February 2014.

v See Module 013: Conditional processing and string parsing

vi Boyd, D. (2013). OHMS: Enhancing Access to Oral History for Free. Oral History Review, 40(1), 95–106. DOI:10.1093/ohr/oht031.

vii Schmidt, D. (2010). The inadequacy of embedded markup for cultural heritage texts. Literary and Linguistic Computing, Vol. 25, No. 3. DOI:10.1093/llc/fqq007. See page 348.

viii Bauman, S., Hoover, D., van Dalen-Oskam, K., & Piez, W. (2012). Text Analysis Meets Text Encoding. Digital Humanities 2012.

