Annotating a corpus of Early Modern English writing for categories of discourse presentation

Abstract

This article discusses the process of annotating a small corpus of Early Modern English writing that we have constructed in order to investigate the diachronic development of speech, writing and thought presentation. The work we have done so far is a pilot investigation for a planned larger project. We have constructed a corpus of approximately 40,000 words of Early Modern English (EModE) fiction and news journalism and annotated it for categories of discourse presentation (DP) drawn from a model originally proposed by Leech and Short (1981). This has allowed us to quantify the types of discourse presentation within the corpus and to compare our findings against those from a similarly annotated corpus of Present Day English (PDE) writing (reported in Semino and Short 2004). Our results so far appear to indicate developing stylistic tendencies in fiction and news texts in the Early Modern period, and suggest that it would be profitable to extend the project through the construction of a larger corpus incorporating a greater number of text-types in order to test our hypotheses more rigorously. In this article we concentrate specifically on describing the annotation phase of the project. We discuss the criteria by which we defined the various discourse presentation categories in order to make clear our analytical methodology, as well as the issues we were confronted with in trying to annotate in a systematic and retrievable way. We conclude with some preliminary results to illustrate the value of this kind of annotation and suggest some hypotheses resulting from this pilot investigation.
