Compiling and using a parallel corpus for research in translation

Abstract

There are so many variables underlying translation that examining anything longer than a few paragraphs of translated text at a time can become a daunting task. The advent of corpus linguistics, however, has made it possible to analyse enormous quantities of translated text in unprecedented ways. In line with these advances, parallel corpora can provide access to many aspects of translation that could not previously be studied in a systematic way. The first part of this paper discusses the different types of decisions that have to be made when building a parallel corpus, with particular emphasis on compilation questions that are unique to parallel corpora as opposed to corpora in general. This is followed by an account of the choices made when creating COMPARA, a post-edited, bi-directional parallel corpus of English and Portuguese literary texts with 3 million words, freely available for research and education at http://www.linguateca.pt/COMPARA/. Finally, examples of how this parallel corpus can be (and has been) used in translation research are presented.
