The Reference Corpus of Contemporary Portuguese and related resources

Abstract

The extraordinary growth of computer applications, particularly over the last two decades, has enabled the easy compilation and exploration of large corpora and lexica. These linguistic resources play a fundamental role in theoretical linguistics and natural language engineering. Combining these two areas of knowledge can result in a large number of applications, such as new and straightforward descriptions of languages based on real data; contrastive studies between varieties of a particular language, aimed at finding factors of unity and diversity; cross-linguistic contrastive studies; grammars; lexica and dictionaries; terminologies; assisted translation materials; language teaching materials; and computer tools and applications for processing natural language. With this principle in mind, and following the tradition at the Centre of Linguistics of the University of Lisbon (CLUL) of collecting and studying real language data, a large electronic corpus, the Corpus de Referência do Português Contemporâneo (Reference Corpus of Contemporary Portuguese, CRPC), has been under compilation at CLUL since 1988. The CRPC currently contains approximately 310 million words, searchable through a user-friendly interface, and it is envisaged as a monitor corpus (from which one can extract balanced subcorpora) that can serve as a sample of the Portuguese language in both its written and spoken varieties. In the next sections, we describe the CRPC and how it forms the basis for important resources developed at CLUL.
