The Influence of Reference Corpus Size on Wordsmith Tools Keywords Extraction

By Tony Berber Sardinha


A KeyWords analysis (using WordSmith Tools) enables the discovery of lexical items that reveal the main lexical sets in a text or corpus. Such an analysis requires that a reference corpus be compared with the corpus the researcher intends to describe (the study corpus). This paper presents a mathematical method for determining the influence of reference corpus size on the number of key words extracted by the program. The results reveal that a reference corpus at least five times as large as the study corpus yields a number of key words that is statistically equivalent to that yielded by larger reference corpora, suggesting five times the size of the study corpus as the minimum order of magnitude for reference corpora.
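The comparison of a study corpus against a reference corpus can be illustrated with a minimal Python sketch. This is not the WordSmith Tools implementation and is not taken from the paper. It assumes the log-likelihood statistic as the keyness measure, keeps only words that are relatively more frequent in the study corpus, and uses a cut-off of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom). The function names `log_likelihood` and `key_words` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word (illustrative, not WordSmith's code).

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(study_tokens, reference_tokens, threshold=3.84):
    """Return the words that are unusually frequent in the study corpus.

    The threshold is an assumption made for this sketch; a stricter
    cut-off would return fewer key words.
    """
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c = len(study_tokens)
    d = len(reference_tokens)
    keys = []
    for word, a in study.items():
        b = reference.get(word, 0)
        # Keep positive key words only: higher relative frequency in the study corpus.
        if a / c > b / d and log_likelihood(a, b, c, d) >= threshold:
            keys.append(word)
    return sorted(keys)
```

Calling `key_words` on the same study corpus with reference corpora of increasing size, and counting the words returned each time, gives a rough way to observe the effect the abstract describes.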

Topics: WordSmith Tools, KeyWords, Corpus Linguistics, reference corpus size, English language, PE1-3729, Language. Linguistic theory. Comparative grammar, P101-410
Publisher: Pontifícia Universidade Católica de São Paulo
Year: 2012