A KeyWords analysis (using WordSmith Tools) enables the discovery of lexical items which reveal the main lexical sets in a text or corpus. Such an analysis requires that a reference corpus be compared to the corpus the researcher intends to describe (the study corpus). This paper presents a mathematical method for finding out the influence of reference corpus size on the number of key words extracted by the program. The results reveal that a reference corpus that is at least five times as large as the study corpus allows for drawing an amount of key words that is statistically equivalent to larger reference corpora, thus suggesting five times (as larger as the study corpora) as the minimum order of magnitude for reference corpora
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.