text

The Coll Corpus : Towards a corpus of web-based college student newspapers

Abstract

Unlike major English-language corpora hitherto released, on-line college student newspapers provide an unexplored record from much younger writers. In these newspapers, 20-year-olds address their peers in a situation that largely parallels standard newspaper writing as regards formal correctness and time pressure. Nearly unconstrained by outside intervention or house style sheets, they deal with a range of university student interests, including creative writing.This preliminary version of the Coll Corpus consists of one issue each of nearly all 300-plus college and university newspapers available on the Web as of spring 1999, with a total of 3.88 million words. Although AmE dominates, the resultant geographical distribution is relatively well matched to actual population ratios. In its present form, the corpus already allows exploration of numerous lexical and semantic features along temporal and geographic dimensions. Given its on-line accessibility, future versions should be easily expandable by several orders of magnitude.</p

    Similar works

    Full text

    thumbnail-image

    Available Versions

    Last time updated on 03/01/2025