    Identifying Content and Function Words in Non-Annotated Corpora

    Every corpus of natural language texts shows certain tendencies that arise from general properties of language, for example the principle of least effort. One of these phenomena is a typical distribution of frequency classes: a relatively small number of word types covers the bulk of the text, while a large part of the vocabulary occurs only once. Types of the latter kind are called singletons or hapax legomena.
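
The frequency pattern described in the abstract can be checked on any plain-text sample. The following is a minimal sketch, not the method of the paper: the function name `frequency_profile`, the `top_k` parameter, and the simple lowercase regex tokenizer are assumptions made for illustration only.

```python
import re
from collections import Counter


def frequency_profile(text, top_k=10):
    """Summarize token/type counts, hapax legomena, and top-k coverage.

    Assumption: tokens are runs of ASCII letters and apostrophes after
    lowercasing. A real corpus study would need a proper tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)

    # Hapax legomena (singletons): word types that occur exactly once.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)

    # Share of all tokens covered by the top_k most frequent types.
    covered = sum(n for _, n in counts.most_common(top_k))
    coverage = covered / len(tokens) if tokens else 0.0

    return {
        "tokens": len(tokens),
        "types": len(counts),
        "hapaxes": hapaxes,
        "top_k_coverage": coverage,
    }


if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the fish"
    profile = frequency_profile(sample, top_k=2)
    print(profile)
```

On a realistic corpus one would expect, in line with the abstract, a high `top_k_coverage` for a small `top_k` and a long `hapaxes` list relative to the number of types. The toy sentence above is only there to make the sketch runnable.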