
    Word length distributions in modern Welsh prose texts.

    This paper examines the distribution of word lengths in 12 prose texts written in modern Welsh (a P-Celtic language). The texts belong to the genres of news articles and Bible translation. For all texts, the observed frequencies are best fitted by the 1-displaced Singh-Poisson distribution. This differs from published results on a Q-Celtic language (Scottish Gaelic) and suggests a P-Celtic/Q-Celtic difference in word-length distribution. Further work is required to investigate other genres of Welsh as well as the other P- and Q-Celtic languages.
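
The abstract names the 1-displaced Singh-Poisson distribution without stating its formula. The sketch below assumes the parameterisation commonly used in quantitative linguistics: P(1) = 1 - alpha + alpha * exp(-a), and P(x) = alpha * exp(-a) * a^(x-1) / (x-1)! for x = 2, 3, ... The function names and the parameter values in the usage note are hypothetical and are not taken from the paper.

```python
import math


def singh_poisson_1_displaced_pmf(x, a, alpha):
    """Probability of word length x (x = 1, 2, 3, ...).

    Assumed form (not given in the abstract): a Poisson distribution
    with mean parameter `a`, shifted so that support starts at 1, with
    the first class modified by the weight `alpha`.
    """
    if x < 1:
        return 0.0
    # Poisson term shifted by one: length x corresponds to Poisson count x - 1.
    poisson = math.exp(-a) * a ** (x - 1) / math.factorial(x - 1)
    if x == 1:
        # The first class absorbs the leftover mass 1 - alpha.
        return 1.0 - alpha + alpha * poisson
    return alpha * poisson


def expected_frequencies(n_words, a, alpha, max_len):
    """Expected counts for lengths 1..max_len in a text of n_words words."""
    return [
        n_words * singh_poisson_1_displaced_pmf(x, a, alpha)
        for x in range(1, max_len + 1)
    ]
```

With illustrative values such as `expected_frequencies(1000, 0.8, 0.9, 6)`, the expected counts can be compared against observed length counts, for example with a chi-square statistic. Setting `alpha = 1` reduces the model to an ordinary 1-displaced Poisson distribution.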

    Variation of word frequencies across genre classification tasks

    This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step in realising the general automatic extraction of semantic metadata essential to the efficient management and use of digital objects. In this report, we analyse word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we examine automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and the degree of their significance in carrying out classification in varying environments.