Variation of word frequencies across genre classification tasks

Kim, Y.; Ross, S.


Publication date: 1 January 2007
Publisher: GEIE-ERCIM

Abstract

This paper examines automated genre classification of text documents and its role in enabling digital libraries and other repositories to manage digital documents effectively. Genre classification, which narrows down the possible structure of a document, is a valuable step towards realising the general automatic extraction of the semantic metadata that is essential to the efficient management and use of digital objects. In this report, we analyse word frequencies in different genre classes in an effort to understand the distinctions between independent classification tasks. In particular, we examine automated classification experiments on thirty-one genre classes to determine the relationship between word frequency metrics and their significance for classification in varying environments.
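The abstract does not say which word frequency metrics were used. Purely as an illustration, the following sketch computes one plausible metric, the per-genre document frequency of each word (the fraction of a genre's documents that contain it), plus a simple contrast score comparing one genre against the mean of the others. The tokenizer, the metric, the function names `doc_frequencies` and `contrast`, and the genre labels are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict


# Hypothetical tokenizer: lowercase runs of ASCII letters only
# (an assumption, not the paper's preprocessing).
def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def doc_frequencies(corpus: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Fraction of documents in each genre that contain each word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    n_docs: Counter = Counter()
    for genre, text in corpus:
        n_docs[genre] += 1
        # Count each word at most once per document.
        counts[genre].update(set(tokenize(text)))
    return {g: {w: c / n_docs[g] for w, c in counts[g].items()} for g in counts}


def contrast(freqs: dict[str, dict[str, float]], genre: str, word: str) -> float:
    """Document frequency of `word` in `genre` minus its mean over the other genres."""
    others = [f.get(word, 0.0) for g, f in freqs.items() if g != genre]
    mean_other = sum(others) / len(others) if others else 0.0
    return freqs[genre].get(word, 0.0) - mean_other


# Toy corpus with hypothetical genre labels.
corpus = [
    ("minutes", "Minutes of the meeting. Present: the committee."),
    ("minutes", "Meeting minutes; apologies received."),
    ("thesis", "Abstract. This thesis examines genre."),
]
freqs = doc_frequencies(corpus)
print(freqs["minutes"]["meeting"])              # 1.0
print(contrast(freqs, "minutes", "meeting"))    # 1.0
```

A word with a large positive contrast for one genre is a candidate discriminating feature for that genre; comparing such scores across different groupings of genres is one way to see how the usefulness of a word varies between classification tasks.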


Available Versions

Enlighten: Publications

oai:eprints.gla.ac.uk:33647

Last time updated on 09/04/2020