Text categorisation using document profiling

Authors: Bernhard Pfahringer, Maximilien Sauban
Publication venue: Springer
Publication date: 01/01/2003

Abstract. This paper extends prior work by Michael D. Lee on psychologically plausible text categorisation. Our approach uses Lee’s model as a pre-processing filter that generates a dense representation of a given text document (a document profile) and passes it on to an arbitrary standard propositional learning algorithm. As with standard feature selection for text classification, this drastically reduces the dimensionality of instances, which in turn greatly lowers the computational load for the subsequent learning algorithm. The filter itself is also very fast, as it is essentially a variant of Naive Bayes. We present several variations of the filter and evaluate them on the Reuters-21578 collection, obtaining performance comparable to previously published results on that collection at a lower computational cost.
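The idea of a document profile can be illustrated with a minimal sketch. It is not Lee's model and not the authors' filter: it assumes a plain Naive-Bayes-style score, namely the sum of smoothed per-word log-odds for each category, so that a document of any length is reduced to one number per category. The function names `fit_profile_filter` and `profile`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_profile_filter(docs, labels, alpha=1.0):
    """Estimate smoothed per-category log-odds weights for every word.

    docs   -- list of token lists
    labels -- list of category names, one per document
    alpha  -- additive (Laplace) smoothing constant; an assumption
    """
    categories = sorted(set(labels))
    vocab = sorted({w for tokens in docs for w in tokens})
    in_counts = {c: Counter() for c in categories}   # words inside category c
    out_counts = {c: Counter() for c in categories}  # words outside category c
    for tokens, label in zip(docs, labels):
        for c in categories:
            (in_counts if c == label else out_counts)[c].update(tokens)

    v = len(vocab)
    weights = {}
    for c in categories:
        n_in = sum(in_counts[c].values())
        n_out = sum(out_counts[c].values())
        weights[c] = {
            w: math.log((in_counts[c][w] + alpha) / (n_in + alpha * v))
            - math.log((out_counts[c][w] + alpha) / (n_out + alpha * v))
            for w in vocab
        }
    return categories, weights


def profile(tokens, categories, weights):
    """Map a token list to a dense vector with one score per category.

    Words unseen during fitting contribute nothing to the score.
    """
    return [sum(weights[c].get(w, 0.0) for w in tokens) for c in categories]


# Toy data, invented purely for illustration.
train_docs = [
    ["wheat", "grain", "export"],
    ["grain", "harvest"],
    ["oil", "crude", "barrel"],
    ["crude", "opec"],
]
train_labels = ["grain", "grain", "crude", "crude"]

cats, w = fit_profile_filter(train_docs, train_labels)
p = profile(["grain", "wheat"], cats, w)
print(dict(zip(cats, p)))
```

The resulting vector has as many entries as there are categories, regardless of vocabulary size. This is the dimensionality reduction the abstract refers to, and the vector could then be handed to any standard propositional learner.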

CiteSeerX

Research Commons@Waikato
