Text Document Classification: An Approach Based on Indexing

B S Harish


Abstract

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is preserved with the help of a novel data structure called ‘Status Matrix’. A corresponding classification technique is also proposed for efficient classification of text documents. In addition, to avoid sequential matching during classification, we propose indexing the terms in a B-tree, an efficient indexing scheme. Each term in the B-tree is associated with a list of class labels of those documents which contain the term. To corroborate the efficacy of the proposed representation and status-matrix-based classification, we have conducted extensive experiments on various datasets.

Original Source URL: http://aircconline.com/ijdkp/V2N1/2112ijdkp04.pdf
For more details: http://airccse.org/journal/ijdkp/vol2.htm
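
The indexing idea in the abstract can be illustrated with a small sketch. The abstract does not define how the Status Matrix is filled or scored, so everything below is an assumption made for illustration: the matrix is taken to be binary (one row per class, one column per query term, in order of occurrence), and a document is assigned to the class whose row has the longest run of consecutive 1s. A sorted list searched with `bisect` stands in for the B-tree, since Python's standard library has no B-tree; the function names (`build_index`, `lookup`, `status_matrix`, `classify`) and the toy data are hypothetical and not taken from the paper.

```python
from bisect import bisect_left


def build_index(labeled_docs):
    """Map each term to the set of class labels of documents containing it."""
    postings = {}
    for label, text in labeled_docs:
        for term in text.lower().split():
            postings.setdefault(term, set()).add(label)
    terms = sorted(postings)  # sorted keys stand in for B-tree key order
    return terms, [postings[t] for t in terms]


def lookup(index, term):
    """Binary search for a term; returns its class labels or an empty set."""
    terms, labels = index
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return labels[i]
    return set()


def status_matrix(index, classes, query_terms):
    """Assumed layout: rows are classes, columns are query terms in order."""
    return [[1 if c in lookup(index, t) else 0 for t in query_terms]
            for c in classes]


def longest_run(row):
    """Length of the longest run of consecutive 1s in a row."""
    best = cur = 0
    for bit in row:
        cur = cur + 1 if bit else 0
        best = max(best, cur)
    return best


def classify(index, classes, text):
    """Assumed rule: pick the class with the longest run of matched terms."""
    query_terms = text.lower().split()
    matrix = status_matrix(index, classes, query_terms)
    scores = {c: longest_run(row) for c, row in zip(classes, matrix)}
    return max(classes, key=lambda c: scores[c])


# Toy training data (hypothetical)
training = [
    ("sports", "the team won the match"),
    ("finance", "the bank raised interest rates"),
]
classes = ["sports", "finance"]
index = build_index(training)
predicted = classify(index, classes, "bank raised rates")
```

Because each query term is looked up once in the ordered index, classification does not need to scan the training documents sequentially, which is the motivation the abstract gives for indexing. Ties in this sketch are broken by the order of `classes`.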

Available Versions

ZENODO

oai:zenodo.org:2575334

Last time updated on 09/07/2019