1,175 research outputs found
Text Document Classification: An Approach Based on Indexing
ABSTRACT
In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel datastructure called ‘Status Matrix’. Further the corresponding classification technique has been proposed for efficient classification of text documents. In addition, in order to avoid sequential matching during classification, we propose to index the terms in Btree, an efficient index scheme. Each term in B-tree is associated with a list of class labels of those documents which contain the term. Further the corresponding classification technique has been proposed. To corroborate the efficacy of the proposed representation and status matrix based classification, we have conducted extensive experiments on various datasets.
Original Source URL : http://aircconline.com/ijdkp/V2N1/2112ijdkp04.pdf
For more details : http://airccse.org/journal/ijdkp/vol2.htm
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real- world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom
Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy
Non-negative matrix factorization (NMF) has proved effective in many
clustering and classification tasks. The classic ways to measure the errors
between the original and the reconstructed matrix are distance or
Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly
handled when we use these error measures. As a consequence, alternative
measures based on nonlinear kernels, such as correntropy, are proposed.
However, the current correntropy-based NMF only targets on the low-level
features without considering the intrinsic geometrical distribution of data. In
this paper, we propose a new NMF algorithm that preserves local invariance by
adding graph regularization into the process of max-correntropy-based matrix
factorization. Meanwhile, each feature can learn corresponding kernel from the
data. The experiment results of Caltech101 and Caltech256 show the benefits of
such combination against other NMF algorithms for the unsupervised image
clustering
- …