1,175 research outputs found

    Text Document Classification: An Approach Based on Indexing

    Get PDF
    ABSTRACT In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel datastructure called ‘Status Matrix’. Further the corresponding classification technique has been proposed for efficient classification of text documents. In addition, in order to avoid sequential matching during classification, we propose to index the terms in Btree, an efficient index scheme. Each term in B-tree is associated with a list of class labels of those documents which contain the term. Further the corresponding classification technique has been proposed. To corroborate the efficacy of the proposed representation and status matrix based classification, we have conducted extensive experiments on various datasets. Original Source URL : http://aircconline.com/ijdkp/V2N1/2112ijdkp04.pdf For more details : http://airccse.org/journal/ijdkp/vol2.htm

    Peer to Peer Information Retrieval: An Overview

    Get PDF
    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real- world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom

    Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

    Full text link
    Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the errors between the original and the reconstructed matrix are l2l_2 distance or Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly handled when we use these error measures. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, are proposed. However, the current correntropy-based NMF only targets on the low-level features without considering the intrinsic geometrical distribution of data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization into the process of max-correntropy-based matrix factorization. Meanwhile, each feature can learn corresponding kernel from the data. The experiment results of Caltech101 and Caltech256 show the benefits of such combination against other NMF algorithms for the unsupervised image clustering
    corecore