4 research outputs found

    Experiments on Tunable Indexing

    The effectiveness and efficiency of an Information Retrieval (IR) system depend on the quality of its indexing system. Indexing can be used in inverted file systems or in cluster-based retrieval. In this article, a new concept called tunable indexing is introduced. With tunable indexing, the number of clusters of a document clustering system can be varied to any desired value. Also covered are the computation of the Term Discrimination Value (TDV) with the cover coefficient (CC) concept and its use in tunable indexing. A set of experiments has shown the consistency between the CC-based TDVs and the TDVs determined with the known methods. The main use of tunable indexing has been observed in determining the parameters of a clustering system.

    Experiments on Incremental Clustering

    Clustering of very large document databases is essential to reduce the space/time complexity of information retrieval. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering at discrete times is introduced. Its complexity and cost analysis and an investigation of the expected behavior of the algorithm are provided. Through empirical testing, it is shown that the algorithm achieves its purpose: it is cost effective, generates statistically valid clusters that are compatible with those of reclustering, and provides effective information retrieval.

    Concepts and Effectiveness of the Cover Coefficient Based Clustering Methodology for Text Databases

    An algorithm for document clustering is introduced. The base concept of the algorithm, the Cover Coefficient (CC) concept, provides a means of estimating the number of clusters within a document database. The CC concept is also used to identify the cluster seeds, to form clusters with the seeds, and to calculate Term Discrimination and Document Significance values (TDV, DSV). TDVs and DSVs are used to optimize document descriptions. The CC concept also relates indexing and clustering analytically. Experimental results indicate that the clustering performance in terms of the percentage of useful information accessed (precision) is forty percent higher, with an accompanying reduction in search space, than that of random assignment of documents to clusters. The experiments have validated the indexing-clustering relationships and shown improvements in retrieval precision when TDV and DSV optimizations are used.

    An Update Algorithm for Restricted Random Walk Clusters

    This book presents the dynamic extension of the Restricted Random Walk Cluster Algorithm by Schöll and Schöll-Paschinger. The dynamic variant allows changes in the underlying object set or the similarity matrix to be integrated quickly into the clusters; the results are indistinguishable from those of re-running the original algorithm on the updated data set.