Concepts and Effectiveness of the Cover Coefficient Based Clustering Methodology for Text Databases

Can, Fazli; Ozkarahan, Esen

Authors: Fazli Can, Esen Ozkarahan
Publication date: 1 December 1987

Abstract

An algorithm for document clustering is introduced. The algorithm's underlying concept, the cover coefficient (CC), provides a means of estimating the number of clusters within a document database. The CC concept is also used to identify cluster seeds, to form clusters around those seeds, and to calculate term discrimination values (TDVs) and document significance values (DSVs), which in turn are used to optimize document descriptions. The CC concept also relates indexing and clustering analytically. Experimental results indicate that clustering performance, measured as the percentage of useful information accessed (precision), is forty percent higher than that of random assignment of documents to clusters, with an accompanying reduction in search space. The experiments validated the indexing-clustering relationships and showed improvements in retrieval precision when the TDV and DSV optimizations are used.
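The abstract does not state the formulas behind the CC concept, so the following is only a minimal sketch of how the cluster-count estimate might be computed. It assumes the formulation commonly given for this methodology: for a document-by-term matrix D, the cover coefficient is c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk), where alpha_i and beta_k are the reciprocals of the row and column sums, and the estimated number of clusters is the sum of the diagonal entries c_ii. The function names and the sample matrix are hypothetical, chosen for illustration.

```python
def cover_coefficient_matrix(D):
    """Return C with C[i][j] = alpha_i * sum_k D[i][k] * beta_k * D[j][k].

    Assumption: every document (row) has at least one term and every
    term (column) occurs in at least one document, so no sum is zero.
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                            # per-document normalizer
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # per-term normalizer
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]


def estimate_cluster_count(D):
    """Sum the diagonal of C (assumed estimate of the number of clusters)."""
    C = cover_coefficient_matrix(D)
    return sum(C[i][i] for i in range(len(C)))


# Hypothetical binary document-by-term matrix: documents 0-1 share terms 0-1,
# documents 2-3 share terms 2-3, and the two groups have no terms in common.
D = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

C = cover_coefficient_matrix(D)
n_clusters = estimate_cluster_count(D)
```

For this sample matrix, which contains two groups of documents with disjoint vocabularies, the estimate comes out to 2, and each row of C sums to 1. Under the assumed formulation, a diagonal entry c_ii close to 1 indicates a document that shares little with the rest of the collection.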
