
5 research outputs found

Generating, Visualizing and Evaluating High Quality Clusters for Information Organization

Author: Javed Aslam
Katya Pelekhov
Daniela Rus
Publication venue: Dartmouth Digital Commons
Publication date: 01/08/1997

Dartmouth Digital Commons (Dartmouth College)

Computing Dense Clusters On-line for Information Organization

Author: Daniela Rus
Javed Aslam
Katya Pelekhov
Publication date: 01/10/1997

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms partition a document collection into a number of clusters that is naturally induced by the collection. We show a lower bound on the accuracy of the clusters produced by these algorithms. We use the random graph model to show that both star algorithms produce correct clusters in time $\Theta(V + E)$. Finally, we provide data from extensive experiments.

1 Introduction
Modern information systems have vast amounts of unorganized data that changes dynamically. Consider, for example, the flow of information that arrives continuously on news wires, or is aggregated by a news organization such as CNN. Some stories are brand new. Other stories are follow-ups of previous stories. Yet another type of story makes previous reports obsolete. The news focus changes regularly with this flow of information. In such dyn..

CiteSeerX

Dartmouth Digital Commons (Dartmouth College)

Generating, Visualizing, and Evaluating High-Quality Clusters for Information Organization

Author: Daniela Rus
Javed Aslam
Katya Pelekhov

We present and analyze the star clustering algorithm. We discuss an implementation of this algorithm that supports browsing and document retrieval through information organization. We define three parameters for evaluating a clustering algorithm to measure the topic separation and topic aggregation achieved by the algorithm. In the absence of benchmarks, we present a method for randomly generating clustering data. Data from our user study show evidence that the star algorithm is effective for organizing information.

1 Introduction
Modern information systems have vast amounts of unorganized data. Users often don't know what they need until they need it. In dynamic, time-pressured situations such as emergency relief for weather disasters, presenting the results of a query as a ranked list of hundreds of titles is ineffective. To cull the critical information out of a large set of potentially useful sources, we need methods for organizing the data as accurately as possible and ways of v..

CiteSeerX

A Practical Clustering Algorithm for Static and Dynamic Information Organization

Author: Daniela Rus
Javed Aslam
Katya Pelekhov

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower bound on the quality of the clusters produced by these algorithms and demonstrate that the algorithms are efficient (running times roughly linear in the size of the problem). Finally, we provide data from a number of experiments.
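
The "cover by dense subgraphs" idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the collection has already been turned into an undirected graph whose edges link sufficiently similar documents (how that graph is built is not stated in this listing), and it uses a plain greedy rule: repeatedly take the highest-degree vertex not yet covered as a star center and group it with all of its neighbors. The function name `star_cover` and the toy graph are hypothetical.

```python
from typing import Dict, Set


def star_cover(adj: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Greedily cover an undirected graph with star-shaped clusters.

    adj maps each vertex to the set of its neighbors (assumed symmetric).
    Returns a mapping from each chosen star center to its cluster.
    """
    marked: Set[str] = set()
    clusters: Dict[str, Set[str]] = {}
    # Degrees never change, so one pass in descending degree order always
    # reaches the highest-degree unmarked vertex next. Ties are broken by
    # name to keep the result deterministic.
    for v in sorted(adj, key=lambda u: (-len(adj[u]), u)):
        if v in marked:
            continue  # already covered by an earlier star
        clusters[v] = {v} | adj[v]  # center plus all adjacent vertices
        marked |= clusters[v]
    return clusters


# Hypothetical toy similarity graph over six documents.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
clusters = star_cover(adj)
for center, members in clusters.items():
    print(center, sorted(members))
```

In this sketch a document may fall into more than one cluster (here `d` is adjacent to both centers), so the result is a cover rather than a strict partition. The comparison sort is chosen for brevity and costs O(V log V); the linear running time claimed in the abstract would need a different ordering step, which this sketch does not attempt.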

CiteSeerX