5 research outputs found

    Generating, Visualizing and Evaluating High Quality Clusters for Information Organization

    We present and analyze the star clustering algorithm. We discuss an implementation of this algorithm that supports browsing and document retrieval through information organization. We define three parameters for evaluating a clustering algorithm to measure the topic separation and topic aggregation achieved by the algorithm. In the absence of benchmarks, we present a method for randomly generating clustering data. Data from our user study provide evidence that the star algorithm is effective for organizing information.

    Computing Dense Clusters On-line for Information Organization

    We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms partition a document collection into a number of clusters that is naturally induced by the collection. We show a lower bound on the accuracy of the clusters produced by these algorithms. We use the random graph model to show that both star algorithms produce correct clusters in time \Theta(V + E). Finally, we provide data from extensive experiments.
    1 Introduction. Modern information systems have vast amounts of unorganized data that changes dynamically. Consider, for example, the flow of information that arrives continuously on news wires, or is aggregated by a news organization such as CNN. Some stories are brand new. Other stories are follow-ups of previous stories. Yet another type of story makes previous reports obsolete. The news focus changes regularly with this flow of information. In such dyn..

    Generating, Visualizing, and Evaluating High-Quality Clusters for Information Organization

    We present and analyze the star clustering algorithm. We discuss an implementation of this algorithm that supports browsing and document retrieval through information organization. We define three parameters for evaluating a clustering algorithm to measure the topic separation and topic aggregation achieved by the algorithm. In the absence of benchmarks, we present a method for randomly generating clustering data. Data from our user study provide evidence that the star algorithm is effective for organizing information.
    1 Introduction. Modern information systems have vast amounts of unorganized data. Users often don't know what they need until they need it. In dynamic, time-pressured situations such as emergency relief for weather disasters, presenting the results of a query as a ranked list of hundreds of titles is ineffective. To cull the critical information out of a large set of potentially useful sources we need methods for organizing the data as accurately as possible and ways of v..

    A Practical Clustering Algorithm for Static and Dynamic Information Organization

    We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower bound on the quality of the clusters produced by these algorithms and demonstrate that the algorithms are efficient (running times roughly linear in the size of the problem). Finally, we provide data from a number of experiments.
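
    The "cover by dense subgraphs" mentioned in this abstract can be illustrated with a minimal sketch of a greedy star cover. This is not the authors' implementation. It assumes documents are compared by a precomputed pairwise similarity score, that an edge exists when the score meets a threshold, and that the highest-degree uncovered vertex becomes the next star center. The function name star_cover, the input format, and the threshold name sigma are hypothetical.

```python
from collections import defaultdict


def star_cover(similarity, sigma):
    """Greedy star cover of a thresholded similarity graph (illustrative sketch).

    similarity: dict mapping (u, v) pairs to a score (assumed input format).
    sigma: similarity threshold; pairs scoring >= sigma become edges.
    Returns a list of (center, satellites) tuples. Satellites may appear
    in more than one star, so the result is a cover, not a strict partition.
    """
    adj = defaultdict(set)
    for (u, v), score in similarity.items():
        # Touch both endpoints so isolated vertices are still registered.
        adj[u]
        adj[v]
        if u != v and score >= sigma:
            adj[u].add(v)
            adj[v].add(u)

    marked = set()
    clusters = []
    # Visit vertices by decreasing degree; ties are broken by name so the
    # output is deterministic. A bucket sort on degree would avoid the
    # O(V log V) cost of this comparison sort.
    for v in sorted(adj, key=lambda x: (-len(adj[x]), x)):
        if v in marked:
            continue
        clusters.append((v, sorted(adj[v])))
        marked.add(v)
        marked |= adj[v]
    return clusters


if __name__ == "__main__":
    scores = {
        ("a", "b"): 0.9,
        ("a", "c"): 0.8,
        ("b", "c"): 0.1,
        ("c", "d"): 0.2,
        ("d", "e"): 0.7,
        ("f", "f"): 1.0,
    }
    for center, satellites in star_cover(scores, sigma=0.5):
        print(center, satellites)
```

    Each star consists of a center and every vertex adjacent to it, so every document in a cluster meets the similarity threshold with respect to the center. The sketch covers only the static (off-line) case; handling insertions and deletions on-line is not shown.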