    Robust hierarchical k-center clustering

    One of the most popular and widely used methods for data clustering is hierarchical clustering. This technique has proved useful for revealing interesting structure in data across applications ranging from computational biology to computer vision. Robustness is an important feature of a clustering technique if we require the clustering to be stable against small perturbations of the input data. In most applications, a clustering output that is robust against adversarial outliers or stochastic noise is a necessary condition for the applicability and effectiveness of the technique. This is even more critical in hierarchical clustering, where a small change at the bottom of the hierarchy may propagate all the way to the top. Despite previous work [2, 3, 6, 8], our theoretical understanding of robust hierarchical clustering is still limited, and several hierarchical clustering algorithms are not known to satisfy such robustness properties. In this paper, we study the limits of robust hierarchical k-center clustering by introducing the concept of universal hierarchical clustering, and we provide (almost) tight lower and upper bounds for the robust hierarchical k-center clustering problem with outliers and for variants of the stochastic clustering problem. Most importantly, we present a constant-factor approximation for optimal hierarchical k-center with at most z outliers that uses a universal set of at most O(z^2) outliers, and we show that this result is tight. Moreover, we show the necessity of using a universal set of outliers in order to compute an approximately optimal hierarchical k-center with a different set of outliers for each k.
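
    To make the objective concrete, here is a minimal sketch of the (non-hierarchical) k-center problem with outliers, using Gonzalez's classical farthest-point heuristic and then discarding the z worst-served points. This is not the paper's algorithm; it only illustrates the problem being made robust, and the function name and outlier rule are illustrative assumptions.

    ```python
    # Sketch: greedy k-center, treating the z farthest points as outliers.
    # NOT the paper's universal-outlier construction; illustration only.
    import numpy as np

    def greedy_k_center(points, k, z=0, rng=None):
        """Pick k centers greedily; ignore the z worst points when
        measuring the covering radius."""
        rng = rng or np.random.default_rng(0)
        n = len(points)
        centers = [int(rng.integers(n))]          # arbitrary first center
        dist = np.linalg.norm(points - points[centers[0]], axis=1)
        for _ in range(k - 1):
            centers.append(int(np.argmax(dist)))  # farthest point so far
            dist = np.minimum(
                dist, np.linalg.norm(points - points[centers[-1]], axis=1)
            )
        covered = np.sort(dist)[: n - z] if z else np.sort(dist)
        return points[centers], covered[-1]       # centers, robust radius

    # Example: 3 centers, tolerating 2 adversarial outliers.
    X = np.vstack([np.random.randn(50, 2), [[100.0, 100.0], [-90.0, 80.0]]])
    centers, radius = greedy_k_center(X, k=3, z=2)
    ```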

    Methods of Hierarchical Clustering

    We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear-time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references.
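
    For reference, standard agglomerative clustering of the kind this survey covers is exposed directly by SciPy (a Python counterpart to R's hclust); the method names map onto the classical linkage criteria. The data here is made up for illustration.

    ```python
    # Agglomerative hierarchical clustering via SciPy's linkage/fcluster.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(30, 4)                # 30 points in 4 dimensions
    Z = linkage(X, method="average")         # also: "single", "complete", "ward"
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut tree into 3 clusters
    ```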

    Anytime Hierarchical Clustering

    We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees which, we prove, must terminate for a fixed data set in a chain of nested partitions satisfying a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence suggesting that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and for online tracking of clustering trees, applicable to large, dynamically changing databases and to anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference.
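
    A heavily hedged sketch of one local "re-edit" step in the spirit described above: at an internal node ((A,B),C) of a binary cluster tree, try the alternative rotations and keep the arrangement with the best local average-linkage homogeneity. The tuple tree layout, the cost function, and the choice of rotations are illustrative assumptions, not the paper's exact operator.

    ```python
    # Sketch: greedy local rotation of a binary cluster tree to improve
    # an average-linkage homogeneity measure. Illustration only.
    from itertools import combinations
    import numpy as np

    def leaves(t):
        return [t] if isinstance(t, int) else leaves(t[0]) + leaves(t[1])

    def cost(t, D):
        """Average pairwise distance among a subtree's leaves (lower = tighter)."""
        L = leaves(t)
        if len(L) < 2:
            return 0.0
        return float(np.mean([D[i, j] for i, j in combinations(L, 2)]))

    def re_edit(node, D):
        """Pick the best of the three rotations of ((A, B), C)."""
        (A, B), C = node
        candidates = [((A, B), C), ((A, C), B), ((B, C), A)]
        return min(candidates, key=lambda t: cost(t[0], D) + cost(t, D))

    # Example on 4 points: leaves are row indices into the distance matrix D.
    X = np.random.rand(4, 2)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    better = re_edit(((0, 1), 2), D)
    ```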

    Belief Hierarchical Clustering

    In the data mining field, many clustering methods have been proposed, yet their standard versions do not take uncertain databases into account. This paper presents a new approach to clustering uncertain data using a hierarchical clustering defined within the belief function framework. The main objective of belief hierarchical clustering is to allow an object to belong to one or several clusters. A degree of belief is associated with each such membership, and clusters are combined based on their pignistic properties. Experiments with real uncertain data show that our proposed method can be considered a promising tool.
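
    For readers unfamiliar with the belief function framework, here is a minimal sketch of the pignistic transformation it relies on: a mass function m over subsets of clusters is flattened into a probability over single clusters. The cluster names and mass values below are made-up illustrative inputs.

    ```python
    # Pignistic transformation: BetP(x) = sum over sets A containing x
    # of m(A) / |A|, for a normalized mass function m.
    def pignistic(m):
        betp = {}
        for subset, mass in m.items():
            for x in subset:
                betp[x] = betp.get(x, 0.0) + mass / len(subset)
        return betp

    # An object's masses over clusters {c1}, {c2}, {c1, c2} (sum to 1).
    m = {frozenset({"c1"}): 0.5, frozenset({"c2"}): 0.2,
         frozenset({"c1", "c2"}): 0.3}
    print(pignistic(m))   # {'c1': 0.65, 'c2': 0.35}
    ```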

    Hierarchical growing cell structures: TreeGCS

    We propose a hierarchical clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm refines and builds upon the GCS base, overcoming an inconsistency in the original GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. Our algorithm is unsupervised, flexible, and dynamic, and it imposes no additional parameters on the underlying GCS algorithm. Our ultimate aim is a hierarchical clustering neural network that is both consistent and stable and that identifies the innate hierarchical structure present in vector-based data. We demonstrate improved stability of the GCS foundation and evaluate our algorithm against the hierarchy generated by an ascendant hierarchical clustering dendrogram. Our approach emulates the hierarchical clustering of the dendrogram. It demonstrates the importance of the parameter settings for GCS and how they affect the stability of the clustering.
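
    A hedged sketch of the core GCS adaptation step that TreeGCS builds on: the best-matching cell is moved toward each input and a local error is accumulated, which later decides where a new cell should grow. Full GCS also nudges the winner's topological neighbours and maintains a cell graph; the learning rate and the flat array of cells here are illustrative simplifications.

    ```python
    # Sketch: one GCS-style adaptation step (simplified; no topology).
    import numpy as np

    def gcs_step(cells, errors, x, eps=0.1):
        """Adapt the best-matching cell toward input x. Full GCS also
        adapts the winner's topological neighbours at a smaller rate."""
        d = np.linalg.norm(cells - x, axis=1)
        w = int(np.argmin(d))            # winner (best-matching cell)
        errors[w] += d[w] ** 2           # local error drives cell growth
        cells[w] += eps * (x - cells[w])
        return w

    cells = np.random.rand(3, 2)         # 3 cells in 2-D
    errors = np.zeros(3)
    for x in np.random.rand(100, 2):     # stream of inputs
        gcs_step(cells, errors, x)
    grow_at = int(np.argmax(errors))     # candidate site for a new cell
    ```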

    Isotropic Dynamic Hierarchical Clustering

    We face the need to discover a pattern in the locations of a great number of points in a high-dimensional space; the goal is to group the close points together. We are interested in a hierarchical structure, like a B-tree. B-trees are hierarchical and balanced, and they can be constructed dynamically. The B-tree approach allows the structure to be determined without any supervised learning or a priori knowledge. The space is Euclidean and isotropic. Unfortunately, there are no B-tree implementations that process indices in a symmetrical and isotropic way: some implementations are based on constructing compound asymmetrical indices from point coordinates, and others split the nodes along coordinate hyperplanes. We need to process tens of millions of points in a thousand-dimensional space, so the application has to be scalable. Ideally, a cluster would be an ellipsoid, but that would require storing O(n^2) ellipse axes; instead, we use multi-dimensional balls defined by their centers and radii. Calculation of statistical values such as the mean and the average deviation can be done incrementally: while adding a point to the tree, the statistical values for the nodes are recalculated in O(1) time. We support both a brute-force O(2^n) and a greedy O(n^2) split algorithm. Statistical and aggregated node information also allows us to manipulate (search, delete) aggregated sets of closely located points. This enables hierarchical information retrieval: when searching, the user is presented with the highest appropriate nodes in the tree hierarchy, with the most important clusters emerging in the hierarchy automatically; if interested, the user may then navigate down the tree to more specific points. The system is implemented as a library of Java classes representing points, sets of points with aggregated statistical information, the B-tree, and nodes, with support for serialization and storage in a MySQL database. Comment: 6 pages with 3 examples.
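
    A minimal sketch of the O(1)-per-insert statistics update the abstract describes, using the standard Welford-style running mean and squared-deviation recurrence; the class and field names are illustrative assumptions, not the library's Java API.

    ```python
    # Sketch: incremental per-node mean and RMS deviation (Welford-style).
    import numpy as np

    class NodeStats:
        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)   # running centroid of the node's ball
            self.m2 = 0.0               # sum of squared deviations from mean

        def add(self, x):
            """Insert one point; cost is O(d), independent of n."""
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += float(delta @ (x - self.mean))

        @property
        def radius(self):               # RMS deviation as a ball-radius proxy
            return (self.m2 / self.n) ** 0.5 if self.n else 0.0

    stats = NodeStats(dim=3)
    for p in np.random.rand(1000, 3):
        stats.add(p)
    ```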