257,635 research outputs found
Robust hierarchical k-center clustering
One of the most popular and widely used methods for data clustering is hierarchical clustering. This clustering technique has proved useful to reveal interesting structure in the data in several applications ranging from computational biology to computer vision. Robustness is an important feature of a clustering technique if we require the clustering to be stable against small perturbations in the input data. In most applications, getting a clustering output that is robust against adversarial outliers or stochastic noise is a necessary condition for the applicability and effectiveness of the clustering technique. This is even more critical in hierarchical clustering where a small change at the bottom of the hierarchy may propagate all the way through to the top. Despite all the previous work [2, 3, 6, 8], our theoretical understanding of robust hierarchical clustering is still limited and several hierarchical clustering algorithms are not known to satisfy such robustness properties. In this paper, we study the limits of robust hierarchical k-center clustering by introducing the concept of universal hierarchical clustering and provide (almost) tight lower and upper bounds for the robust hierarchical k-center clustering problem with outliers and variants of the stochastic clustering problem. Most importantly we present a constant-factor approximation for optimal hierarchical k-center with at most z outliers using a universal set of at most O(z2) set of outliers and show that this result is tight. Moreover we show the necessity of using a universal set of outliers in order to compute an approximately optimal hierarchical k-center with a diffierent set of outliers for each k
Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss
efficient implementations that are available in R and other software
environments. We look at hierarchical self-organizing maps, and mixture models.
We review grid-based clustering, focusing on hierarchical density-based
approaches. Finally we describe a recently developed very efficient (linear
time) hierarchical clustering algorithm, which can also be viewed as a
hierarchical grid-based algorithm.Comment: 21 pages, 2 figures, 1 table, 69 reference
Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees we prove for a fixed data set must terminate in a
chain of nested partitions that satisfies a natural homogeneity requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection.Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a
conferenc
Belief Hierarchical Clustering
In the data mining field many clustering methods have been proposed, yet
standard versions do not take into account uncertain databases. This paper
deals with a new approach to cluster uncertain data by using a hierarchical
clustering defined within the belief function framework. The main objective of
the belief hierarchical clustering is to allow an object to belong to one or
several clusters. To each belonging, a degree of belief is associated, and
clusters are combined based on the pignistic properties. Experiments with real
uncertain data show that our proposed method can be considered as a propitious
tool
Hierarchical growing cell structures: TreeGCS
We propose a hierarchical clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm refines and builds upon the GCS base, overcoming an inconsistency in the original GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. Our algorithm is unsupervised, flexible, and dynamic and we have imposed no additional parameters on the underlying GCS algorithm. Our ultimate aim is a hierarchical clustering neural network that is both consistent and stable and identifies the innate hierarchical structure present in vector-based data. We demonstrate improved stability of the GCS foundation and evaluate our algorithm against the hierarchy generated by an ascendant hierarchical clustering dendogram. Our approach emulates the hierarchical clustering of the dendogram. It demonstrates the importance of the parameter settings for GCS and how they affect the stability of the clustering
Isotropic Dynamic Hierarchical Clustering
We face a need of discovering a pattern in locations of a great number of
points in a high-dimensional space. Goal is to group the close points together.
We are interested in a hierarchical structure, like a B-tree. B-Trees are
hierarchical, balanced, and they can be constructed dynamically. B-Tree
approach allows to determine the structure without any supervised learning or a
priori knowlwdge. The space is Euclidean and isotropic. Unfortunately, there
are no B-Tree implementations processing indices in a symmetrical and
isotropical way. Some implementations are based on constructing compound
asymmetrical indices from point coordinates; and the others split the nodes
along the coordinate hyper-planes. We need to process tens of millions of
points in a thousand-dimensional space. The application has to be scalable.
Ideally, a cluster should be an ellipsoid, but it would require to store O(n2)
ellipse axes. So, we are using multi-dimensional balls defined by the centers
and radii. Calculation of statistical values like the mean and the average
deviation, can be done in an incremental way. While adding a point to a tree,
the statistical values for nodes recalculated in O(1) time. We support both,
brute force O(2n) and greedy O(n2) split algorithms. Statistical and aggregated
node information also allows to manipulate (to search, to delete) aggregated
sets of closely located points. Hierarchical information retrieval. When
searching, the user is provided with the highest appropriate nodes in the tree
hierarchy, with the most important clusters emerging in the hierarchy
automatically. Then, if interested, the user may navigate down the tree to more
specific points. The system is implemented as a library of Java classes
representing Points, Sets of points with aggregated statistical information,
B-tree, and Nodes with a support of serialization and storage in a MySQL
database.Comment: 6 pages with 3 example
- …