386 research outputs found
Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms
In agglomerative hierarchical clustering, pair-group methods suffer from a
problem of non-uniqueness when two or more distances between different clusters
coincide during the amalgamation process. The traditional approach for solving
this drawback has been to take any arbitrary criterion in order to break ties
between distances, which results in different hierarchical classifications
depending on the criterion followed. In this article we propose a
variable-group algorithm that consists in grouping more than two clusters at
the same time when ties occur. We give a tree representation for the results of
the algorithm, which we call a multidendrogram, as well as a generalization of
the Lance and Williams' formula which enables the implementation of the
algorithm in a recursive way.
Comment: Free Software for Agglomerative Hierarchical Clustering using
Multidendrograms available at
http://deim.urv.cat/~sgomez/multidendrograms.ph
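The tie problem described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it shows a single agglomeration step on a made-up distance matrix, and it assumes that tied clusters are grouped by taking connected components of the graph whose edges are the minimum-distance pairs. The function names `tied_minimum_pairs` and `merge_groups` and the tolerance `tol` are hypothetical.

```python
from itertools import combinations


def tied_minimum_pairs(dist, tol=1e-12):
    """Return the smallest off-diagonal distance and every pair attaining it."""
    n = len(dist)
    pairs = list(combinations(range(n), 2))
    d_min = min(dist[i][j] for i, j in pairs)
    tied = [(i, j) for i, j in pairs if abs(dist[i][j] - d_min) <= tol]
    return d_min, tied


def merge_groups(n, tied):
    """Group clusters linked by tied pairs (connected components, union-find).

    Assumption for illustration: every component with more than one
    cluster is merged in a single step instead of breaking the tie.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in tied:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return [g for g in groups.values() if len(g) > 1]


# Made-up symmetric distance matrix: d(0,1) and d(1,2) tie at the minimum.
dist = [
    [0, 2, 4, 7],
    [2, 0, 2, 5],
    [4, 2, 0, 3],
    [7, 5, 3, 0],
]
d_min, tied = tied_minimum_pairs(dist)
groups = merge_groups(len(dist), tied)
print(d_min, tied, groups)
```

A pair-group method would have to pick either (0, 1) or (1, 2) first, and the resulting dendrogram would depend on that choice. Grouping clusters 0, 1, and 2 together avoids the arbitrary choice.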
Analysis of Agglomerative Clustering
The diameter $k$-clustering problem is the problem of partitioning a finite
subset of $\mathbb{R}^d$ into $k$ subsets called clusters such that the maximum
diameter of the clusters is minimized. One early clustering algorithm that
computes a hierarchy of approximate solutions to this problem (for all values
of $k$) is the agglomerative clustering algorithm with the complete linkage
strategy. For decades, this algorithm has been widely used by practitioners.
However, it is not well studied theoretically. In this paper, we analyze the
agglomerative complete linkage clustering algorithm. Assuming that the
dimension $d$ is a constant, we show that for any $k$ the solution computed by
this algorithm is an $O(\log k)$-approximation to the diameter $k$-clustering
problem. Our analysis holds not only for the Euclidean distance but for any
metric that is based on a norm. Furthermore, we analyze the closely related
$k$-center and discrete $k$-center problems. For the corresponding agglomerative
algorithms, we deduce an approximation factor of $O(\log k)$ as well.
Comment: A preliminary version of this article appeared in Proceedings of the
28th International Symposium on Theoretical Aspects of Computer Science
(STACS '11), March 2011, pp. 308-319. This article also appeared in
Algorithmica. The final publication is available at
http://link.springer.com/article/10.1007/s00453-012-9717-
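As a rough illustration of the algorithm this abstract analyzes, the following is a naive sketch of agglomerative complete-linkage clustering that records the clustering obtained for every number of clusters and reports its maximum diameter. It is not the paper's code and makes no attempt at efficiency. The function names, the sample points, and the use of Euclidean distance via `math.dist` (Python 3.8+) are assumptions made for the example, and ties are broken by taking the first minimal pair.

```python
import math
from itertools import combinations


def complete_linkage(points):
    """Naive complete-linkage agglomeration.

    Returns a dict mapping each cluster count (n down to 1) to the
    clustering obtained at that level of the hierarchy.
    """
    clusters = [[p] for p in points]
    hierarchy = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(
                math.dist(p, q)
                for p in clusters[ab[0]]
                for q in clusters[ab[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        hierarchy[len(clusters)] = [list(c) for c in clusters]
    return hierarchy


def max_diameter(clusters):
    """Largest within-cluster pairwise distance (0.0 for singletons)."""
    return max(
        max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)
        for c in clusters
    )


# Made-up points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (20.0, 0.0)]
hierarchy = complete_linkage(points)
for k in sorted(hierarchy, reverse=True):
    print(k, max_diameter(hierarchy[k]))
```

The printed maximum diameter at each level is the objective value of the greedy solution. The abstract's result bounds how far this value can be from the optimum, which the sketch does not compute.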
Spatio-Temporal Dynamics of Caddisflies in Streams of Southern Western Ghats
The dynamics of physico-chemical factors and their effects on caddisfly communities were examined in 29 streams of southern Western Ghats. Monthly samples were collected from the Thadaganachiamman stream of Sirumalai Hills, Tamil Nadu from May 2006 to April 2007. Southwest and northeast monsoons favored the existence of caddisfly populations in streams. A total of 20 caddisfly taxa were collected from 29 streams of southern Western Ghats. Hydropsyche (Trichoptera: Hydropsychidae) were more widely distributed throughout sampling sites than were the other taxa. Canonical correspondence analysis showed that elevation was a major variable and pH, stream order, and stream substrates were minor variables affecting taxa richness. These results suggested that habitat heterogeneity and seasonal changes were stronger predictors of caddisfly assemblages than large-scale patterns in landscape diversity.
How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling
Whatever the phylogenetic method, genetic sequences are often described as strings of characters; thus molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the amazing features of high-dimensional spaces, like the concentration of measure phenomenon.