386 research outputs found

Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms

Author: A.D. GORDON
Alberto Fernández
B.J.T. MORGAN
G. HART
G.J. SZÉKELY
G.N. LANCE
J. MACCUISH
J.H. WARD Jr.
P.H.A. SNEATH
R.M. CORMACK
Sergio Gómez
T. BACKELJAU
V. ARNAU
W.A. KLOOT VAN DER
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/06/2009

In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach for solving this drawback has been to take any arbitrary criterion in order to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists in grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams' formula which enables the implementation of the algorithm in a recursive way. Comment: Free Software for Agglomerative Hierarchical Clustering using Multidendrograms available at http://deim.urv.cat/~sgomez/multidendrograms.ph
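The tie situation described in this abstract can be illustrated with a minimal sketch. It is not the authors' variable-group algorithm or their software. It only shows one plausible way to detect which clusters are tied at the minimum distance, so that they could be merged together in a single step instead of by an arbitrary tie-breaking rule. The function name `tied_groups`, the tolerance parameter `tol`, and the use of a plain nested list as the distance matrix are all assumptions made for illustration.

```python
from itertools import combinations


def tied_groups(dist, tol=1e-12):
    """Return (d_min, groups), where groups lists the clusters that are
    linked, directly or transitively, by pairs at the minimum distance.

    dist is assumed to be a symmetric matrix given as a list of lists.
    """
    n = len(dist)
    d_min = min(dist[i][j] for i, j in combinations(range(n), 2))

    # Union-find over cluster indices: every pair at distance d_min is joined.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if abs(dist[i][j] - d_min) <= tol:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with two or more members would be merged in this step.
    return d_min, sorted(g for g in groups.values() if len(g) > 1)


# Points at positions 0, 1, 2 and 10 on a line: d(0,1) and d(1,2) tie at 1.
example = [
    [0, 1, 2, 10],
    [1, 0, 1, 9],
    [2, 1, 0, 8],
    [10, 9, 8, 0],
]
d_min, groups = tied_groups(example)
```

In this example a pair-group method would have to choose between merging clusters 0 and 1 or clusters 1 and 2 first, whereas grouping all tied clusters at once puts 0, 1 and 2 into one group at height 1.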

arXiv.org e-Print Archive

Crossref

Research Papers in Economics

Analysis of Agglomerative Clustering

Author: A.Z. Broder
Christian Sohler
Daniel Kuntze
F. Pereira
Johannes Blömer
K. Florek
K. Lee
L.L. McQuitty
M. Bādoiu
M. Charikar
M. Fréchet
M. Naszódi
M.B. Eisen
Marcel R. Ackermann
P.H.A. Sneath
R. Webster
S. Dasgupta
T. Feder
T.F. Gonzalez
W.B. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/03/2014

The diameter $k$-clustering problem is the problem of partitioning a finite subset of $\mathbb{R}^d$ into $k$ subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of $k$) is the agglomerative clustering algorithm with the complete linkage strategy. For decades, this algorithm has been widely used by practitioners. However, it is not well studied theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming that the dimension $d$ is a constant, we show that for any $k$ the solution computed by this algorithm is an $O(\log k)$-approximation to the diameter $k$-clustering problem. Our analysis does not only hold for the Euclidean distance but for any metric that is based on a norm. Furthermore, we analyze the closely related $k$-center and discrete $k$-center problem. For the corresponding agglomerative algorithms, we deduce an approximation factor of $O(\log k)$ as well. Comment: A preliminary version of this article appeared in Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science (STACS '11), March 2011, pp. 308-319. This article also appeared in Algorithmica. The final publication is available at http://link.springer.com/article/10.1007/s00453-012-9717-
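As a rough illustration of the algorithm this abstract analyzes, the following is a naive sketch of agglomerative clustering with complete linkage under the Euclidean distance, together with the maximum-diameter objective it approximates. It is not the paper's analysis or implementation. The function names `complete_linkage` and `max_diameter` are hypothetical, ties are broken by enumeration order, and no attempt is made at efficiency.

```python
import math


def complete_linkage(points, k):
    """Greedily merge clusters until k remain (naive complete linkage).

    points is assumed to be a list of coordinate tuples of equal length.
    """
    clusters = [[p] for p in points]

    def linkage_distance(a, b):
        # Complete linkage: largest distance between a point of a and one of b.
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Pick the closest pair of clusters; the first minimum wins on ties.
        i, j = min(
            pairs,
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def max_diameter(clusters):
    """Objective of the diameter k-clustering problem for a given partition."""
    return max(
        (math.dist(p, q) for c in clusters for p in c for q in c),
        default=0.0,
    )


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
result = complete_linkage(pts, 2)
objective = max_diameter(result)
```

Stopping the merge loop at different values of `k` yields the hierarchy of partitions mentioned in the abstract. The approximation guarantee itself is a property proved in the paper and is not demonstrated by this sketch.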

arXiv.org e-Print Archive

Crossref

Spatio-Temporal Dynamics of Caddisflies in Streams of Southern Western Ghats

Author: Anbalagan S.
De Moor F.C.
Dinakaran S.
Dudgeon D.
Julka J.M
Legendre P.
Leuven R.S.E.W.
Magurran A.E.
Merritt R.W.
Minshall G.W.
Ross H.H.
S. Anbalagan
S. Dinakaran
Sivaramakrishnan K. G.
Sneath P.H.A.
Subramanian K.A.
Waringer J.
Publication venue: University of Wisconsin Library

The dynamics of physico-chemical factors and their effects on caddisfly communities were examined in 29 streams of southern Western Ghats. Monthly samples were collected from the Thadaganachiamman stream of Sirumalai Hills, Tamil Nadu from May 2006 to April 2007. Southwest and northeast monsoons favored the existence of caddisfly populations in streams. A total of 20 caddisfly taxa were collected from 29 streams of southern Western Ghats. Hydropsyche (Trichoptera: Hydropsychidae) were more widely distributed throughout sampling sites than were the other taxa. Canonical correspondence analysis showed that elevation was a major variable and pH, stream order, and stream substrates were minor variables affecting taxa richness. These results suggested that habitat heterogeneity and seasonal changes were stronger predictors of caddisfly assemblages than large-scale patterns in landscape diversity.

Crossref

PubMed Central

How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

Author: Hitchcock E.
Darwin C.
Edwards A.W.F.
Sneath P.H.A.
Saitou N.
Salemi M.
Lespinats S.
Jolliffe I.
Kuhner M.K.
Zaretsky K.
Cavalli-Sforza L.L.
Matsuda H.
Swofford D.L.
Li J.
Press W.H.
Glover F.
Goldberg D.E.
Reeves C.R.
Dowsland K.A.
Chalmers M.
Gromov M.
Milman V.D.
Bulmer M.
Demartines P.
Fleiss J.L.
Publication venue: Libertas Academica
Publication date: 01/01/2011

Whatever the phylogenetic method, genetic sequences are often described as strings of characters; thus molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the amazing features of high-dimensional spaces, like the concentration of measure phenomenon

Crossref

Hal - Université Grenoble Alpes

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

Warwick Research Archives Portal Repository

Online Research Database In Technology