
    Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

    We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
    Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs
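    A minimal Python sketch of the dynamic closest-pair interface described above: objects are inserted and deleted under an arbitrary user-supplied distance function while the current closest pair stays queryable. This is only a simple per-object nearest-neighbor baseline (of the kind the paper compares against experimentally), not the author's O(n log^2 n)-per-update or quadtree-like structure; the class and method names are illustrative.

    class NaiveClosestPairs:
        def __init__(self, dist):
            self.dist = dist      # arbitrary distance function dist(a, b)
            self.neighbor = {}    # object -> (distance to its nearest other object, that object)

        def _recompute(self, x):
            others = [y for y in self.neighbor if y != x]
            if others:
                y = min(others, key=lambda y: self.dist(x, y))
                self.neighbor[x] = (self.dist(x, y), y)
            else:
                self.neighbor[x] = (float("inf"), None)

        def insert(self, x):
            self.neighbor[x] = (float("inf"), None)
            self._recompute(x)
            for y in self.neighbor:          # x may become another object's nearest neighbor
                if y != x and self.dist(x, y) < self.neighbor[y][0]:
                    self.neighbor[y] = (self.dist(x, y), x)

        def delete(self, x):
            del self.neighbor[x]
            for y, (_, nbr) in list(self.neighbor.items()):
                if nbr == x:                 # objects that pointed at x need a new neighbor
                    self._recompute(y)

        def closest_pair(self):
            x, (d, y) = min(self.neighbor.items(), key=lambda kv: kv[1][0])
            return d, x, y

    Agglomerative (hierarchical) clustering then becomes a loop: query the closest pair, delete both clusters, and insert their merge, which is exactly the usage pattern the paper's data structures are built to accelerate.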

    Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

    We provide efficient constant factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces.
    Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneous.
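    A small Python sketch of the first objective above, assuming points in the Euclidean plane: score a candidate partition (one level of a hierarchical clustering) by the sum of minimum spanning tree lengths within each cluster. The helper names and example data are illustrative; the paper's constant factor approximation algorithm itself is not reproduced here.

    import math

    def mst_length(points, dist):
        # Prim's algorithm on the complete graph over one cluster's points
        if len(points) < 2:
            return 0.0
        best = {p: dist(points[0], p) for p in points[1:]}  # cheapest edge into the growing tree
        total = 0.0
        while best:
            p = min(best, key=best.get)
            total += best.pop(p)
            for q in best:
                best[q] = min(best[q], dist(p, q))
        return total

    def sum_of_subtree_cost(clusters, dist):
        return sum(mst_length(c, dist) for c in clusters)

    clusters = [[(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10)]]
    print(sum_of_subtree_cost(clusters, math.dist))  # 1 + 1 in the first cluster, 1 in the second: 3.0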

    Unnecessary Image Pair Detection for a Large Scale Reconstruction

    Cluster-based network proximities for arbitrary nodal subsets

    The concept of a cluster or community in a network context has been of considerable interest in a variety of settings in recent years. In this paper, employing random walks and geodesic distance, we introduce a unified measure of cluster-based proximity between nodes, relative to a given subset of interest. The inherent simplicity and informativeness of the approach could make it of value to researchers in a variety of scientific fields. Applicability is demonstrated by clustering a number of existing data sets (including multipartite networks). We view community detection (i.e. when the full set of network nodes is considered) as simply the limiting instance of clustering (for arbitrary subsets). This perspective should add to the dialogue on what constitutes a cluster or community within a network. With regard to health-relevant attributes in social networks, identification of clusters of individuals with similar attributes can support targeting of collective interventions. The method performs well in comparisons with other approaches, based on comparative measures such as NMI (normalized mutual information) and ARI (adjusted Rand index).
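    The exact unified measure is defined in the paper; as a hedged illustration only, the Python sketch below combines the two ingredients named in the abstract (a random walk restarted on the subset of interest, and geodesic shortest-path distance to that subset) into a simple per-node proximity score. The combination rule, parameter values, and example graph are assumptions, not the paper's formula.

    import networkx as nx

    def subset_proximity(G, subset, alpha=0.85):
        # random-walk ingredient: personalized PageRank restarted on the subset
        personalization = {v: (1.0 if v in subset else 0.0) for v in G}
        walk = nx.pagerank(G, alpha=alpha, personalization=personalization)

        # geodesic ingredient: shortest-path distance from each node to the subset
        hops = nx.multi_source_dijkstra_path_length(G, set(subset))

        # illustrative combination: walk score discounted by geodesic distance
        return {v: walk[v] / (1.0 + hops.get(v, float("inf"))) for v in G}

    G = nx.karate_club_graph()
    scores = subset_proximity(G, subset={0, 1, 2})
    print(sorted(scores, key=scores.get, reverse=True)[:5])  # nodes most proximate to the subset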