Fast multi-image matching via density-based clustering
We consider the problem of finding consistent matches
across multiple images. Previous state-of-the-art solutions
use constraints on cycles of matches together with convex
optimization, leading to computationally intensive iterative
algorithms. In this paper, we propose a clustering-based
formulation. We first rigorously show its equivalence with
the previous one, and then propose QuickMatch, a novel
algorithm that identifies multi-image matches from a density
function in feature space. We use the density to order the
points in a tree, and then extract the matches by breaking this
tree using feature distances and measures of distinctiveness.
Our algorithm outperforms previous state-of-the-art methods
(such as MatchALS) in accuracy, and it is significantly faster
(up to 62 times faster on some benchmarks), and can scale to
large datasets (with more than twenty thousand features).
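The density-ordering-and-tree-breaking procedure the abstract describes can be illustrated with a toy sketch. This is not the authors' QuickMatch implementation: the function name `quickmatch_sketch`, the Gaussian kernel bandwidth `tau`, and the median-based break threshold are all illustrative assumptions.

```python
import numpy as np

def quickmatch_sketch(features, tau=1.0, break_scale=2.0):
    """Toy sketch of density-based matching: score each point with a kernel
    density estimate, link each point to its nearest strictly-denser
    neighbor (forming a tree), then break unusually long edges so the
    remaining subtrees act as clusters of matched features."""
    n = len(features)
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * tau**2)).sum(1)      # kernel density at each point
    parent = np.arange(n)
    parent_dist = np.full(n, np.inf)
    for i in range(n):
        denser = np.where(density > density[i])[0]   # candidate parents
        if len(denser):
            j = denser[np.argmin(d2[i, denser])]     # nearest denser point
            parent[i] = j
            parent_dist[i] = np.sqrt(d2[i, j])
    # break edges much longer than the typical parent edge
    scale = np.median(parent_dist[np.isfinite(parent_dist)])
    for i in range(n):
        if parent_dist[i] > break_scale * scale:
            parent[i] = i                            # cut: i becomes a root
    def root(i):                                     # cluster label = tree root
        while parent[i] != i:
            i = parent[i]
        return i
    return np.array([root(i) for i in range(n)])
```

On two well-separated groups of feature points, the long inter-group edge gets cut and each group becomes one cluster of matches.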
Efficient Computation of Multiple Density-Based Clustering Hierarchies
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method,
produces a hierarchical organization of clusters in a dataset w.r.t. a
parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the
sense that a small change in mpts typically leads to only a small or no change
in the clustering structure, choosing a "good" mpts value can be challenging:
depending on the data distribution, a high or low value for mpts may be more
appropriate, and certain data clusters may reveal themselves at different
values of mpts. To explore results for a range of mpts values, however, one has
to run HDBSCAN* for each value in the range independently, which is
computationally inefficient. In this paper, we propose an efficient approach to
compute all HDBSCAN* hierarchies for a range of mpts values by replacing the
graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain
the required information. An extensive experimental evaluation shows that with
our approach one can obtain over one hundred hierarchies for the computational
cost equivalent to running HDBSCAN* about 2 times.
Comment: A short version of this paper appears at IEEE ICDM 2017. Corrected typos. Revised abstract.
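As a point of reference for what this paper speeds up, the naive baseline rebuilds the mutual-reachability minimum spanning tree (the graph HDBSCAN* derives its hierarchy from) once per mpts value. The sketch below is an assumed, simplified rendering of that baseline, not the paper's method or the HDBSCAN* library; the helper names are hypothetical.

```python
import numpy as np

def mutual_reachability(D, mpts):
    """Mutual-reachability distances for one mpts value: a point's core
    distance is the distance to its (mpts-1)-th nearest neighbor (counting
    the point itself, whose self-distance occupies column 0 after sorting)."""
    core = np.sort(D, axis=1)[:, mpts - 1]
    return np.maximum(np.maximum(core[:, None], core[None, :]), D)

def prim_mst(W):
    """Prim's algorithm on a dense weight matrix; returns (u, v, w) edges."""
    n = len(W)
    in_tree = np.zeros(n, bool)
    in_tree[0] = True
    best = W[0].copy()                       # cheapest known link into the tree
    best_from = np.zeros(n, int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((best_from[j], j, best[j]))
        in_tree[j] = True
        upd = W[j] < best
        best[upd] = W[j][upd]
        best_from[upd] = j
    return edges

def hierarchies_for_range(X, mpts_values):
    """Naive baseline: one mutual-reachability MST per mpts value."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return {m: prim_mst(mutual_reachability(D, m)) for m in mpts_values}
```

Since mutual-reachability distances only grow with mpts, the MST total weight is nondecreasing across the range; the paper's contribution is avoiding the repeated full-graph computation this baseline performs.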
Fully adaptive density-based clustering
The clusters of a distribution are often defined by the connected components
of a density level set. However, this definition depends on the user-specified
level. We address this issue by proposing a simple, generic algorithm, which
uses an almost arbitrary level set estimator to estimate the smallest level at
which there is more than one connected component. In the case where this
algorithm is fed with histogram-based level set estimates, we provide a finite
sample analysis, which is then used to show that the algorithm consistently
estimates both the smallest level and the corresponding connected components.
We further establish rates of convergence for the two estimation problems, and
last but not least, we present a simple, yet adaptive strategy for determining
the width-parameter of the involved density estimator in a data-dependent way.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1331 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
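The core idea — feed a level set estimator increasing levels until the superlevel set first splits — can be sketched in one dimension with a histogram density estimate, the case the abstract's finite-sample analysis covers. The function name and bin count below are illustrative assumptions, not the paper's tuned procedure.

```python
import numpy as np

def smallest_split_level(x, bins=30):
    """1-D sketch: estimate the density with a histogram, then scan levels
    upward and return the smallest level at which the superlevel set
    {density >= level} has more than one connected component, i.e. more
    than one maximal run of consecutive above-level bins."""
    hist, _ = np.histogram(x, bins=bins, density=True)
    for level in np.sort(np.unique(hist)):
        above = hist >= level
        # count maximal runs of True by counting 0 -> 1 transitions
        runs = np.count_nonzero(
            np.diff(np.concatenate(([0], above.astype(int), [0]))) == 1)
        if runs > 1:
            return float(level)
    return None       # density estimate never splits
```

On a clearly bimodal sample the returned level is the first one at which the two modes separate; the paper's contribution is proving consistency of this kind of estimate and choosing the bin width adaptively.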
Stable and consistent density-based clustering
We present a multiscale, consistent approach to density-based clustering that
satisfies stability theorems -- in both the input data and in the parameters --
which hold without distributional assumptions. The stability in the input data
is with respect to the Gromov--Hausdorff--Prokhorov distance on metric
probability spaces and interleaving distances between (multi-parameter)
hierarchical clusterings we introduce. We prove stability results for standard
simplification procedures for hierarchical clusterings, which can be combined
with our approach to yield a stable flat clustering algorithm. We illustrate
the stability of the approach with computational examples. Our framework is
based on the concepts of persistence and interleaving distance from Topological
Data Analysis.
Comment: 32 pages, 7 figures. v2: improves exposition, adds computational example.
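The persistence concept this abstract builds on can be illustrated with the standard 0-dimensional persistence computation: sweep sublevel sets of a function, track connected components with union-find, and pair each component's birth with the merge that kills it (the "elder rule"). The sketch below, on a 1-D grid, is a generic textbook construction assumed for illustration, not this paper's multiparameter framework.

```python
import numpy as np

def zero_dim_persistence(f):
    """0-dimensional persistence of a function sampled on a 1-D grid:
    process samples in increasing order of value, create a component at
    each new sample, and merge components when grid neighbors meet.
    By the elder rule, the younger component dies at the merge value;
    short-lived (birth, death) pairs are the unstable clusters a
    persistence-based method would simplify away."""
    f = np.asarray(f, float)
    parent, birth, pairs = {}, {}, []
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path compression
            i = parent[i]
        return i
    for i in np.argsort(f):                  # sweep sublevel sets upward
        parent[i] = i
        birth[i] = f[i]
        for j in (i - 1, i + 1):             # neighbors already alive
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the younger root dies now
                    young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                    pairs.append((birth[young], f[i]))
                    parent[young] = old
    roots = {find(i) for i in parent}
    pairs += [(birth[r], np.inf) for r in roots]   # essential component(s)
    return sorted(pairs)
```

For `f = [0, 3, 1, 3, 0]` the two deep minima and the shallow middle minimum each start a component; one survives forever, the other two die at the separating maxima with persistence equal to their depth.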
- …