Hierarchical Clustering Using Level Sets
Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept, the level set, and uses it to develop Level-Set Clustering (LSC), a recursive version of DBSCAN that can successfully perform hierarchical clustering. A level set is a subset of points of a data set whose densities exceed some threshold t. Graphing the size of each level set against its respective t produces indents in the line graph that correspond to clusters in the data set, since the points in a cluster have very similar densities. The new algorithm produces its clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
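To make the size-vs-threshold idea concrete, here is a minimal Python sketch (not the paper's LSC implementation, whose details are beyond this abstract): it uses a crude kNN density proxy and traces how the level-set size shrinks as t rises; flat stretches, the "indents" above, correspond to clusters.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def level_set_sizes(X, k=10, num_thresholds=100):
    # Density proxy: inverse distance to the k-th nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    density = 1.0 / dists[:, -1]
    # Size of the level set {x : density(x) > t} over a grid of thresholds.
    ts = np.linspace(density.min(), density.max(), num_thresholds)
    sizes = np.array([(density > t).sum() for t in ts])
    return ts, sizes

# Two well-separated blobs: the curve drops steeply between clusters and
# flattens inside them, producing the "indents" described above.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(8, 1, (200, 2))])
ts, sizes = level_set_sizes(X)
```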
Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
We provide initial seedings to the Quick Shift clustering algorithm, which
approximate the locally high-density regions of the data. Such seedings act as
more stable and expressive cluster-cores than the singleton modes found by
Quick Shift. We establish statistical consistency guarantees for this
modification. We then show strong clustering performance on real datasets as
well as promising applications to image segmentation.
Comment: ICML 2018. Code release: https://github.com/google/quickshif
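For readers who want to try it, a hypothetical usage sketch against the released code: the class name QuickshiftPP and the parameters k (neighbors for the density estimate) and beta (how far below a mode's density a point may fall and still join its cluster-core) are assumptions about the repository's Python interface, not a verified API.

```python
import numpy as np
from QuickshiftPP import QuickshiftPP  # assumed module/class name from the code release

# k and beta as described above; both names are assumptions.
model = QuickshiftPP(k=20, beta=0.9)

X = np.random.default_rng(0).normal(size=(500, 2))
model.fit(X)                # assumed scikit-learn-style fit
labels = model.memberships  # assumed attribute holding cluster labels
```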
Introduction to the R package TDA
We present a short tutorial and introduction to using the R package TDA,
which provides some tools for Topological Data Analysis. In particular, it
includes implementations of functions that, given some data, provide
topological information about the underlying space, such as the distance
function, the distance to a measure, the kNN density estimator, the kernel
density estimator, and the kernel distance. The salient topological features of
the sublevel sets (or superlevel sets) of these functions can be quantified
with persistent homology. We provide an R interface for the efficient
algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function
for the persistent homology of the Rips filtration, and one for the persistent
homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated
over a grid of points. The significance of the features in the resulting
persistence diagrams can be analyzed with functions that implement recently
developed statistical methods. The R package TDA also includes the
implementation of an algorithm for density clustering, which allows us to
identify the spatial organization of the probability mass associated with a
density function and visualize it by means of a dendrogram, the cluster tree.
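The R interface is the paper's contribution; as a rough cross-language illustration, the Rips-filtration persistence that TDA delegates to GUDHI can also be computed with GUDHI's own Python bindings (the gudhi package):

```python
import numpy as np
import gudhi  # Python bindings for the GUDHI C++ library

# Points on a noisy circle: the persistence diagram should show one
# long-lived 1-dimensional feature (the loop) besides the 0-dim components.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (100, 2))

rips = gudhi.RipsComplex(points=X, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)
diag = st.persistence()                # list of (dim, (birth, death)) pairs
print([p for p in diag if p[0] == 1])  # the salient loop(s)
```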
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide finite-sample analysis of a general framework for using k-nearest
neighbor statistics to estimate functionals of a nonparametric continuous
probability density, including entropies and divergences. Rather than plugging
a consistent density estimate (which requires $k \to \infty$ as the sample size
$n \to \infty$) into the functional of interest, the estimators we consider fix
$k$ and perform a bias correction. This is more efficient computationally, and,
as we show in certain cases, statistically, leading to faster convergence
rates. Our framework unifies several previous estimators, for most of which
ours are the first finite sample guarantees.
Comment: 16 pages, 0 figures
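One classical instance of this framework is the Kozachenko-Leonenko entropy estimator, where fixed-k digamma terms supply the bias correction; a minimal sketch of that standard formula (not code from the paper):

```python
import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def kl_entropy(X, k=3):
    """Kozachenko-Leonenko kNN estimate of differential entropy (in nats).

    digamma(n) - digamma(k) is the fixed-k bias correction: k stays constant
    as n grows, rather than letting a density estimate converge."""
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    eps = dists[:, -1]  # distance to the k-th nearest neighbor
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of unit d-ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps))

# Sanity check: a standard 1-D normal has entropy 0.5*log(2*pi*e) ~ 1.4189 nats.
X = np.random.default_rng(0).normal(size=(5000, 1))
print(kl_entropy(X, k=3))
```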
Consistent procedures for cluster tree estimation and pruning
For a density $f$ on $\mathbb{R}^d$, a {\it high-density cluster} is any
connected component of $\{x : f(x) \geq \lambda\}$, for some $\lambda > 0$. The
set of all high-density clusters forms a hierarchy called the {\it cluster
tree} of $f$. We present two procedures for estimating the cluster tree given
samples from $f$. The first is a robust variant of the single linkage algorithm
for hierarchical clustering. The second is based on the $k$-nearest neighbor
graph of the samples. We give finite-sample convergence rates for these
algorithms which also imply consistency, and we derive lower bounds on the
sample complexity of cluster tree estimation. Finally, we study a tree pruning
procedure that guarantees, under milder conditions than usual, to remove
clusters that are spurious while recovering those that are salient.
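A minimal sketch of the core idea behind the second procedure, under the simplifying assumption of a kNN density proxy and without the paper's rates, consistency analysis, or pruning step: restrict the kNN graph to points of estimated density at least lambda, and read off connected components as the clusters at that level.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors, kneighbors_graph

def cluster_tree_levels(X, k=10, num_levels=20):
    # Density proxy: inverse distance to the k-th nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    density = 1.0 / dists[:, -1]
    # Undirected kNN graph of the sample.
    G = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    G = G.maximum(G.T)  # symmetrize
    levels = []
    for lam in np.linspace(density.max(), density.min(), num_levels):
        keep = np.where(density >= lam)[0]       # high-density points at level lam
        sub = G[keep][:, keep]                   # kNN graph restricted to them
        _, labels = connected_components(sub, directed=False)
        levels.append((lam, keep, labels))       # clusters at this level
    return levels
```

Sweeping lambda from high to low traces the hierarchy: components appear, grow, and merge, which is exactly the nesting structure the cluster tree records.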