HIERARCHICAL CLUSTERING USING LEVEL SETS
Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept that is used to develop a recursive version of DBSCAN capable of hierarchical clustering, called Level-Set Clustering (LSC). A level-set is the subset of points of a data-set whose densities are greater than some threshold, ‘t’. Graphing the size of each level-set against its respective ‘t’ produces indents in the line graph that correspond to clusters in the data-set, because the points in a cluster have very similar densities. The new algorithm produces the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
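The level-set idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's LSC algorithm: the density proxy (a DBSCAN-style count of neighbours within a radius `eps`), the brute-force O(n^2) distance computation, and the function names `neighbor_counts` and `level_set_sizes` are all assumptions made for illustration only.

```python
import numpy as np


def neighbor_counts(points, eps):
    """Assumed density proxy: number of other points within distance eps."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Each point is within eps of itself, so subtract one.
    return (dists <= eps).sum(axis=1) - 1


def level_set_sizes(density, thresholds):
    """Size of the level-set {x : density(x) > t} for each threshold t."""
    ordered = np.sort(np.asarray(density))
    # searchsorted(side="right") counts values <= t; the remainder are > t.
    return len(ordered) - np.searchsorted(ordered, thresholds, side="right")


# Toy densities: three sparse points and two dense points.
density = np.array([1, 1, 1, 5, 5])
sizes = level_set_sizes(density, np.array([0, 1, 4, 5]))
```

In this toy example the level-set size stays constant for thresholds between the two density values, which is the kind of flat stretch in the size-versus-threshold curve that the abstract associates with a group of points of similar density.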
Divisive clustering of high dimensional data streams
Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest Astronomers, Statisticians,
Computer Scientists, Space and Earth Scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop Proceedings from CESS 2011
in New York City, Goddard Institute for Space Studies
Extension of the Dip-test Repertoire -- Efficient and Differentiable p-value Calculation for Clustering
Over the last decade, the Dip-test of unimodality has gained increasing
interest in the data mining community as it is a parameter-free statistical
test that reliably rates the modality in one-dimensional samples. It returns a
so-called Dip-value and a corresponding probability for the sample's
unimodality (Dip-p-value). These two values share a sigmoidal relationship.
However, the specific transformation is dependent on the sample size. Many
Dip-based clustering algorithms use bootstrapped look-up tables translating
Dip- to Dip-p-values for a limited number of sample sizes. We propose a
specifically designed sigmoid function as a substitute for these
state-of-the-art look-up tables. This accelerates computation and provides an
approximation of the Dip- to Dip-p-value transformation for every single sample
size. Further, it is differentiable and can therefore easily be integrated in
learning schemes using gradient descent. We showcase this by exploiting our
function in a novel subspace clustering algorithm called Dip'n'Sub. We
highlight in extensive experiments the various benefits of our proposal.