A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets

Author: Gonzalez Santiago Fernandez
Krause Rolf
Pizzagalli Diego Ulisse
Publication date: 31/12/2018

Clustering is a technique for analyzing datasets obtained from empirical studies in several disciplines, with a major application in biomedical research. Essentially, clustering algorithms are executed by machines to find groups of related points in a dataset. However, the resulting grouping depends on both the metric used for point-to-point similarity and the rules used for point-to-group association. Indeed, inappropriate metrics and rules can lead to undesirable clustering artifacts. This is especially relevant for datasets in which groups with heterogeneous structures coexist. In this work, we propose an algorithm that achieves clustering by exploring the paths between points. This allows both evaluating the properties of a path (such as gaps and density variations) and expressing a preference for certain paths. Moreover, our algorithm supports the integration of existing knowledge about admissible and non-admissible clusters by training a path classifier. We demonstrate the accuracy of the proposed method on challenging datasets, including points from synthetic shapes in publicly available benchmarks and microscopy data.
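The idea of inspecting a path between two points for gaps can be illustrated with a minimal sketch. This is not the authors' algorithm: the squared-distance edge cost, the function names `shortest_path` and `largest_gap`, and the toy coordinates are all assumptions made for illustration. The sketch finds a shortest path over the complete graph of points and reports the longest single hop along it, which is one simple way a gap might be detected.

```python
import heapq
import math

def shortest_path(points, src, dst, power=2):
    """Dijkstra on the complete graph of points.

    Edge cost is distance**power (an assumption of this sketch); with
    power > 1, long hops are penalised, so paths prefer many short steps.
    """
    n = len(points)
    cost = [math.inf] * n
    prev = [None] * n
    cost[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v in range(n):
            if v == u:
                continue
            nc = c + math.dist(points[u], points[v]) ** power
            if nc < cost[v]:
                cost[v] = nc
                prev[v] = u
                heapq.heappush(heap, (nc, v))
    # Walk predecessors back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def largest_gap(points, path):
    """Longest single hop along a path of point indices."""
    return max(math.dist(points[a], points[b]) for a, b in zip(path, path[1:]))

# Hypothetical toy data: two groups on a line, separated by a wide gap.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

within = shortest_path(points, 0, 2)   # stays inside the first group
across = shortest_path(points, 0, 4)   # must cross the gap

gap_within = largest_gap(points, within)
gap_across = largest_gap(points, across)
```

In this toy case the path inside one group has only short hops, while the path between groups contains one long hop. A rule or trained classifier could use such a path property to decide whether two points belong together; how the paper does this is not specified in the abstract.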

arXiv.org e-Print Archive

HIERARCHICAL CLUSTERING USING LEVEL SETS

Author: Indaco Francesco
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2012

Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept which is used to develop a new recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level-Set Clustering (LSC). A level-set is a subset of points of a data-set whose densities are greater than some threshold, ‘t’. By graphing the size of each level-set against its respective ‘t,’ indents are produced in the line graph which correspond to clusters in the data-set, as the points in a cluster have very similar densities. This new algorithm is able to produce the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
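The level-set definition above can be made concrete with a small sketch. It is not the LSC algorithm and does not reproduce its O(n log n) behaviour: the k-nearest-neighbour density estimate, the function names, and the toy points are assumptions chosen for illustration. The sketch only computes the size of the level-set for a few thresholds, which is the quantity the abstract describes graphing against ‘t’.

```python
import math

def knn_density(points, k=2):
    """Crude density estimate (an assumption of this sketch):
    the inverse of the distance to the k-th nearest neighbour."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        densities.append(1.0 / dists[k - 1])
    return densities

def level_set_sizes(densities, thresholds):
    """For each threshold t, count the points whose density is greater than t."""
    return [(t, sum(1 for d in densities if d > t)) for t in thresholds]

# Hypothetical toy data: one tight cluster and one loose cluster.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
loose = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
points = tight + loose

densities = knn_density(points, k=2)
curve = level_set_sizes(densities, thresholds=[0.5, 5.0, 20.0])
```

Because points in the same cluster have similar densities here, the level-set size drops in steps as the threshold passes each cluster's density, first losing the loose cluster and then the tight one.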

SJSU ScholarWorks

Weak Lensing Peak Finding: Estimators, Filters, and Biases

Author: Bernardeau
Eduardo Rozo
Fabian Schmidt
Hennawi
Henry
Marian
Matsubara
Medezinski
Miyazaki
Navarro
Oaxaca Wright
Rozo
Rozo
Schmidt
Schneider
Schneider
Tinker
Van Waerbeke
Vanderlinde
Vikhlinin
Wang
Wittman
Publication venue: 'IOP Publishing'
Publication date: 03/09/2010

Large catalogs of shear-selected peaks have recently become a reality. In order to properly interpret the abundance and properties of these peaks, it is necessary to take into account the effects of the clustering of source galaxies, among themselves and with the lens. In addition, the preferred selection of lensed galaxies in a flux- and size-limited sample leads to fluctuations in the apparent source density which correlate with the lensing field (lensing bias). In this paper, we investigate these issues for two different choices of shear estimators which are commonly in use today: globally-normalized and locally-normalized estimators. While in principle equivalent, in practice these estimators respond differently to systematic effects such as lensing bias and cluster member dilution. Furthermore, we find that which estimator is statistically superior depends on the specific shape of the filter employed for peak finding; suboptimal choices of the estimator+filter combination can result in a suppression of the number of high peaks by orders of magnitude. Lensing bias generally acts to increase the signal-to-noise $\nu$ of shear peaks; for high peaks the boost can be as large as $\Delta\nu \sim 1$-$2$. Due to the steepness of the peak abundance function, these boosts can result in a significant increase in the abundance of shear peaks. A companion paper (Rozo et al., 2010) investigates these same issues within the context of stacked weak lensing mass estimates. Comment: 11 pages, 8 figures; comments welcome.

arXiv.org e-Print Archive

Crossref

Caltech Authors