A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets
Clustering is a technique for the analysis of datasets obtained by empirical
studies in several disciplines with a major application for biomedical
research. Essentially, clustering algorithms are executed by machines aiming at
finding groups of related points in a dataset. However, the result of grouping
depends on both metrics for point-to-point similarity and rules for
point-to-group association. Indeed, inappropriate metrics and rules can lead
to undesirable clustering artifacts. This is especially relevant for datasets
in which groups with heterogeneous structures co-exist. In this work, we propose
an algorithm that achieves clustering by exploring the paths between points.
This allows both evaluating the properties of a path (such as gaps and density
variations) and expressing a preference for certain paths. Moreover,
our algorithm supports the integration of existing knowledge about admissible
and non-admissible clusters by training a path classifier. We demonstrate the
accuracy of the proposed method on challenging datasets including points from
synthetic shapes in publicly available benchmarks and microscopy data.
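The idea of judging a pair of points by the path between them, rather than by their direct distance, can be illustrated with a small sketch. This is not the algorithm from the paper: it is a simplified stand-in that scores each path only by its largest hop (a "gap") and groups points whose best path has no hop above a threshold. The function names `minimax_gap` and `cluster_by_path_gap` and the parameter `max_gap` are hypothetical, and no trained path classifier is involved.

```python
import heapq
import math


def minimax_gap(points, source):
    """For each point, return the smallest possible 'largest hop' over all
    paths from `source` (a bottleneck variant of Dijkstra's algorithm)."""
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        gap, i = heapq.heappop(heap)
        if done[i]:
            continue
        done[i] = True
        for j in range(n):
            if done[j]:
                continue
            hop = math.dist(points[i], points[j])
            # A path is only as good as its widest gap.
            candidate = max(gap, hop)
            if candidate < best[j]:
                best[j] = candidate
                heapq.heappush(heap, (candidate, j))
    return best


def cluster_by_path_gap(points, max_gap):
    """Assign the same label to points joined by a path whose largest hop
    does not exceed `max_gap` (assumed, illustrative admissibility rule)."""
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        for j, gap in enumerate(minimax_gap(points, i)):
            if gap <= max_gap:
                labels[j] = next_label
        next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
    print(cluster_by_path_gap(pts, max_gap=1.5))
```

With this single path property the result coincides with thresholded single-linkage clustering; a richer path score (for example, one that also penalizes density variations along the path) would replace the `max(gap, hop)` rule.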
HIERARCHICAL CLUSTERING USING LEVEL SETS
Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept which is used to develop a new recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level-Set Clustering (LSC). A level-set is a subset of points of a data-set whose densities are greater than some threshold, 't'. By graphing the size of each level-set against its respective 't', indents are produced in the line graph which correspond to clusters in the data-set, as the points in a cluster have very similar densities. This new algorithm is able to produce the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
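The level-set size curve described above can be sketched in a few lines. The density estimate used here (the number of neighbours within a radius `eps`, in the spirit of DBSCAN) is an assumption, since the abstract does not say how density is measured, and the helper names `neighbor_counts` and `level_set_sizes` are hypothetical. The brute-force neighbour search is O(n^2) and is only meant to show the shape of the curve, not to reproduce LSC or its stated complexity.

```python
import math


def neighbor_counts(points, eps):
    """Assumed density estimate: neighbours within distance `eps`,
    not counting the point itself."""
    return [
        sum(1 for q in points if math.dist(p, q) <= eps) - 1
        for p in points
    ]


def level_set_sizes(densities, thresholds):
    """Size of each level-set, i.e. how many points have density > t."""
    return [sum(1 for d in densities if d > t) for t in thresholds]


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),  # dense group
           (5, 5), (5.1, 5),                        # sparse pair
           (9, 9)]                                  # isolated point
    dens = neighbor_counts(pts, eps=0.5)
    for t, size in zip(range(4), level_set_sizes(dens, range(4))):
        print(t, size)
```

In this toy data the curve stays flat between t = 1 and t = 2, because the four points of the dense group share the same density; flat stretches of this kind are one plausible reading of the "indents" the abstract associates with clusters.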
Weak Lensing Peak Finding: Estimators, Filters, and Biases
Large catalogs of shear-selected peaks have recently become a reality. In
order to properly interpret the abundance and properties of these peaks, it is
necessary to take into account the effects of the clustering of source
galaxies, among themselves and with the lens. In addition, the preferred
selection of lensed galaxies in a flux- and size-limited sample leads to
fluctuations in the apparent source density which correlate with the lensing
field (lensing bias). In this paper, we investigate these issues for two
different choices of shear estimators which are commonly in use today:
globally-normalized and locally-normalized estimators. While in principle
equivalent, in practice these estimators respond differently to systematic
effects such as lensing bias and cluster member dilution. Furthermore, we find
that which estimator is statistically superior depends on the specific shape of
the filter employed for peak finding; suboptimal choices of the
estimator+filter combination can result in a suppression of the number of high
peaks by orders of magnitude. Lensing bias generally acts to increase the
signal-to-noise $\nu$ of shear peaks; for high peaks the boost can be as large as
$\Delta\nu \sim 1$-$2$. Due to the steepness of the peak abundance function, these
boosts can result in a significant increase in the abundance of shear peaks. A
companion paper (Rozo et al., 2010) investigates these same issues within the
context of stacked weak lensing mass estimates.
Comment: 11 pages, 8 figures; comments welcome