
    A shortest-path based clustering algorithm for joint human-machine analysis of complex datasets

    Clustering is a technique for analyzing datasets obtained from empirical studies in several disciplines, with a major application in biomedical research. Essentially, clustering algorithms are executed by machines with the aim of finding groups of related points in a dataset. However, the resulting grouping depends on both the metric for point-to-point similarity and the rules for point-to-group association. Indeed, inappropriate metrics and rules can lead to undesirable clustering artifacts. This is especially relevant for datasets where groups with heterogeneous structures co-exist. In this work, we propose an algorithm that achieves clustering by exploring the paths between points. This allows both evaluating the properties of a path (such as gaps and density variations) and expressing a preference for certain paths. Moreover, our algorithm supports the integration of existing knowledge about admissible and non-admissible clusters by training a path classifier. We demonstrate the accuracy of the proposed method on challenging datasets, including points from synthetic shapes in publicly available benchmarks and microscopy data.

    HIERARCHICAL CLUSTERING USING LEVEL SETS

    Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept, which is used to develop a recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level-Set Clustering (LSC). A level-set is the subset of points of a data-set whose densities are greater than some threshold, ‘t’. When the size of each level-set is graphed against its respective ‘t’, indents appear in the line graph that correspond to clusters in the data-set, as the points in a cluster have very similar densities. The new algorithm is able to produce the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.

    Weak Lensing Peak Finding: Estimators, Filters, and Biases

    Large catalogs of shear-selected peaks have recently become a reality. In order to properly interpret the abundance and properties of these peaks, it is necessary to take into account the effects of the clustering of source galaxies, among themselves and with the lens. In addition, the preferred selection of lensed galaxies in a flux- and size-limited sample leads to fluctuations in the apparent source density which correlate with the lensing field (lensing bias). In this paper, we investigate these issues for two different choices of shear estimators which are commonly in use today: globally-normalized and locally-normalized estimators. While in principle equivalent, in practice these estimators respond differently to systematic effects such as lensing bias and cluster member dilution. Furthermore, we find that which estimator is statistically superior depends on the specific shape of the filter employed for peak finding; suboptimal choices of the estimator+filter combination can result in a suppression of the number of high peaks by orders of magnitude. Lensing bias generally acts to increase the signal-to-noise $\nu$ of shear peaks; for high peaks the boost can be as large as $\Delta\nu \sim 1$-$2$. Due to the steepness of the peak abundance function, these boosts can result in a significant increase in the abundance of shear peaks. A companion paper (Rozo et al., 2010) investigates these same issues within the context of stacked weak lensing mass estimates. Comment: 11 pages, 8 figures; comments welcome.