    A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm

    We present a new cluster detection algorithm designed for finding high-redshift clusters using optical/infrared imaging data. The algorithm has two main characteristics. First, it utilises each galaxy's full redshift probability function, instead of an estimate of the photometric redshift based on the peak of the probability function and an associated Gaussian error. Second, it identifies cluster candidates by cross-checking the results of two substantially different selection techniques (the name 2TecX representing the cross-check of the two techniques). These are adaptations of the Voronoi Tessellation and Friends-of-Friends methods. Monte Carlo simulations of mock catalogues show that cross-checking the cluster candidates found by the two techniques significantly reduces the detection of spurious sources. Furthermore, we examine the selection effects and the relative strengths and weaknesses of each method. The simulations also allow us to fine-tune the algorithm's parameters, and to define completeness and mass limit as a function of redshift. We demonstrate that the algorithm isolates high-redshift clusters with high efficiency and low contamination.
    Comment: 13 Pages, 17 figures, accepted for publication in MNRA
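
As a rough illustration of the Friends-of-Friends idea named in the abstract above, the sketch below links 2-D points that lie within a fixed linking length and returns the connected groups. It is a generic textbook version, not the 2TecX adaptation: it ignores redshift probability functions entirely, and the names `friends_of_friends` and `linking_length` are assumptions made for this example.

```python
from math import hypot


def friends_of_friends(points, linking_length):
    """Group 2-D points into friends-of-friends groups.

    Two points are "friends" if their separation is at most linking_length;
    a group is a connected component of the resulting friendship graph.
    Returns a sorted list of groups, each a list of point indices.
    Hypothetical helper for illustration only.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair search; fine for a sketch, too slow for a real catalogue.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (xi, yi), (xj, yj) = points[i], points[j]
            if hypot(xi - xj, yi - yj) <= linking_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


positions = [(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5), (10, 10)]
groups = friends_of_friends(positions, linking_length=0.6)
```

Note that points 0 and 2 end up in the same group even though they are farther apart than the linking length, because point 1 is a friend of both. A production implementation would typically replace the pair loop with a spatial index.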

    A Bayesian approach to discrete object detection in astronomical datasets

    A Bayesian approach is presented for detecting and characterising the signal from discrete objects embedded in a diffuse background. The approach centres around the evaluation of the posterior distribution for the parameters of the discrete objects, given the observed data, and defines the theoretically-optimal procedure for parametrised object detection. Two alternative strategies are investigated: the simultaneous detection of all the discrete objects in the dataset, and the iterative detection of objects. In both cases, the parameter space characterising the object(s) is explored using Markov-Chain Monte-Carlo sampling. For the iterative detection of objects, another approach is to locate the global maximum of the posterior at each iteration using a simulated annealing downhill simplex algorithm. The techniques are applied to a two-dimensional toy problem consisting of Gaussian objects embedded in uncorrelated pixel noise. A cosmological illustration of the iterative approach is also presented, in which the thermal and kinetic Sunyaev-Zel'dovich effects from clusters of galaxies are detected in microwave maps dominated by emission from primordial cosmic microwave background anisotropies.
    Comment: 20 pages, 12 figures, accepted by MNRAS; contains some additional material in response to referee's comment

    Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

    We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhibit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these "planted models" and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information-theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured 'hard but detectable' regime for community detection in sparse graphs.
    Comment: For sparse PCA and submatrix localization, we determine the information-theoretic threshold exactly in the limit where the number of blocks is large or the signal matrix is very sparse based on a conditional second moment method, closing the factor of root two gap in the first versio

    Sunyaev-Zel'dovich clusters reconstruction in multiband bolometer camera surveys

    We present a new method for the reconstruction of Sunyaev-Zel'dovich (SZ) galaxy clusters in future SZ-survey experiments using multiband bolometer cameras such as Olimpo, APEX, or Planck. Our goal is to optimise SZ-cluster extraction from observed noisy maps. We emphasize that none of the algorithms used in the detection chain is tuned using prior knowledge of the SZ-cluster signal or of other astrophysical sources (Optical Spectrum, Noise Covariance Matrix, or covariance of SZ Cluster wavelet coefficients). First, a blind separation of the different astrophysical components that contribute to the observations is conducted using an Independent Component Analysis (ICA) method. Then, a recent nonlinear filtering technique in the wavelet domain, based on multiscale entropy and the False Discovery Rate (FDR) method, is used to detect and reconstruct the galaxy clusters. Finally, we use the Source Extractor software to identify the detected clusters. The proposed method was applied to realistic simulations of observations. In terms of global detection efficiency, the new method gives results comparable to those of the Pierpaoli et al. method, despite being a blind algorithm. Preprint with full resolution figures is available at the URL: w10-dapnia.saclay.cea.fr/Phocea/Vie_des_labos/Ast/ast_visu.php?id_ast=728
    Comment: Submitted to A&A. 32 Pages, text onl
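
The False Discovery Rate control mentioned in the abstract above is commonly implemented with the Benjamini-Hochberg step-up procedure. The sketch below applies that generic procedure to a plain list of p-values; how the paper derives p-values from wavelet coefficients is not given in the abstract and is not modelled here, and the function name `benjamini_hochberg` and the level `q` are assumptions made for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Generic Benjamini-Hochberg step-up procedure (illustrative helper,
    not taken from the paper): sort the p-values, find the largest rank k
    such that p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff = 0  # number of rejected hypotheses
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank  # keep the largest rank that passes

    return sorted(order[:cutoff])


# The first two p-values fail their own thresholds, but the third passes,
# so the step-up rule rejects all three.
detections = benjamini_hochberg([0.02, 0.03, 0.035, 0.9], q=0.05)
```

The step-up behaviour is the design point worth noting: a p-value is rejected whenever any larger p-value passes its rank-dependent threshold, which makes the procedure less conservative than a Bonferroni cut at `q / m`.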

    A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7

    We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique to galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
    Comment: Updated to match the published version. The catalog can be accessed from: http://home.fnal.gov/~jghao/gmbcg_sdss_catalog.htm

    Bayesian Cluster Enumeration Criterion for Unsupervised Learning

    We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.
    Comment: 14 pages, 7 figure
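
The two-step scheme described in the abstract above (partition the data for each candidate model, then pick the candidate with the maximal criterion) can be sketched as follows. This is only an outline under loud assumptions: it uses the original Schwarz BIC in its "larger is better" form, not the new criterion the paper derives; it uses a simple 1-D k-means as a stand-in for the model-based partitioning step; and all function names are hypothetical.

```python
from math import log, pi


def kmeans_1d(data, k, iters=50):
    """Step 1 stand-in: deterministic 1-D k-means (Lloyd's algorithm)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic initialisation at evenly spaced quantiles.
    centres = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            clusters[nearest].append(x)
        new = [sum(c) / len(c) if c else centres[j]
               for j, c in enumerate(clusters)]
        if new == centres:
            break
        centres = new
    return [c for c in clusters if c]  # drop empty clusters


def classic_bic(clusters):
    """Original (Schwarz) BIC, larger is better, for a hard partition
    modelled as a 1-D Gaussian mixture. NOT the paper's criterion."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    log_lik = 0.0
    for c in clusters:
        m = len(c)
        mean = sum(c) / m
        # Small floor keeps the log finite for zero-variance clusters.
        var = max(sum((x - mean) ** 2 for x in c) / m, 1e-12)
        log_lik += m * log(m / n) - 0.5 * m * (log(2 * pi * var) + 1)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return log_lik - 0.5 * n_params * log(n)


def estimate_num_clusters(data, candidates):
    """Step 2: return the candidate k with the maximal BIC, plus all scores."""
    scores = {k: classic_bic(kmeans_1d(data, k)) for k in candidates}
    return max(scores, key=scores.get), scores


data = [0, 1, 2, 3, 4, 5, 50, 51, 52, 53, 54, 55]
best_k, scores = estimate_num_clusters(data, candidates=[1, 2, 3, 4])
```

On this toy data set with two well-separated groups, the classic criterion selects two clusters. The variance floor matters in practice: without a sensible floor, near-degenerate clusters can inflate the likelihood and bias the choice toward too many clusters.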