
    A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling

    We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a $k$-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We also show pointwise 95% credible intervals to assess the uncertainty of the alignment in different clusters. We validate this model using simulations and present multiple examples on real data from different application domains including biometrics and medicine.

    Learning with Clustering Structure

    We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text classification, for instance, to reduce dimensionality by grouping words together and identifying synonyms. The sample clustering problem, on the other hand, applies to multiclass problems where we are allowed to make multiple predictions and the performance of the best answer is recorded. We derive a unified optimization formulation highlighting the common structure of these problems and produce algorithms whose core iteration complexity amounts to a k-means clustering step, which can be approximated efficiently. We extend these results to combine sparsity and clustering constraints, and develop a new projection algorithm on the set of clustered sparse vectors. We prove convergence of our algorithms on random instances, based on a union of subspaces interpretation of the clustering structure. Finally, we test the robustness of our methods on artificial data sets as well as real data extracted from movie reviews. Comment: Completely rewritten. New convergence proofs in the clustered and sparse clustered case. New projection algorithm on sparse clustered vectors.

    Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis

    Multiple testing analysis, based on clustering methodologies, is usually applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to deal with multiple comparisons among more than two groups obtained from microarray expressions of genes. Assuming normal data, we define a statistic which depends on sample means and sample variances and follows a non-central t-distribution. As we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The components of the mixture are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on gorilla, bonobo and human cultured fibroblasts. Keywords: Clustering, MCMC computation, Microarray analysis, Mixture distributions, Multiple hypothesis testing, Non-central t-distribution

    Dependence of galaxy clustering on UV-luminosity and stellar mass at $z \sim 4 - 7$

    We investigate the dependence of galaxy clustering at $z \sim 4 - 7$ on UV-luminosity and stellar mass. Our sample consists of $\sim 10{,}000$ Lyman-break galaxies (LBGs) in the XDF and CANDELS fields. As part of our analysis, the $M_\star - M_{\rm UV}$ relation is estimated for the sample, which is found to have a nearly linear slope of $d\log_{10} M_\star / d M_{\rm UV} \sim 0.44$. We subsequently measure the angular correlation function and bias in different stellar mass and luminosity bins. We focus on comparing the clustering dependence on these two properties. While UV-luminosity is only related to recent starbursts of a galaxy, stellar mass reflects the integrated build-up of the whole star formation history, which should make it more tightly correlated with halo mass. Hence, the clustering segregation with stellar mass is expected to be larger than with luminosity. However, our measurements suggest that the segregation with luminosity is larger with $\simeq 90\%$ confidence (neglecting contributions from systematic errors). We compare this unexpected result with predictions from the Meraxes semi-analytic galaxy formation model. Interestingly, the model reproduces the observed angular correlation functions, and also suggests stronger clustering segregation with luminosity. The comparison between our observations and the model provides evidence of multiple halo occupation in the small scale clustering. Comment: 10 pages, 6 figures, 2 tables, accepted for publication in MNRAS

    Redshift-space distortions of galaxies, clusters and AGN: testing how the accuracy of growth rate measurements depends on scales and sample selections

    Redshift-space clustering anisotropies caused by cosmic peculiar velocities provide a powerful probe to test the gravity theory on large scales. However, to extract unbiased physical constraints, the clustering pattern has to be modelled accurately, taking into account the effects of non-linear dynamics at small scales, and properly describing the link between the selected cosmic tracers and the underlying dark matter field. We use a large hydrodynamic simulation to investigate how the systematic error on the linear growth rate, $f$, caused by model uncertainties, depends on sample selections and comoving scales. Specifically, we measure the redshift-space two-point correlation function of mock samples of galaxies, galaxy clusters and Active Galactic Nuclei, extracted from the Magneticum simulation, in the redshift range $0.2 < z < 2$, and adopting different sample selections. We estimate $f\sigma_8$ by modelling both the monopole and the full two-dimensional anisotropic clustering, using the dispersion model. We find that the systematic error on $f\sigma_8$ depends significantly on the range of scales considered for the fit. If the latter is kept fixed, the error depends on both redshift and sample selection, due to the scale-dependent impact of non-linearities, if not properly modelled. On the other hand, we show that it is possible to get unbiased constraints on $f\sigma_8$ provided that the analysis is restricted to a proper range of scales, which depends non-trivially on the properties of the sample. This can have a strong impact on multiple-tracer analyses, and when combining catalogues selected at different redshifts. Comment: 17 pages, 14 figures. Accepted for publication in Astronomy & Astrophysics