444,999 research outputs found

A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling

Author: Kurtek Sebastian
Publication date: 01/01/2017

We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a $k$-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We also show pointwise $95\%$ credible intervals to assess the uncertainty of the alignment in different clusters. We validate this model using simulations and present multiple examples on real data from different application domains including biometrics and medicine
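
The square-root transformation mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the standard construction in which a warping function $\gamma$ on $[0,1]$ (increasing, with $\gamma(0)=0$ and $\gamma(1)=1$) is mapped to $\psi = \sqrt{\dot\gamma}$, so that $\int_0^1 \psi^2\,dt = 1$ and distances become arc lengths on a unit sphere. The helper names (`trapezoid`, `sqrt_transform`, `sphere_distance`) and the example warping are hypothetical.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def sqrt_transform(gamma, t):
    """Map a sampled warping function gamma to psi = sqrt(d gamma / dt)."""
    return np.sqrt(np.gradient(gamma, t))

def sphere_distance(psi1, psi2, t):
    """Arc-length distance between two unit-norm functions: arccos of their inner product."""
    inner = trapezoid(psi1 * psi2, t)
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

t = np.linspace(0.0, 1.0, 1001)

# Identity warping and a hypothetical smooth warping with positive derivative.
identity = t
warp = (np.exp(t) - 1.0) / (np.e - 1.0)

psi_id = sqrt_transform(identity, t)
psi_w = sqrt_transform(warp, t)

norm_sq = trapezoid(psi_w ** 2, t)          # close to 1: psi lies on the unit sphere
dist = sphere_distance(psi_id, psi_w, t)    # distance of the warping from the identity

print(f"squared norm of psi: {norm_sq:.4f}")
print(f"distance from identity: {dist:.4f}")
```

Because the transformed functions lie on a sphere, quantities such as means and cluster assignments can be computed from inner products alone, which is presumably what makes the clustering step in the abstract fast.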

arXiv.org e-Print Archive

Crossref

Learning with Clustering Structure

Author: Bach Francis
d'Aspremont Alexandre
Fogel Fajwel
Roulet Vincent
Publication date: 19/09/2016

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text classification, for instance, to reduce dimensionality by grouping words together and identify synonyms. The sample clustering problem, on the other hand, applies to multiclass problems where we are allowed to make multiple predictions and the performance of the best answer is recorded. We derive a unified optimization formulation highlighting the common structure of these problems and produce algorithms whose core iteration complexity amounts to a k-means clustering step, which can be approximated efficiently. We extend these results to combine sparsity and clustering constraints, and develop a new projection algorithm on the set of clustered sparse vectors. We prove convergence of our algorithms on random instances, based on a union of subspaces interpretation of the clustering structure. Finally, we test the robustness of our methods on artificial data sets as well as real data extracted from movie reviews.

Comment: Completely rewritten. New convergence proofs in the clustered and sparse clustered case. New projection algorithm on sparse clustered vector

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis

Author: M. Teresa Rodríguez-Bernal
Miguel J. Marín

Multiple testing analysis, based on clustering methodologies, is usually applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to deal with multiple comparisons among more than two groups obtained from microarray expressions of genes. Assuming normal data, we define a statistic which depends on sample means and sample variances, distributed as a non-central t-distribution. As we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The estimation of the components of mixtures is obtained via a Bayesian approach, and the model is applied in a multiple comparison problem from a microarray experiment obtained from gorilla, bonobo and human cultured fibroblasts.

Keywords: Clustering, MCMC computation, Microarray analysis, Mixture distributions, Multiple hypothesis testing, Non-central t-distribution

Research Papers in Economics

Dependence of galaxy clustering on UV-luminosity and stellar mass at $z \sim 4 - 7$

Author: Bouwens Rychard J.
Illingworth Garth D.
Labbé Ivo
Mutch Simon J.
Oesch Pascal A.
Qin Yuxiang
Qiu Yisheng
Stefanon Mauro
Wyithe J. Stuart B.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

We investigate the dependence of galaxy clustering at $z \sim 4 - 7$ on UV-luminosity and stellar mass. Our sample consists of $\sim$10,000 Lyman-break galaxies (LBGs) in the XDF and CANDELS fields. As part of our analysis, the $M_\star - M_{\rm UV}$ relation is estimated for the sample, which is found to have a nearly linear slope of $d\log_{10} M_\star / d M_{\rm UV} \sim 0.44$. We subsequently measure the angular correlation function and bias in different stellar mass and luminosity bins. We focus on comparing the clustering dependence on these two properties. While UV-luminosity is only related to recent starbursts of a galaxy, stellar mass reflects the integrated build-up of the whole star formation history, which should make it more tightly correlated with halo mass. Hence, the clustering segregation with stellar mass is expected to be larger than with luminosity. However, our measurements suggest that the segregation with luminosity is larger with $\simeq 90\%$ confidence (neglecting contributions from systematic errors). We compare this unexpected result with predictions from the Meraxes semi-analytic galaxy formation model. Interestingly, the model reproduces the observed angular correlation functions, and also suggests stronger clustering segregation with luminosity. The comparison between our observations and the model provides evidence of multiple halo occupation in the small scale clustering.

Comment: 10 pages, 6 figures, 2 tables, accepted for publication in MNRA

arXiv.org e-Print Archive

Leiden University Scholarly Publications

Redshift-space distortions of galaxies, clusters and AGN: testing how the accuracy of growth rate measurements depends on scales and sample selections

Author: Cimatti Andrea
Dolag Klaus
Marulli Federico
Moscardini Lauro
Veropalumbo Alfonso
Publication venue: 'EDP Sciences'
Publication date: 01/03/2017

Redshift-space clustering anisotropies caused by cosmic peculiar velocities provide a powerful probe to test the gravity theory on large scales. However, to extract unbiased physical constraints, the clustering pattern has to be modelled accurately, taking into account the effects of non-linear dynamics at small scales, and properly describing the link between the selected cosmic tracers and the underlying dark matter field. We use a large hydrodynamic simulation to investigate how the systematic error on the linear growth rate, $f$, caused by model uncertainties, depends on sample selections and comoving scales. Specifically, we measure the redshift-space two-point correlation function of mock samples of galaxies, galaxy clusters and Active Galactic Nuclei, extracted from the Magneticum simulation, in the redshift range $0.2 < z < 2$, and adopting different sample selections. We estimate $f\sigma_8$ by modelling both the monopole and the full two-dimensional anisotropic clustering, using the dispersion model. We find that the systematic error on $f\sigma_8$ depends significantly on the range of scales considered for the fit. If the latter is kept fixed, the error depends on both redshift and sample selection, due to the scale-dependent impact of non-linearities, if not properly modelled. On the other hand, we show that it is possible to get unbiased constraints on $f\sigma_8$ provided that the analysis is restricted to a proper range of scales, that depends non trivially on the properties of the sample. This can have a strong impact on multiple tracers analyses, and when combining catalogues selected at different redshifts.

Comment: 17 pages, 14 figures. Accepted for publication in Astronomy & Astrophysic

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

MPG.PuRe