A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling
We present a Bayesian model for pairwise nonlinear registration of functional
data. We use the Riemannian geometry of the space of warping functions to
define appropriate prior distributions and sample from the posterior using
importance sampling. A simple square-root transformation is used to simplify
the geometry of the space of warping functions, which allows for computation of
sample statistics, such as the mean and median, and a fast implementation of a
k-means clustering algorithm. These tools allow for efficient posterior
inference, where multiple modes of the posterior distribution corresponding to
multiple plausible alignments of the given functions are found. We also show
pointwise credible intervals to assess the uncertainty of the alignment
in different clusters. We validate this model using simulations and present
multiple examples on real data from different application domains including
biometrics and medicine.
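A minimal sketch of the square-root transformation mentioned above, assuming it is the usual map from a warping function gamma to psi = sqrt(gamma'), which places warping functions on the unit sphere in L2 where distances are arc lengths. The function names, the grid, and the example warp gamma(t) = t^2 are illustrative assumptions, not taken from the paper.

```python
import math

def sqrt_transform(gamma, t):
    """Map a sampled warping function gamma to psi = sqrt(gamma').

    The derivative is approximated by forward differences, so the result
    holds one value per grid interval. (Assumed form of the transformation.)
    """
    psi = []
    for i in range(len(t) - 1):
        slope = (gamma[i + 1] - gamma[i]) / (t[i + 1] - t[i])
        psi.append(math.sqrt(slope))
    return psi

def inner(psi1, psi2, t):
    """L2 inner product of two piecewise-constant functions on grid t."""
    return sum(a * b * (t[i + 1] - t[i])
               for i, (a, b) in enumerate(zip(psi1, psi2)))

def sphere_distance(psi1, psi2, t):
    """Arc-length (geodesic) distance between two points on the unit sphere."""
    c = max(-1.0, min(1.0, inner(psi1, psi2, t)))  # clamp rounding error
    return math.acos(c)

# Hypothetical example: identity warp versus gamma(t) = t^2 on [0, 1].
n = 200
t = [i / n for i in range(n + 1)]
identity = t[:]
warp = [x ** 2 for x in t]

psi_id = sqrt_transform(identity, t)
psi_w = sqrt_transform(warp, t)

norm_sq = inner(psi_w, psi_w, t)            # equals gamma(1) - gamma(0) = 1
dist = sphere_distance(psi_id, psi_w, t)    # distance from the identity warp
```

Because every transformed warp has unit norm, sample statistics such as a mean can be computed with standard spherical geometry, which is the simplification the abstract refers to.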
Learning with Clustering Structure
We study supervised learning problems using clustering constraints to impose
structure on either features or samples, seeking to help both prediction and
interpretation. The problem of clustering features arises naturally in text
classification, for instance, to reduce dimensionality by grouping words
together and to identify synonyms. The sample clustering problem, on the other
hand, applies to multiclass problems where we are allowed to make multiple
predictions and the performance of the best answer is recorded. We derive a
unified optimization formulation highlighting the common structure of these
problems and produce algorithms whose core iteration complexity amounts to a
k-means clustering step, which can be approximated efficiently. We extend these
results to combine sparsity and clustering constraints, and develop a new
projection algorithm on the set of clustered sparse vectors. We prove
convergence of our algorithms on random instances, based on a union of
subspaces interpretation of the clustering structure. Finally, we test the
robustness of our methods on artificial data sets as well as real data
extracted from movie reviews.
Comment: Completely rewritten. New convergence proofs in the clustered and
sparse clustered case. New projection algorithm on sparse clustered vectors
Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis
Multiple testing analysis, based on clustering methodologies, is usually applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to deal with multiple comparisons among more than two groups obtained from microarray expressions of genes. Assuming normal data, we define a statistic which depends on sample means and sample variances and is distributed as a non-central t-distribution. As we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The components of the mixture are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on gorilla, bonobo and human cultured fibroblasts.
Keywords: Clustering, MCMC computation, Microarray analysis, Mixture distributions, Multiple hypothesis testing, Non-central t-distribution
Dependence of galaxy clustering on UV-luminosity and stellar mass at
We investigate the dependence of galaxy clustering at on
UV-luminosity and stellar mass. Our sample consists of 10,000
Lyman-break galaxies (LBGs) in the XDF and CANDELS fields. As part of our
analysis, the relation is estimated for the sample,
which is found to have a nearly linear slope of . We subsequently measure the angular correlation function and
bias in different stellar mass and luminosity bins. We focus on comparing the
clustering dependence on these two properties. While UV-luminosity is only
related to recent starbursts of a galaxy, stellar mass reflects the integrated
build-up of the whole star formation history, which should make it more tightly
correlated with halo mass. Hence, the clustering segregation with stellar mass
is expected to be larger than with luminosity. However, our measurements
suggest that the segregation with luminosity is larger with
confidence (neglecting contributions from systematic errors). We compare this
unexpected result with predictions from the Meraxes semi-analytic
galaxy formation model. Interestingly, the model reproduces the observed
angular correlation functions, and also suggests stronger clustering
segregation with luminosity. The comparison between our observations and the
model provides evidence of multiple halo occupation in the small-scale
clustering.
Comment: 10 pages, 6 figures, 2 tables, accepted for publication in MNRAS
Redshift-space distortions of galaxies, clusters and AGN: testing how the accuracy of growth rate measurements depends on scales and sample selections
Redshift-space clustering anisotropies caused by cosmic peculiar velocities
provide a powerful probe to test the gravity theory on large scales. However,
to extract unbiased physical constraints, the clustering pattern has to be
modelled accurately, taking into account the effects of non-linear dynamics at
small scales, and properly describing the link between the selected cosmic
tracers and the underlying dark matter field. We use a large hydrodynamic
simulation to investigate how the systematic error on the linear growth rate,
, caused by model uncertainties, depends on sample selections and comoving
scales. Specifically, we measure the redshift-space two-point correlation
function of mock samples of galaxies, galaxy clusters and Active Galactic
Nuclei, extracted from the Magneticum simulation, in the redshift range 0.2 < z
< 2, and adopting different sample selections. We estimate by
modelling both the monopole and the full two-dimensional anisotropic
clustering, using the dispersion model. We find that the systematic error on
depends significantly on the range of scales considered for the
fit. If the latter is kept fixed, the error depends on both redshift and sample
selection, due to the scale-dependent impact of non-linearities, if not
properly modelled. On the other hand, we show that it is possible to get
unbiased constraints on provided that the analysis is restricted to
a proper range of scales that depends non-trivially on the properties of the
sample. This can have a strong impact on multi-tracer analyses, and when
combining catalogues selected at different redshifts.
Comment: 17 pages, 14 figures. Accepted for publication in Astronomy &
Astrophysics
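As a rough illustration of how a growth rate can be read off the redshift-space monopole, the sketch below uses the standard linear-theory (Kaiser) prefactors only. It assumes a linear bias `b` and a distortion parameter beta = f / b, neither of which appears in the abstract, and it omits the small-scale velocity damping that the dispersion model adds on top. The numerical values are hypothetical.

```python
def kaiser_prefactors(f, b):
    """Linear-theory factors multiplying the real-space correlation function.

    Returns the monopole and quadrupole prefactors for growth rate f and
    linear bias b (beta = f / b). Assumption: linear Kaiser limit only.
    """
    beta = f / b
    mono = 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0
    quad = 4.0 * beta / 3.0 + 4.0 * beta ** 2 / 7.0
    return mono, quad

def growth_rate_from_monopole_ratio(ratio, b, tol=1e-12):
    """Invert the monopole prefactor by bisection and return f = beta * b.

    `ratio` is the redshift-space monopole divided by the real-space
    correlation function, assumed to be measured on linear scales.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 + 2.0 * mid / 3.0 + mid ** 2 / 5.0 < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * b

# Hypothetical tracer with f = 0.8 and b = 2.0.
mono, quad = kaiser_prefactors(0.8, 2.0)
f_recovered = growth_rate_from_monopole_ratio(mono, 2.0)
```

Fitting a ratio measured on scales where non-linear effects matter would bias the recovered value, which is the kind of scale dependence the abstract investigates.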