A Bayesian view of the current status of dark matter direct searches
Bayesian statistical methods offer a simple and consistent framework for
incorporating uncertainties into a multi-parameter inference problem. In this
work we apply these methods to a selection of current direct dark matter
searches. We consider the simplest scenario of spin-independent elastic WIMP
scattering, and infer the WIMP mass and cross-section from the experimental
data with the essential systematic uncertainties folded into the analysis. We
find that when uncertainties in the scintillation efficiency of Xenon100 have
been accounted for, the resulting exclusion limit is not sufficiently
constraining to rule out the CoGeNT preferred parameter region, contrary to
previous claims. In the same vein, we also investigate the impact of
astrophysical uncertainties on the preferred WIMP parameters. We find that
within the class of smooth and isotropic WIMP velocity distributions, it is
difficult to reconcile the DAMA and the CoGeNT preferred regions by tweaking
the astrophysics parameters alone. If we demand compatibility between these
experiments, then the inference process naturally concludes that a high value
for the sodium quenching factor for DAMA is preferred.
Comment: 37 pages, 14 figures and 7 tables. Replacement for matching the
version accepted for publication
Latent protein trees
Unbiased, label-free proteomics is becoming a powerful technique for
measuring protein expression in almost any biological sample. The output of
these measurements after preprocessing is a collection of features and their
associated intensities for each sample. Subsets of features within the data are
from the same peptide, subsets of peptides are from the same protein, and
subsets of proteins are in the same biological pathways; therefore, there is
the potential for very complex and informative correlational structure inherent
in these data. Recent attempts to utilize these data often focus on the
identification of single features that are associated with a particular
phenotype that is relevant to the experiment. However, to date, there have been
no published approaches that directly model what we know to be multiple
different levels of correlation structure. Here we present a hierarchical
Bayesian model which is specifically designed to model such correlation
structure in unbiased, label-free proteomics. This model utilizes partial
identification information from peptide sequencing and database lookup as well
as the observed correlation in the data to appropriately compress features into
latent proteins and to estimate their correlation structure. We demonstrate the
effectiveness of the model using artificial/benchmark data and in the context
of a series of proteomics measurements of blood plasma from a collection of
volunteers who were infected with two different strains of viral influenza.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS639 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Approximate Inference for Constructing Astronomical Catalogs from Images
We present a new, fully generative model for constructing astronomical
catalogs from optical telescope image sets. Each pixel intensity is treated as
a random variable with parameters that depend on the latent properties of stars
and galaxies. These latent properties are themselves modeled as random. We
compare two procedures for posterior inference. One procedure is based on
Markov chain Monte Carlo (MCMC) while the other is based on variational
inference (VI). The MCMC procedure excels at quantifying uncertainty, while the
VI procedure is 1000 times faster. On a supercomputer, the VI procedure
efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50
terabytes of images in 14.6 minutes, demonstrating the scaling characteristics
necessary to construct catalogs for upcoming astronomical surveys.
Comment: accepted to the Annals of Applied Statistics
Simultaneous Inference of User Representations and Trust
Inferring trust relations between social media users is critical for a number
of applications wherein users seek credible information. The fact that
available trust relations are scarce and skewed makes trust prediction a
challenging task. To the best of our knowledge, this is the first work on
exploring representation learning for trust prediction. We propose an approach
that uses only a small amount of binary user-user trust relations to
simultaneously learn user embeddings and a model to predict trust between user
pairs. We empirically demonstrate that for trust prediction, our approach
outperforms classifier-based approaches which use state-of-the-art
representation learning methods like DeepWalk and LINE as features. We also
conduct experiments which use embeddings pre-trained with DeepWalk and LINE
each as an input to our model, resulting in further performance improvement.
Experiments with a dataset of 356K user pairs show that the proposed
method can obtain a high F-score of 92.65%.
Comment: To appear in the proceedings of ASONAM'17. Please cite that version
Algebraic shortcuts for leave-one-out cross-validation in supervised network inference
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models
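The kind of cross-validation shortcut mentioned in this abstract can be illustrated on ordinary (single-kernel) kernel ridge regression, where leave-one-out predictions follow in closed form from the hat matrix H = K(K + lambda*I)^-1. The sketch below is not the two-step method of the paper; it shows only the standard textbook identity, and the function names, the RBF kernel, and the toy data are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def loo_predictions_shortcut(K, y, lam):
    """Leave-one-out predictions from a single fit on all data.

    Uses the identity  y_loo[i] = (y_hat[i] - H[i, i] * y[i]) / (1 - H[i, i]),
    where H = K (K + lam I)^-1 is the hat matrix of kernel ridge regression.
    """
    n = K.shape[0]
    # K and (K + lam I)^-1 commute for symmetric K, so solve() yields H directly.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    y_hat = H @ y
    h = np.diag(H)
    return (y_hat - h * y) / (1.0 - h)


def loo_predictions_naive(K, y, lam):
    """Reference implementation: refit the model n times, once per held-out point."""
    n = K.shape[0]
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K_train = K[np.ix_(keep, keep)]
        alpha = np.linalg.solve(K_train + lam * np.eye(n - 1), y[keep])
        preds[i] = K[i, keep] @ alpha
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))                      # toy inputs (assumed)
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)   # toy targets (assumed)
    K = rbf_kernel(X, gamma=0.5)
    fast = loo_predictions_shortcut(K, y, lam=0.1)
    slow = loo_predictions_naive(K, y, lam=0.1)
    print("max abs difference:", np.max(np.abs(fast - slow)))
```

The shortcut needs one linear solve instead of n refits, which is why identities of this type make exhaustive cross-validation affordable.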
No unique solution to the seismological problem of standing kink MHD waves
The aim of this paper is to point out that the classic seismological problem
using observations and theoretical expressions for the periods and damping
times of transverse standing magnetohydrodynamic (MHD) waves in coronal loops
is better referred to as a reduced seismological problem. Reduced emphasises
the fact that only a small number of characteristic quantities of the
equilibrium profiles can be determined. Reduced also implies that there is no
unique solution to the full seismological problem. Even the reduced
seismological problem does not allow a unique solution. Bayesian inference
results support our mathematical arguments and offer insight into the
relationship between the algebraic and the probabilistic inversions.
Comment: 10 pages, accepted in A&A
The Butcher-Oemler effect at z~0.35: a change in perspective
The present paper focuses on the much debated Butcher-Oemler effect: the
increase with redshift of the fraction of blue galaxies in clusters.
Considering a representative cluster sample made of seven groups/clusters at
z~0.35, we have measured the blue fraction from the cluster core to the cluster
outskirts and the field mainly using wide field CTIO images. This sample
represents a random selection of a volume-complete, X-ray-selected cluster
sample, selected so that there is no physical connection with the studied
quantity (blue fraction), to minimize observational biases. In order to
statistically assess the significance of the Butcher-Oemler effect, we
introduce the tools of Bayesian inference. Furthermore, we modified the blue
fraction definition in order to take into account the reduced age of the
universe at higher redshifts, because we should no longer attempt to reject an
unphysical universe in which the age of the Universe does depend on redshift,
whereas the age of its content does not. We measured the blue fraction from the
cluster center to the field and we find that the cluster affects the properties
of the galaxies up to two virial radii at z~0.35. Data suggest that during the
last 3 Gyr no evolution of the blue fraction, from the cluster core to the
field value, is seen beyond the one needed to account for the varying age with
redshift of the Universe and of its content. The agreement of the radial
profiles of the blue fraction at z=0 and z~0.35 implies that the infall pattern
did not change over the last 3 Gyr, or, at least, its variation has no
observational effect on the studied quantity.
Comment: MNRAS, in press