Multiple locus linkage analysis of genomewide expression in yeast.
With the ability to measure thousands of related phenotypes from a single biological sample, it is now feasible to genetically dissect systems-level biological phenomena. The genetics of transcriptional regulation and protein abundance are likely to be complex, meaning that genetic variation at multiple loci will influence these phenotypes. Several recent studies have investigated the role of genetic variation in transcription by applying traditional linkage analysis methods to genomewide expression data, where each gene expression level was treated as a quantitative trait and analyzed separately from one another. Here, we develop a new, computationally efficient method for simultaneously mapping multiple gene expression quantitative trait loci that directly uses all of the available data. Information shared across gene expression traits is captured in a way that makes minimal assumptions about the statistical properties of the data. The method produces easy-to-interpret measures of statistical significance for both individual loci and the overall joint significance of multiple loci selected for a given expression trait. We apply the new method to a cross between two strains of the budding yeast Saccharomyces cerevisiae, and estimate that at least 37% of all gene expression traits show two simultaneous linkages, where we have allowed for epistatic interactions. Pairs of jointly linking quantitative trait loci are identified with high confidence for 170 gene expression traits, where it is expected that both loci are true positives for at least 153 traits. In addition, we are able to show that epistatic interactions contribute to gene expression variation for at least 14% of all traits. We compare the proposed approach to an exhaustive two-dimensional scan over all pairs of loci. Surprisingly, we demonstrate that an exhaustive two-dimensional scan is less powerful than the sequential search used here. 
In addition, we show that a two-dimensional scan does not truly allow one to test for simultaneous linkage, and that the statistical significance measured from this existing method cannot be compared across many traits.
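The sequential search contrasted with the exhaustive two-dimensional scan can be sketched in a much-simplified form: scan all loci for the strongest single linkage, then scan again on the residual trait for a second locus. This is a hypothetical two-stage illustration with a basic variance-explained statistic, not the authors' exact method or significance measure:

```python
import numpy as np

def lod_score(y, g):
    """ANOVA-style linkage statistic between a quantitative trait y
    and a binary genotype vector g: n/2 * log10(RSS_null / RSS_fit)."""
    m0, m1 = y[g == 0].mean(), y[g == 1].mean()
    resid = np.where(g == 0, y - m0, y - m1)
    rss1 = np.sum(resid ** 2)
    rss0 = np.sum((y - y.mean()) ** 2)
    return (len(y) / 2.0) * np.log10(rss0 / rss1)

def sequential_two_locus_scan(y, G):
    """Greedy forward search: find the best single locus, then the
    best second locus conditional on the first (scan the residuals)."""
    scores1 = [lod_score(y, G[:, j]) for j in range(G.shape[1])]
    j1 = int(np.argmax(scores1))
    # regress out the first locus and scan again on the residual trait
    g1 = G[:, j1]
    resid = y.copy()
    for a in (0, 1):
        resid[g1 == a] -= y[g1 == a].mean()
    scores2 = [lod_score(resid, G[:, j]) if j != j1 else -np.inf
               for j in range(G.shape[1])]
    return j1, int(np.argmax(scores2))
```

On simulated data with two causal loci, the two-stage search recovers both in a single pass over the loci at each stage, rather than over all pairs.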
Using parallel computation to improve Independent Metropolis--Hastings based estimation
In this paper, we consider the implications of the fact that raw parallel
computing power can be exploited by a generic Metropolis--Hastings algorithm when the
proposed values are independent. In particular, we present improvements to the
independent Metropolis--Hastings algorithm that significantly decrease the
variance of any estimator derived from the MCMC output, at essentially no additional
computing cost, since those improvements are based on a fixed number of target density
evaluations. Furthermore, the techniques developed in this paper do not
jeopardize the Markovian convergence properties of the algorithm, since they
are based on the Rao--Blackwell principles of Gelfand and Smith (1990), already
exploited in Casella and Robert (1996), Atchade and Perron (2005) and Douc and
Robert (2010). We illustrate those improvements both on a toy normal example
and on a classical probit regression model, but stress the fact that they are
applicable in any case where the independent Metropolis--Hastings algorithm is applicable.

Comment: 19 pages, 8 figures, to appear in Journal of Computational and Graphical Statistics
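The flavour of Rao--Blackwellisation described here can be sketched on a toy problem: in an independent Metropolis--Hastings sampler, each estimator term is averaged over the accept/reject step, reusing the acceptance probability that was already computed (so no extra target evaluations). The normal target and Gaussian proposal below are illustrative assumptions, and this is only the simplest one-step version of the idea, not the full scheme of the paper:

```python
import numpy as np

def imh_rao_blackwell(log_target, sample_prop, log_prop, h, n_iter, x0, rng):
    """Independent Metropolis-Hastings with a basic Rao-Blackwellised
    estimator of E[h(X)]: replace h(X_t) by
    rho * h(proposal) + (1 - rho) * h(current), where rho is the
    acceptance probability already computed for the chain."""
    x = x0
    lw_x = log_target(x) - log_prop(x)   # current importance log-weight
    plain, rb = [], []
    for _ in range(n_iter):
        y = sample_prop(rng)
        lw_y = log_target(y) - log_prop(y)
        rho = min(1.0, np.exp(lw_y - lw_x))
        rb.append(rho * h(y) + (1.0 - rho) * h(x))
        if rng.random() < rho:
            x, lw_x = y, lw_y
        plain.append(h(x))
    return np.mean(plain), np.mean(rb)
```

Because the Rao--Blackwellised term conditions on the accept/reject randomness, it can only reduce the variance of each term relative to the plain ergodic average, at the cost of a few arithmetic operations per iteration.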
WARP: Wavelets with adaptive recursive partitioning for multi-dimensional data
Effective identification of asymmetric and local features in images and other
data observed on multi-dimensional grids plays a critical role in a wide range
of applications including biomedical and natural image processing. Moreover,
the ever increasing amount of image data, in terms of both the resolution per
image and the number of images processed per application, requires algorithms
and methods for such applications to be computationally efficient. We develop a
new probabilistic framework for multi-dimensional data to overcome these
challenges through incorporating data adaptivity into discrete wavelet
transforms, thereby allowing them to adapt to the geometric structure of the
data while maintaining the linear computational scalability. By exploiting a
connection between the local directionality of wavelet transforms and recursive
dyadic partitioning on the grid points of the observation, we obtain the
desired adaptivity through adding to the traditional Bayesian wavelet
regression framework an additional layer of Bayesian modeling on the space of
recursive partitions over the grid points. We derive the corresponding
inference recipe in the form of a recursive representation of the exact
posterior, and develop a class of efficient recursive message passing
algorithms for achieving exact Bayesian inference with a computational
complexity linear in the resolution and sample size of the images. While our
framework is applicable to a range of problems including multi-dimensional
signal processing, compression, and structural learning, we illustrate its use
and evaluate its performance in the context of 2D and 3D image reconstruction
using real images from the ImageNet database. We also apply the framework to
analyze a data set from retinal optical coherence tomography.
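For reference, the non-adaptive building block that WARP extends, a discrete wavelet transform on a dyadic grid, already runs in time linear in the number of grid points. A minimal Haar version in 1D (a generic sketch of the standard transform, not the authors' adaptive framework):

```python
import numpy as np

def haar_dwt(signal):
    """Full Haar discrete wavelet transform of a length-2^J signal in
    linear time: repeatedly split into pairwise averages (approximation)
    and pairwise differences (detail), both scaled by 1/sqrt(2)."""
    x = np.asarray(signal, dtype=float)
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # detail
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)             # approximation
    coeffs.append(x)          # final scaling coefficient
    return coeffs[::-1]       # coarsest level first

def haar_idwt(coeffs):
    """Invert haar_dwt by running the butterfly in reverse."""
    x = coeffs[0]
    for det in coeffs[1:]:
        out = np.empty(2 * len(x))
        out[0::2] = (x + det) / np.sqrt(2.0)
        out[1::2] = (x - det) / np.sqrt(2.0)
        x = out
    return x
```

The transform is orthonormal, so it preserves the signal's energy and inverts exactly; the adaptivity described in the abstract comes from additionally modeling how the grid is recursively partitioned, rather than from changing this basic filter.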
Bayesian analysis of ranking data with the constrained Extended Plackett-Luce model
Multistage ranking models, including the popular Plackett-Luce distribution
(PL), rely on the assumption that the ranking process is performed
sequentially, by assigning the positions from the top to the bottom one
(forward order). A recent contribution to the ranking literature relaxed this
assumption with the addition of the discrete-valued reference order parameter,
yielding the novel Extended Plackett-Luce model (EPL). Inference on the EPL and
its generalization into a finite mixture framework was originally addressed
from the frequentist perspective. In this work, we propose the Bayesian
estimation of the EPL with order constraints on the reference order parameter.
The proposed restrictions reflect a meaningful rank assignment process. By
combining the restrictions with the data augmentation strategy and the
conjugacy of the Gamma prior distribution with the EPL, we facilitate the
construction of a tuned joint Metropolis-Hastings algorithm within Gibbs
sampling to simulate from the posterior distribution. The Bayesian approach
allows the inference on the additional discrete-valued parameter, and the
assessment of its estimation uncertainty, to be addressed more efficiently. The
usefulness of the proposal is illustrated with applications to simulated and
real datasets.

Comment: 20 pages, 4 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:1803.0288
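The sequential ranking process underlying the Extended Plackett-Luce model can be sketched by forward sampling: at each stage an item is selected with probability proportional to its support parameter, and the reference order decides which position that item fills (the plain Plackett-Luce model is recovered when positions are filled top to bottom). A schematic illustration with hypothetical parameter names, not the paper's estimation algorithm:

```python
import numpy as np

def sample_epl(support, ref_order, rng):
    """Draw one ranking from an Extended Plackett-Luce model.
    support[i] > 0 is item i's worth; ref_order[t] is the position
    (1-based) filled at stage t; ref_order = (1, 2, ..., K) gives
    plain Plackett-Luce. Returns ranking[i] = position of item i."""
    K = len(support)
    remaining = list(range(K))
    ranking = np.zeros(K, dtype=int)
    for t in range(K):
        w = np.array([support[i] for i in remaining])
        i = remaining[rng.choice(len(remaining), p=w / w.sum())]
        ranking[i] = ref_order[t]
        remaining.remove(i)
    return ranking
```

With a strongly dominant item and the forward reference order, the dominant item lands in first position in roughly support-proportional frequency, which is the behaviour the stage-wise selection probabilities encode.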
Computing the Shapley value in allocation problems: approximations and bounds, with an application to the Italian VQR research assessment program
In allocation problems, a given set of goods are assigned to agents in such a way that the social welfare is maximised, that is, the largest possible global worth is achieved. When goods are indivisible, it is possible to use money compensation to perform a fair allocation taking into account the actual contribution of all agents to the social welfare. Coalitional games provide a formal mathematical framework to model such problems, in particular the Shapley value is a solution concept widely used for assigning worths to agents in a fair way. Unfortunately, computing this value is a #P-hard problem, so that applying this good theoretical notion is often quite difficult in real-world problems.
We describe useful properties that allow us to greatly simplify the instances of allocation problems,
without affecting the Shapley value of any player. Moreover, we propose algorithms for computing lower bounds and upper bounds of the Shapley value, which in some cases provide the exact result and that can be combined with approximation algorithms.
The proposed techniques have been implemented and tested on a real-world application of allocation problems, namely, the Italian research assessment program known as VQR (Verifica della Qualità della Ricerca, or Research Quality Assessment). For the large university considered in the experiments, the
problem involves thousands of agents and goods (here, researchers and their research products). The
algorithms described in the paper are able to compute the Shapley value for most of those agents, and to
get a good approximation of the Shapley value for all of the others.
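For small games, the Shapley value can be computed exactly by averaging each player's marginal contribution over all orderings of the players; it is the factorial (and, via the characteristic function, #P-hard) growth of this computation that motivates the simplification properties and bounds proposed above. A generic brute-force sketch, not the paper's algorithm:

```python
import math
from itertools import permutations

def shapley_exact(n, v):
    """Exact Shapley values for an n-player coalitional game with
    characteristic function v(frozenset of players), by averaging
    marginal contributions over all n! player orders. Feasible only
    for small n, since the loop runs n! times."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        coalition = set()
        for player in order:
            before = v(frozenset(coalition))
            coalition.add(player)
            phi[player] += v(frozenset(coalition)) - before
    fact = math.factorial(n)
    return [p / fact for p in phi]
```

On an additive game the Shapley value returns each player's own weight, and on a symmetric game it splits the total worth equally; both are standard sanity checks, and efficiency (the values sum to v of the grand coalition) holds by construction.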