Efficient inference for genetic association studies with multiple outcomes
Combined inference for heterogeneous high-dimensional data is critical in
modern biology, where clinical and various kinds of molecular data may be
available from a single study. Classical genetic association studies regress a
single clinical outcome on many genetic variants one by one, but there is an
increasing demand for joint analysis of many molecular outcomes and genetic
variants in order to unravel functional interactions. Unfortunately, most
existing approaches to joint modelling are either too simplistic to be powerful
or are impracticable for computational reasons. Inspired by Richardson et al.
(2010, Bayesian Statistics 9), we consider a sparse multivariate regression
model that allows simultaneous selection of predictors and associated
responses. As Markov chain Monte Carlo (MCMC) inference on such models can be
prohibitively slow when the number of genetic variants exceeds a few thousand,
we propose a variational inference approach which produces posterior
information very close to that of MCMC inference, at a much reduced
computational cost. Extensive numerical experiments show that our approach
outperforms popular variable selection methods and tailored Bayesian
procedures, and that it handles, within hours, problems involving hundreds of
thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.
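The variational idea can be illustrated on a much simpler model than the one described above: a single-response linear regression with a spike-and-slab prior, fitted by coordinate-ascent variational inference (CAVI) under a fully factorised approximation. This is a minimal sketch that assumes a known residual variance `sigma2`, slab variance `sigma_b2` and prior inclusion probability `pi0` (all hypothetical names). It does not reproduce the paper's multi-response model, its selection of responses, or its hyperparameter handling.

```python
import numpy as np


def cavi_spike_slab(X, y, sigma2=1.0, sigma_b2=1.0, pi0=0.1, n_iter=100, tol=1e-6):
    """CAVI for y = X b + e, e ~ N(0, sigma2 I), with the assumed prior
    b_j ~ pi0 * N(0, sigma2 * sigma_b2) + (1 - pi0) * delta_0.

    Returns posterior inclusion probabilities alpha and slab means mu.
    """
    _, p = X.shape
    xtx = np.sum(X ** 2, axis=0)            # x_j' x_j for each predictor
    xty = X.T @ y                           # x_j' y for each predictor
    s2 = sigma2 / (xtx + 1.0 / sigma_b2)    # variance of q(b_j | gamma_j = 1)
    alpha = np.full(p, pi0)                 # q(gamma_j = 1)
    mu = np.zeros(p)                        # mean of q(b_j | gamma_j = 1)
    xr = X @ (alpha * mu)                   # current fitted values X E[b]
    logit_prior = np.log(pi0 / (1.0 - pi0))
    for _ in range(n_iter):
        alpha_old = alpha.copy()
        for j in range(p):
            # Remove predictor j from the fitted values.
            xr -= X[:, j] * (alpha[j] * mu[j])
            mu[j] = s2[j] / sigma2 * (xty[j] - X[:, j] @ xr)
            logit_a = (logit_prior
                       + 0.5 * np.log(s2[j] / (sigma2 * sigma_b2))
                       + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = 1.0 / (1.0 + np.exp(-np.clip(logit_a, -500.0, 500.0)))
            # Add the updated predictor back.
            xr += X[:, j] * (alpha[j] * mu[j])
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
    return alpha, mu
```

Each sweep costs O(np) operations, which is the kind of saving over MCMC that the abstract refers to; the price is that the factorised approximation can understate posterior uncertainty.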
Modelling function-valued processes with complex structure
PhD Thesis
Existing approaches to functional principal component analysis (FPCA) usually rely
on nonparametric estimation of the covariance structure. When function-valued processes
are observed on a multidimensional domain, the nonparametric estimation suffers from
the curse of dimensionality, forcing FPCA methods to make restrictive assumptions such
as covariance separability.
In this thesis, we discuss a general Bayesian framework for modelling function-valued
processes by using a Gaussian process (GP) as a prior, enabling us to handle nonseparable and/or nonstationary covariance structures. The nonstationarity is introduced by a
convolution-based approach through a varying kernel, whose parameters vary along the
input space and are estimated via a local empirical Bayesian method. For the varying
anisotropy matrix, we propose to use a spherical parametrisation, leading to unconstrained
and interpretable parameters and allowing for interaction between coordinate directions in
the covariance function. The unconstrained nature allows the parameters to be modelled
as a nonparametric function of time, spatial location and even additional covariates.
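To make the varying-kernel and spherical-parametrisation ideas concrete, the sketch below assumes two things that the thesis may do differently: the 2x2 anisotropy matrix is built from a Cholesky-type factor in spherical coordinates (in the style of Pinheiro and Bates), and the nonstationary covariance is the closed form that Paciorek and Schervish derived from convolving white noise with spatially varying Gaussian kernels. The function names `spherical_to_cov` and `nonstationary_cov` and the callable `params` are hypothetical.

```python
import numpy as np


def spherical_to_cov(log_l1, log_l2, eta):
    """Map three unconstrained reals to a 2x2 symmetric positive definite matrix.

    l1, l2 = exp(log_l1), exp(log_l2) are column lengths of the factor and
    theta = pi / (1 + exp(-eta)) in (0, pi) is the angle between the columns,
    so the off-diagonal entry is l1 * l2 * cos(theta).
    """
    l1, l2 = np.exp(log_l1), np.exp(log_l2)
    theta = np.pi / (1.0 + np.exp(-eta))
    L = np.array([[l1, l2 * np.cos(theta)],
                  [0.0, l2 * np.sin(theta)]])
    return L.T @ L


def nonstationary_cov(X, params, sigma2=1.0):
    """Squared-exponential covariance with a location-dependent anisotropy matrix.

    X: (n, 2) input locations; params: callable returning the unconstrained
    (log_l1, log_l2, eta) at a location, so they can vary smoothly in space.
    """
    S = [spherical_to_cov(*params(x)) for x in X]
    dets = [np.linalg.det(Si) for Si in S]
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            S_ij = 0.5 * (S[i] + S[j])          # averaged local anisotropy
            d = X[i] - X[j]
            q = d @ np.linalg.solve(S_ij, d)    # Mahalanobis-type distance
            K[i, j] = (sigma2 * (dets[i] * dets[j]) ** 0.25
                       / np.sqrt(np.linalg.det(S_ij)) * np.exp(-q))
    return K
```

Because `log_l1`, `log_l2` and `eta` range over the whole real line, each can be given its own smooth model in time, space or covariates without any positivity or range constraints, which is the practical benefit of an unconstrained parametrisation.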
In the spirit of FPCA, the Bayesian framework can decompose the function-valued
processes using the eigenvalues and eigensurfaces calculated from the estimated covariance
structure. A finite number of the leading eigensurfaces can be used to extract some of the most
important information contained in data with complex covariance structure.
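Once a covariance has been estimated on a grid, the decomposition reduces to an eigenproblem. The sketch below assumes the surfaces have been flattened to vectors on a regular grid with a constant quadrature weight (cell area) `weight`; the function name `fpca_from_cov` is hypothetical, and any covariance estimate, including one derived from a GP fit, could be passed in as `K`.

```python
import numpy as np


def fpca_from_cov(K, Y, mean, weight, n_comp):
    """Discretised FPCA from a covariance evaluated on m grid points.

    K: (m, m) covariance; Y: (n, m) observed surfaces flattened to vectors;
    mean: (m,) mean surface; weight: quadrature weight per grid point.
    """
    # The integral operator f -> int K(s, t) f(t) dt is approximated by weight * K.
    evals, evecs = np.linalg.eigh(weight * K)
    order = np.argsort(evals)[::-1]            # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    # Rescale so each eigensurface has unit L2 norm under the quadrature rule.
    phi = evecs / np.sqrt(weight)
    explained = evals[:n_comp].sum() / evals[evals > 0].sum()
    # Scores: quadrature approximation of int (Y - mean) * phi_k.
    scores = (Y - mean) @ phi[:, :n_comp] * weight
    return evals[:n_comp], phi[:, :n_comp], scores, explained
```

Each surface is then approximated as `mean + scores @ phi.T`, and `explained` reports the share of total variance captured by the retained eigensurfaces.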
We also extend the methods to handle multivariate function-valued processes. The
estimated covariance structure is shown to be important for analysing joint variation in
the data and is further used in our proposed multiple functional partial least squares
regression model. We show that the interaction between the scalar response variable and
function-valued covariates can be explained by fewer terms than in a regression model
which uses multivariate functional principal components.
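One reason partial least squares can need fewer terms than principal-component regression is that it picks directions maximising covariance with the response rather than variance of the covariates alone. The sketch below is a textbook PLS1 (NIPALS) applied to multivariate functional covariates that are assumed to be discretised on grids and concatenated column-wise, so each row of `X` holds several curves side by side. It is not the thesis's multiple functional PLS model, which builds on the GP-estimated covariance; `pls1` is a hypothetical name.

```python
import numpy as np


def pls1(X, y, n_comp):
    """PLS1 regression via NIPALS with a scalar response.

    X: (n, p) discretised functional covariates, concatenated per sample;
    y: (n,) scalar response. Returns (coef, intercept) on the original scale.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # component scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

With as many components as linearly independent columns, PLS1 reproduces ordinary least squares; the interest lies in truncating after a few components, where the covariance-driven directions tend to retain more predictive signal than the same number of principal components.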
Simulation studies and applications to real data show that our proposed approaches
provide new insights into the data and excellent prediction results.
A Bayesian method to incorporate hundreds of functional characteristics with association evidence to improve variant prioritization
The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals ("hits") to train a regularized logistic regression model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype-specific and the autoimmune GWAS signals gave the most reliable results. We found that SNPs with higher predicted probabilities of causality from functional characteristics were enriched for more significant p-values compared with all GWAS SNPs in three large GWAS of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.
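A common way to turn a classifier's output into a Bayes factor is to divide its predicted odds by the prior odds implied by the training data: if the model is calibrated and trained with a causal fraction `base_rate`, the ratio is the likelihood ratio of the functional features. The sketch below assumes exactly that, fits an L2-penalised logistic regression by Newton's method, and combines the functional Bayes factor with an association Bayes factor under the further assumption that the two sources of evidence are conditionally independent given causal status. All function names are hypothetical, and the paper's actual penalty, features and combination rule may differ.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam=1.0, n_iter=50):
    """L2-penalised logistic regression by Newton-Raphson; intercept unpenalised."""
    Xa = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xa.shape[1])
    pen = lam * np.eye(Xa.shape[1])
    pen[0, 0] = 0.0                                  # do not shrink the intercept
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ w))
        grad = Xa.T @ (p - y) + pen @ w
        H = Xa.T @ (Xa * (p * (1 - p))[:, None]) + pen
        step = np.linalg.solve(H, grad)
        w -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return w


def predict_prob(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def functional_bayes_factor(prob, base_rate):
    """Predicted odds divided by the training-set prior odds."""
    return (prob / (1 - prob)) / (base_rate / (1 - base_rate))


def combined_posterior_prob(prior_prob, bf_func, bf_assoc):
    """Posterior odds = prior odds x BF_func x BF_assoc (assumes independence)."""
    odds = prior_prob / (1 - prior_prob) * bf_func * bf_assoc
    return odds / (1 + odds)
```

A SNP whose predicted probability equals the training base rate gets a functional Bayes factor of 1, so its association evidence alone drives the posterior.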
A Bayesian Multivariate Functional Dynamic Linear Model
We present a Bayesian approach for modeling multivariate, dependent
functional data. To account for the three dominant structural features in the
data (functional, time-dependent, and multivariate components), we extend
hierarchical dynamic linear models for multivariate time series to the
functional data setting. We also develop Bayesian spline theory in a more
general constrained optimization framework. The proposed methods identify a
time-invariant functional basis for the functional observations, which is
smooth and interpretable, and can be made common across multivariate
observations for additional information sharing. The Bayesian framework permits
joint estimation of the model parameters, provides exact inference (up to MCMC
error) on specific parameters, and allows generalized dependence structures.
Sampling from the posterior distribution is accomplished with an efficient
Gibbs sampling algorithm. We illustrate the proposed framework with two
applications: (1) multi-economy yield curve data from the recent global
recession, and (2) local field potential brain signals in rats, for which we
develop a multivariate functional time series approach for multivariate
time-frequency analysis. Supplementary materials, including R code and the
multi-economy yield curve data, are available online.
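In Gibbs samplers for dynamic linear models, the latent states are typically drawn jointly by forward filtering, backward sampling (FFBS). The sketch below shows only that step, for the simplest univariate local-level model with known observation variance `V` and evolution variance `W`; it is an assumption-laden illustration of the generic technique, not the paper's multivariate functional sampler, and `ffbs_local_level` is a hypothetical name.

```python
import numpy as np


def ffbs_local_level(y, V, W, m0=0.0, C0=1e6, rng=None):
    """Draw theta_1..theta_T from p(theta | y) for the local-level DLM
    y_t = theta_t + v_t, v_t ~ N(0, V);  theta_t = theta_{t-1} + w_t, w_t ~ N(0, W),
    with a (diffuse by default) prior theta_0 ~ N(m0, C0).
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    a, R = np.empty(T), np.empty(T)      # one-step-ahead prior moments
    m, C = np.empty(T), np.empty(T)      # filtered moments
    m_prev, C_prev = m0, C0
    for t in range(T):                   # forward pass: Kalman filter
        a[t], R[t] = m_prev, C_prev + W
        Q = R[t] + V                     # one-step forecast variance
        K = R[t] / Q                     # Kalman gain
        m[t] = a[t] + K * (y[t] - a[t])
        C[t] = R[t] * V / Q
        m_prev, C_prev = m[t], C[t]
    theta = np.empty(T)                  # backward pass: sample the states
    theta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):
        B = C[t] / R[t + 1]
        h = m[t] + B * (theta[t + 1] - a[t + 1])
        H = C[t] - B ** 2 * R[t + 1]     # equals C_t W / (C_t + W) >= 0
        theta[t] = rng.normal(h, np.sqrt(max(H, 0.0)))
    return theta
```

Within a full Gibbs sweep, this draw would alternate with draws of `V` and `W` (and, in richer models, basis and loading parameters) from their full conditional distributions.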