Graph Estimation From Multi-attribute Data
Many real-world network problems concern multivariate nodal attributes, such as image, textual, and multi-view feature vectors on nodes, rather than simple univariate nodal attributes. Existing graph estimation methods built on Gaussian graphical models and covariance selection algorithms cannot handle such data, nor can the theories developed around those methods be applied directly. In this paper, we propose a new principled framework for estimating graphs from multi-attribute data. Instead of estimating partial correlations, as in the current literature, our method estimates partial canonical correlations, which naturally accommodate complex nodal features. Computationally, we provide an efficient algorithm that exploits the multi-attribute structure. Theoretically, we provide sufficient conditions that guarantee consistent graph recovery. Extensive simulation studies demonstrate the performance of our method under various conditions. Furthermore, we provide illustrative applications to uncovering gene regulatory networks from gene and protein profiles, and to uncovering a brain connectivity graph from functional magnetic resonance imaging data.
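The edge-recovery criterion behind this framework can be illustrated in a few lines: for jointly Gaussian multi-attribute data, the partial canonical correlation between two nodes vanishes exactly when the corresponding off-diagonal block of the precision matrix is zero. The Python sketch below tests those block norms on a ridge-regularized precision estimate; the thresholding rule and all names (`estimate_graph`, `ridge`, `thresh`) are illustrative stand-ins for the paper's penalized estimator, not its actual algorithm.

```python
import numpy as np

def estimate_graph(X, p, k, ridge=1e-2, thresh=0.1):
    """Block-norm test on a regularized precision matrix.

    X      : (n, p*k) data matrix; node i owns columns i*k:(i+1)*k
    p, k   : number of nodes and attributes per node
    ridge  : Tikhonov term so the sample covariance is invertible
    thresh : declare an edge when the block's spectral norm exceeds this
    """
    S = np.cov(X, rowvar=False) + ridge * np.eye(p * k)
    Omega = np.linalg.inv(S)  # regularized precision matrix estimate
    A = np.zeros((p, p), dtype=bool)
    for i in range(p):
        for j in range(i + 1, p):
            block = Omega[i*k:(i+1)*k, j*k:(j+1)*k]
            # the partial canonical correlation between nodes i and j is
            # zero exactly when this block is zero, so its norm serves as
            # a simple surrogate test statistic
            A[i, j] = A[j, i] = np.linalg.norm(block, 2) > thresh
    return A

# toy usage: 5 nodes with 3 attributes each, 500 independent samples
rng = np.random.default_rng(0)
adj = estimate_graph(rng.standard_normal((500, 15)), p=5, k=3)
```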
MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure
Markov chain Monte Carlo (MCMC) samplers are numerical methods for drawing samples from a given target probability distribution. We discuss one particular MCMC sampler, the MALA-within-Gibbs sampler, from both theoretical and practical perspectives. We first show that the acceptance ratio and step size of this sampler are independent of the overall problem dimension when (i) the target distribution has sparse conditional structure, and (ii) this structure is reflected in the partial updating strategy of MALA-within-Gibbs. If, in addition, the target density is blockwise log-concave, then the sampler's convergence rate is independent of dimension. From a practical perspective, we expect that MALA-within-Gibbs is useful for solving high-dimensional Bayesian inference problems where the posterior exhibits sparse conditional structure, at least approximately. In this context, a partitioning of the state that correctly reflects the sparse conditional structure must be found, and we illustrate this process in two numerical examples. We also discuss trade-offs between the block size used for partial updating and the computational requirements, which may increase with the number of blocks.
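As a concrete illustration of the partial-updating strategy, the following Python sketch cycles over a user-supplied partition of the state and applies a MALA proposal to one block at a time. The interface (`logpi`, `grad_block`, `blocks`) is hypothetical; under sparse conditional structure the block gradient, and hence the per-update cost, involves only the block and its neighbors, which is what keeps the step size dimension-independent.

```python
import numpy as np

def mala_within_gibbs(logpi, grad_block, x0, blocks, step, n_iter,
                      rng=np.random.default_rng()):
    """Sweep over the blocks, proposing a MALA move for one block at a time."""
    x = x0.copy()
    for _ in range(n_iter):
        for idx in blocks:
            g = grad_block(x, idx)            # gradient w.r.t. this block only
            mean_fwd = x[idx] + 0.5 * step * g
            prop = x.copy()
            prop[idx] = mean_fwd + np.sqrt(step) * rng.standard_normal(len(idx))
            mean_back = prop[idx] + 0.5 * step * grad_block(prop, idx)
            # Metropolis-Hastings correction for the asymmetric block proposal
            lq_fwd = -np.sum((prop[idx] - mean_fwd) ** 2) / (2 * step)
            lq_back = -np.sum((x[idx] - mean_back) ** 2) / (2 * step)
            if np.log(rng.uniform()) < logpi(prop) - logpi(x) + lq_back - lq_fwd:
                x = prop
        yield x.copy()

# toy usage: standard Gaussian target in 10 dimensions, blocks of size 2
d = 10
blocks = [np.arange(i, i + 2) for i in range(0, d, 2)]
chain = list(mala_within_gibbs(lambda z: -0.5 * z @ z,
                               lambda z, idx: -z[idx],
                               np.zeros(d), blocks, step=0.5, n_iter=1000))
```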
A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems
We address the numerical solution of infinite-dimensional inverse problems in
the framework of Bayesian inference. In the Part I companion to this paper
(arXiv.org:1308.1313), we considered the linearized infinite-dimensional
inverse problem. Here in Part II, we relax the linearization assumption and
consider the fully nonlinear infinite-dimensional inverse problem using a
Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of
sampling high-dimensional pdfs arising from Bayesian inverse problems governed
by PDEs, we build on the stochastic Newton MCMC method. This method exploits
problem structure by taking as a proposal density a local Gaussian
approximation of the posterior pdf, whose construction is made tractable by
invoking a low-rank approximation of the data-misfit component of the Hessian.
Here we introduce an approximation of the stochastic Newton proposal in which
we compute the low-rank-based Hessian at just the MAP point, and then reuse
this Hessian at each MCMC step. We compare the performance of the proposed
method to the original stochastic Newton MCMC method and to an independence
sampler. The comparison of the three methods is conducted on a synthetic ice
sheet inverse problem. For this problem, the stochastic Newton MCMC method with
a MAP-based Hessian converges at least as rapidly as the original stochastic
Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian
at each step. On the other hand, it is more expensive per sample than the
independence sampler; however, its convergence is significantly more rapid, and
thus overall it is much cheaper. Finally, we present extensive analysis and
interpretation of the posterior distribution, and classify directions in
parameter space based on the extent to which they are informed by the prior or
the observations.
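The MAP-based variant is straightforward to sketch once the Hessian at the MAP point is available. The Python fragment below freezes a dense Hessian `H_map`, factors it once, and uses the resulting Gaussian Newton proposal inside Metropolis-Hastings. A PDE-scale implementation would replace the dense factorization with the paper's low-rank approximation of the data-misfit Hessian; all names here are illustrative, not the authors' interface.

```python
import numpy as np

def map_hessian_mcmc(neg_log_post, grad, H_map, x_map, n_iter,
                     rng=np.random.default_rng()):
    """MH with the Gaussian proposal N(x - H^{-1} grad(x), H^{-1}), where H
    is the Hessian of the negative log posterior, frozen at the MAP point."""
    L = np.linalg.cholesky(H_map)            # factor H once, reuse at every step
    newton_mean = lambda z: z - np.linalg.solve(H_map, grad(z))
    def log_q(y, mean):                      # log N(y; mean, H^{-1}), up to a constant
        r = L.T @ (y - mean)
        return -0.5 * r @ r
    x = x_map.copy()
    for _ in range(n_iter):
        mean_x = newton_mean(x)
        y = mean_x + np.linalg.solve(L.T, rng.standard_normal(len(x)))
        mean_y = newton_mean(y)
        log_alpha = (neg_log_post(x) - neg_log_post(y)
                     + log_q(x, mean_y) - log_q(y, mean_x))
        if np.log(rng.uniform()) < log_alpha:
            x = y
        yield x.copy()

# toy usage: Gaussian "posterior" with known Hessian H (proposal is then exact)
H = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = list(map_hessian_mcmc(lambda z: 0.5 * z @ H @ z, lambda z: H @ z,
                                H, np.zeros(2), n_iter=1000))
```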
Joint state-parameter estimation of a nonlinear stochastic energy balance model from sparse noisy data
While nonlinear stochastic partial differential equations arise naturally in
spatiotemporal modeling, inference for such systems often faces two major
challenges: sparse noisy data and ill-posedness of the inverse problem of
parameter estimation. To overcome these challenges, we introduce a strongly
regularized posterior by normalizing the likelihood and by imposing physical
constraints through priors of the parameters and states. We investigate joint
parameter-state estimation by the regularized posterior in a physically
motivated nonlinear stochastic energy balance model (SEBM) for paleoclimate
reconstruction. The high-dimensional posterior is sampled by a particle Gibbs
sampler that combines MCMC with an optimal particle filter exploiting the
structure of the SEBM. In tests using either Gaussian or uniform priors based
on the physical range of parameters, the regularized posteriors overcome the
ill-posedness and lead to samples within physical ranges, quantifying the
uncertainty in the estimates. Due to the ill-posedness and the regularization, the posterior of the parameters exhibits relatively large uncertainty; consequently, the maximum of the posterior, which is the minimizer in a variational approach, can vary widely. In contrast, the posterior of the states generally concentrates near the truth, substantially filtering out observation noise and reducing the uncertainty in the unconstrained SEBM.
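The particle Gibbs pattern described above alternates a conditional-SMC update of the states with a Metropolis update of the parameters under physical-range priors. The sketch below illustrates this on a toy AR(1) state-space model standing in for the SEBM; the bootstrap conditional filter replaces the paper's optimal particle filter, and all model constants and bounds are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def csmc(y, x_ref, theta, N=200):
    """Conditional SMC: a bootstrap particle filter in which one particle is
    pinned to the reference trajectory -- the core move of particle Gibbs.
    Toy model: x_t = 0.9 x_{t-1} + sig_x * eps_t,  y_t = x_t + sig_y * eta_t."""
    sig_x, sig_y = theta
    T = len(y)
    X = np.empty((N, T)); anc = np.empty((N, T), dtype=int)
    X[:, 0] = rng.normal(0.0, sig_x, N); X[0, 0] = x_ref[0]
    logw = -0.5 * ((y[0] - X[:, 0]) / sig_y) ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        a = rng.choice(N, N, p=w); a[0] = 0          # reference particle survives
        X[:, t] = 0.9 * X[a, t - 1] + rng.normal(0.0, sig_x, N)
        X[0, t] = x_ref[t]
        anc[:, t] = a
        logw = -0.5 * ((y[t] - X[:, t]) / sig_y) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(N, p=w)                           # trace one trajectory back
    path = np.empty(T)
    for t in range(T - 1, 0, -1):
        path[t] = X[k, t]; k = anc[k, t]
    path[0] = X[k, 0]
    return path

def particle_gibbs(y, n_iter, bounds=((0.05, 1.0), (0.05, 1.0))):
    """Alternate a conditional-SMC state update with a random-walk MH
    parameter update under uniform 'physical range' priors."""
    theta = np.array([0.5, 0.5]); x = np.zeros(len(y))
    for _ in range(n_iter):
        x = csmc(y, x, theta)                        # states | parameters
        prop = theta + 0.05 * rng.standard_normal(2) # parameters | states
        if all(lo < v < hi for v, (lo, hi) in zip(prop, bounds)):
            def logp(th):
                r_x = np.concatenate(([x[0]], x[1:] - 0.9 * x[:-1]))
                r_y = y - x
                return (-0.5 * np.sum((r_x / th[0]) ** 2) - len(r_x) * np.log(th[0])
                        - 0.5 * np.sum((r_y / th[1]) ** 2) - len(r_y) * np.log(th[1]))
            if np.log(rng.uniform()) < logp(prop) - logp(theta):
                theta = prop
        yield theta.copy(), x.copy()

# toy data from the model, then a short particle Gibbs run
T = 50
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + 0.3 * rng.standard_normal()
y = x_true + 0.2 * rng.standard_normal(T)
draws = list(particle_gibbs(y, n_iter=200))
```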
Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases
For big-data analysis, the high computational cost of Bayesian methods often limits their application in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo (MCMC) method, namely Hamiltonian Monte Carlo (HMC). The key idea is to explore and exploit the structure and regularity in the parameter space of the underlying probabilistic model to
construct an effective approximation of its geometric properties. To this end,
we build a surrogate function to approximate the target distribution using
properly chosen random bases and an efficient optimization process. The
resulting method provides a flexible, scalable, and efficient sampling
algorithm, which converges to the correct target distribution. We show that by
choosing the basis functions and optimization process differently, our method
can be related to other approaches for the construction of surrogate functions
such as generalized additive models or Gaussian process models. Experiments
based on simulated and real data show that our approach leads to substantially
more efficient sampling algorithms than existing state-of-the-art methods.
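One way to realize this scheme is to regress previously evaluated log-density values on random Fourier features, then run HMC leapfrog trajectories on the cheap surrogate gradient while keeping the exact log density in the accept/reject step; since the leapfrog map remains volume-preserving and reversible, the chain still targets the correct distribution. The Python sketch below follows that pattern; the feature count, ridge penalty, and function names are assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_surrogate(Z, logp_vals, n_feat=200, reg=1e-3):
    """Fit a random-Fourier-feature surrogate of the log density by ridge
    regression on previously evaluated (point, log-density) pairs."""
    d = Z.shape[1]
    W = rng.standard_normal((d, n_feat))
    b = rng.uniform(0, 2 * np.pi, n_feat)
    Phi = np.cos(Z @ W + b)
    beta = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_feat), Phi.T @ logp_vals)
    def grad_surrogate(z):
        # d/dz sum_j beta_j cos(w_j.z + b_j) = -sum_j beta_j sin(w_j.z + b_j) w_j
        return -W @ (beta * np.sin(z @ W + b))
    return grad_surrogate

def surrogate_hmc(logp, grad_sur, x0, step, n_leap, n_iter):
    """HMC whose leapfrog trajectories use the cheap surrogate gradient; the
    accept/reject step evaluates the exact log density, preserving the target."""
    x = x0.copy()
    for _ in range(n_iter):
        p = rng.standard_normal(len(x))
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step * grad_sur(x_new)        # leapfrog on the surrogate
        for _ in range(n_leap - 1):
            x_new += step * p_new
            p_new += step * grad_sur(x_new)
        x_new += step * p_new
        p_new += 0.5 * step * grad_sur(x_new)
        log_alpha = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_alpha:
            x = x_new
        yield x.copy()

# toy usage: fit the surrogate on scattered evaluations of a 2-D Gaussian
logp = lambda z: -0.5 * z @ z
Z = rng.standard_normal((400, 2))
grad_sur = fit_surrogate(Z, np.array([logp(z) for z in Z]))
chain = list(surrogate_hmc(logp, grad_sur, np.zeros(2), 0.1, 10, 500))
```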