High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains
In this paper, we present a novel framework incorporating a combination of
sparse models in different domains. We posit the observed data as generated
from a linear combination of a sparse Gaussian Markov model (with a sparse
precision matrix) and a sparse Gaussian independence model (with a sparse
covariance matrix). We provide efficient methods for decomposition of the data
into two domains, viz. the Markov and independence domains. We characterize a set
of sufficient conditions for identifiability and model consistency. Our
decomposition method is based on a simple modification of the popular
$\ell_1$-penalized maximum-likelihood estimator ($\ell_1$-MLE). We establish
that our estimator is consistent in both domains, i.e., it successfully
recovers the supports of both the Markov and independence models, when the number
of samples scales as $n = \Omega(d^2 \log p)$, where $p$ is the number of
variables and $d$ is the maximum node degree in the Markov model. Our
conditions for recovery are comparable to those of $\ell_1$-MLE for consistent
estimation of a sparse Markov model, and thus, we guarantee successful
high-dimensional estimation of a richer class of models under comparable
conditions. Our experiments validate these results and also demonstrate that
our models have better inference accuracy under simple algorithms such as loopy
belief propagation.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012).
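The generative model in this abstract can be made concrete with a small simulation. The sketch below illustrates only the data-generating assumption, not the authors' modified $\ell_1$-MLE. It assumes a hypothetical chain-graph precision `J_M` for the Markov part and a covariance `Sigma_R` that is diagonal except for one correlated pair, draws the two components independently, and checks that the observed covariance is their sum `inv(J_M) + Sigma_R`.

```python
import numpy as np

def make_decomposed_covariance(p, rho=0.3):
    """Return (J_M, Sigma_R, Sigma) with Sigma = inv(J_M) + Sigma_R.

    J_M: sparse Markov-model precision (a chain graph; hypothetical choice).
    Sigma_R: sparse independence-model covariance (identity plus one correlated pair).
    """
    # Tridiagonal and diagonally dominant, hence positive definite.
    J_M = 2.0 * np.eye(p) - 0.5 * (np.eye(p, k=1) + np.eye(p, k=-1))
    Sigma_R = np.eye(p)
    Sigma_R[0, p - 1] = Sigma_R[p - 1, 0] = rho   # |rho| < 1 keeps Sigma_R positive definite
    Sigma = np.linalg.inv(J_M) + Sigma_R
    return J_M, Sigma_R, Sigma

def sample_decomposed(J_M, Sigma_R, n, rng):
    """Draw n observations x = x_M + x_R with independent Gaussian components."""
    p = J_M.shape[0]
    x_M = rng.multivariate_normal(np.zeros(p), np.linalg.inv(J_M), size=n)
    x_R = rng.multivariate_normal(np.zeros(p), Sigma_R, size=n)
    return x_M + x_R

rng = np.random.default_rng(0)
J_M, Sigma_R, Sigma = make_decomposed_covariance(6)
X = sample_decomposed(J_M, Sigma_R, 100_000, rng)
S = np.cov(X, rowvar=False)   # approaches Sigma as the sample size grows
```

In this example every entry of `Sigma` is nonzero even though `J_M` and `Sigma_R` are both sparse, so neither sparse support can be read off the combined covariance directly.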
High-Dimensional Bayesian Geostatistics
With the growing capabilities of Geographic Information Systems (GIS) and
user-friendly software, statisticians today routinely encounter geographically
referenced data containing observations from a large number of spatial
locations and time points. Over the last decade, hierarchical spatiotemporal
process models have become widely deployed statistical tools for researchers to
better understand the complex nature of spatial and temporal variability.
However, fitting hierarchical spatiotemporal models often involves expensive
matrix computations whose complexity grows cubically with the number of
spatial locations and time points. This renders such models infeasible for
large data sets. This article offers a focused review of two methods for
constructing well-defined highly scalable spatiotemporal stochastic processes.
Both these processes can be used as "priors" for spatiotemporal random fields.
The first approach constructs a low-rank process operating on a
lower-dimensional subspace. The second approach constructs a Nearest-Neighbor
Gaussian Process (NNGP) that ensures sparse precision matrices for its finite
realizations. Both processes can be exploited as a scalable prior embedded
within a rich hierarchical modeling framework to deliver full Bayesian
inference. These approaches can be described as model-based solutions for big
spatiotemporal datasets. The models ensure that the algorithmic complexity is
$\sim n$ floating point operations (flops) per iteration, where $n$ is the number
of spatial locations. We compare these methods and provide some insight
into their methodological underpinnings.
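The NNGP idea of a sparse precision matrix can be sketched with a Vecchia-type factorization. This is a minimal illustration under assumed choices: an exponential covariance with hypothetical parameters `sigma2` and `phi`, the array order of `coords` as the ordering of locations, and dense matrices for readability. A practical implementation would store the at most `m` nonzeros per row of `A` sparsely, which is where the scalability comes from. Each location is regressed on its `m` nearest earlier neighbours. The kriging weights fill row i of `A`, the conditional variances fill `D`, and the precision is $(I - A)^\top D^{-1} (I - A)$.

```python
import numpy as np

def exp_cov(X1, X2, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * ||s - s'||); kernel and
    parameters are illustrative assumptions."""
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def nngp_factors(coords, m):
    """Return (A, D) with w_i = sum_j A[i, j] w_j + N(0, D[i]), where row i of A is
    nonzero only on the (at most m) nearest earlier-ordered neighbours of location i."""
    n = coords.shape[0]
    A = np.zeros((n, n))
    D = np.empty(n)
    D[0] = exp_cov(coords[:1], coords[:1])[0, 0]
    for i in range(1, n):
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nbrs = np.argsort(dist)[:m]                      # neighbour set among 0..i-1
        C_NN = exp_cov(coords[nbrs], coords[nbrs])
        C_Ni = exp_cov(coords[nbrs], coords[i:i + 1])[:, 0]
        a = np.linalg.solve(C_NN, C_Ni)                  # kriging weights
        A[i, nbrs] = a
        # Conditional variance of w_i given its neighbours.
        D[i] = exp_cov(coords[i:i + 1], coords[i:i + 1])[0, 0] - C_Ni @ a
    return A, D

def nngp_precision(coords, m):
    """Precision (I - A)^T D^{-1} (I - A), stored densely here only for readability."""
    A, D = nngp_factors(coords, m)
    I_minus_A = np.eye(len(D)) - A
    return I_minus_A.T @ (I_minus_A / D[:, None])

rng = np.random.default_rng(1)
coords = rng.uniform(size=(30, 2))
Q = nngp_precision(coords, m=5)
```

With `m = n - 1` every earlier location is a neighbour and the factorization reproduces the exact inverse of the full covariance; smaller `m` trades accuracy for sparsity.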
Network Psychometrics
This chapter provides a general introduction to network modeling in
psychometrics. The chapter starts with an introduction to the statistical model
formulation of pairwise Markov random fields (PMRF), followed by an
introduction to the PMRF suitable for binary data: the Ising model. The Ising
model is used in ferromagnetism to explain phase transitions in a field
of particles. Following the description of the Ising model in statistical
physics, the chapter continues to show that the Ising model is closely related
to models used in psychometrics. The Ising model can be shown to be equivalent
to certain kinds of logistic regression models, loglinear models and
multi-dimensional item response theory (MIRT) models. The equivalence between
the Ising model and the MIRT model puts standard psychometrics in a new light
and leads to a strikingly different interpretation of well-known latent
variable models. The chapter gives an overview of methods that can be used to
estimate the Ising model, and concludes with a discussion of the interpretation
of latent variables given the equivalence between the Ising model and MIRT.
Comment: In Irwing, P., Hughes, D., and Booth, T. (2018). The Wiley Handbook
of Psychometric Testing, 2 Volume Set: A Multidisciplinary Reference on
Survey, Scale and Test Development. New York: Wiley.
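The equivalence between the Ising model and logistic regression can be checked numerically on a small model. This sketch assumes the coding $x_i \in \{-1, +1\}$ and a parametrization with hypothetical names `tau` (thresholds) and `omega` (symmetric pairwise weights with zero diagonal); the chapter's own notation may differ. It enumerates all $2^p$ states, conditions the joint distribution on the other nodes, and compares the result with the logistic form $P(x_i = 1 \mid x_{-i}) = \sigma\big(2(\tau_i + \sum_{j \ne i} \omega_{ij} x_j)\big)$.

```python
import itertools
import numpy as np

def ising_joint(tau, omega):
    """Enumerate all 2^p states of an Ising model on x in {-1, +1}^p with
    P(x) proportional to exp(sum_i tau_i x_i + sum_{i<j} omega_ij x_i x_j).
    omega is symmetric with zero diagonal, so the pairwise sum equals x^T omega x / 2."""
    p = len(tau)
    states = np.array(list(itertools.product([-1, 1], repeat=p)))
    log_pot = states @ tau + 0.5 * np.einsum("si,ij,sj->s", states, omega, states)
    probs = np.exp(log_pot - log_pot.max())   # subtract max for numerical stability
    return states, probs / probs.sum()

def conditional_from_joint(i, x, states, probs):
    """P(x_i = +1 | x_{-i}) obtained by conditioning the enumerated joint distribution."""
    others = np.delete(np.arange(len(x)), i)
    match = np.all(states[:, others] == x[others], axis=1)
    up = match & (states[:, i] == 1)
    return probs[up].sum() / probs[match].sum()

def conditional_logistic(i, x, tau, omega):
    """The same conditional written as a logistic regression of x_i on the other nodes."""
    eta = tau[i] + omega[i] @ x - omega[i, i] * x[i]
    return 1.0 / (1.0 + np.exp(-2.0 * eta))

rng = np.random.default_rng(0)
p = 4
tau = rng.normal(scale=0.5, size=p)
upper = np.triu(rng.normal(scale=0.5, size=(p, p)), 1)
omega = upper + upper.T
states, probs = ising_joint(tau, omega)
```

Under a {0, 1} coding the factor 2 disappears. Nodewise logistic regression, a common way of estimating Ising networks, rests on this identity.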
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. It first introduces
basis functions, the key building blocks for regularization in
functional regression methods, and then gives an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. It closes with a brief
discussion of potential areas of future development in this field.
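The role of basis functions in scalar-on-function regression can be sketched in a few lines. In the model $y_i = \alpha + \int X_i(t)\,\beta(t)\,dt + \varepsilon_i$, expanding $\beta(t) = \sum_k b_k \phi_k(t)$ turns the integral into an ordinary linear predictor in the coefficients $b_k$. This is a minimal illustration under assumptions: a Fourier basis, a uniform grid, a Riemann-sum approximation of the integral, and a small ridge penalty (`lam`, hypothetical) standing in for regularization. Other bases and roughness penalties are common in practice.

```python
import numpy as np

def fourier_basis(t, K):
    """Columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..K on t in [0, 1]."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)

def fit_scalar_on_function(X, y, t, K=3, lam=1e-6):
    """Fit y_i = alpha + int X_i(t) beta(t) dt + eps_i with beta(t) = Phi(t) @ b.
    X holds the curves sampled on the uniform grid t (one row per curve)."""
    Phi = fourier_basis(t, K)                        # (T, 2K + 1) basis matrix
    dt = t[1] - t[0]
    Z = (X @ Phi) * dt                               # Riemann-sum approximation of int X_i phi_k
    Z1 = np.column_stack([np.ones(len(y)), Z])       # prepend an intercept column
    penalty = lam * np.eye(Z1.shape[1])
    penalty[0, 0] = 0.0                              # leave the intercept unpenalized
    coef = np.linalg.solve(Z1.T @ Z1 + penalty, Z1.T @ y)
    return coef[0], Phi @ coef[1:]                   # alpha_hat, beta_hat evaluated on t

# Simulated example: smooth random curves and beta(t) = sin(2*pi*t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
n = 300
X = rng.normal(size=(n, 7)) @ fourier_basis(t, 3).T + 0.1 * rng.normal(size=(n, t.size))
beta_true = np.sin(2 * np.pi * t)
y = 1.0 + (X @ beta_true) * (t[1] - t[0]) + 0.01 * rng.normal(size=n)
alpha_hat, beta_hat = fit_scalar_on_function(X, y, t, K=3)
```

Here replication enters through the $n$ curves used to estimate $b$, and regularization through the small basis and the penalty.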
Localization for MCMC: sampling high-dimensional posterior distributions with local structure
We investigate how ideas from covariance localization in numerical weather
prediction can be used in Markov chain Monte Carlo (MCMC) sampling of
high-dimensional posterior distributions arising in Bayesian inverse problems.
To localize an inverse problem is to enforce an anticipated "local" structure
by (i) neglecting small off-diagonal elements of the prior precision and
covariance matrices; and (ii) restricting the influence of observations to
their neighborhood. For linear problems we can specify the conditions under
which posterior moments of the localized problem are close to those of the
original problem. We explain physical interpretations of our assumptions about
local structure and discuss the notion of high dimensionality in local
problems, which is different from the usual notion of high dimensionality in
function space MCMC. The Gibbs sampler is a natural choice of MCMC algorithm
for localized inverse problems and we demonstrate that its convergence rate is
independent of dimension for localized linear problems. Nonlinear problems can
also be tackled efficiently by localization and, as a simple illustration of
these ideas, we present a localized Metropolis-within-Gibbs sampler. Several
linear and nonlinear numerical examples illustrate localization in the context
of MCMC samplers for inverse problems.
Comment: 33 pages, 5 figures
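For a linear Gaussian target, two of the ingredients named above (neglecting small off-diagonal precision entries, and a Gibbs sampler that exploits the resulting locality) fit in a few lines. The sketch below is an illustration under assumptions, not the paper's algorithm. It localizes by banding, keeping only entries with $|i - j| \le r$ for a hypothetical radius `r`. It relies on the example precision being diagonally dominant so that the banded matrix stays positive definite, which banding does not guarantee in general. Each coordinate update reads only entries inside the band, so for fixed `r` the cost of one sweep grows linearly with dimension.

```python
import numpy as np

def localize(M, r):
    """Zero every entry with |i - j| > r (banding); r acts as a localization radius."""
    n = M.shape[0]
    i, j = np.indices((n, n))
    return np.where(np.abs(i - j) <= r, M, 0.0)

def gibbs_gaussian_banded(Q, mu, r, n_sweeps, rng):
    """Systematic-scan Gibbs sampler for N(mu, Q^{-1}) when Q has bandwidth r.
    Uses x_i | x_{-i} ~ N(mu_i - sum_{j != i} Q_ij (x_j - mu_j) / Q_ii, 1 / Q_ii)."""
    n = len(mu)
    x = mu.copy()
    samples = np.empty((n_sweeps, n))
    for s in range(n_sweeps):
        for i in range(n):
            lo, hi = max(0, i - r), min(n, i + r + 1)
            # Sum over j != i of Q_ij (x_j - mu_j), restricted to the band.
            resid = Q[i, lo:hi] @ (x[lo:hi] - mu[lo:hi]) - Q[i, i] * (x[i] - mu[i])
            x[i] = mu[i] - resid / Q[i, i] + rng.standard_normal() / np.sqrt(Q[i, i])
        samples[s] = x
    return samples

# Dense precision whose off-diagonal entries decay quickly with |i - j| (example choice).
n = 10
lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
Q_full = np.where(lag == 0, 2.0, -0.5 * 0.1 ** (lag - 1.0))
Q_loc = localize(Q_full, 1)
mu = np.linspace(-1.0, 1.0, n)
rng = np.random.default_rng(0)
samples = gibbs_gaussian_banded(Q_loc, mu, 1, 20000, rng)[1000:]   # drop burn-in
```

The sample mean and variances should approach `mu` and the diagonal of `inv(Q_loc)`, the moments of the localized problem rather than of the original dense one.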