Generalized Pseudolikelihood Methods for Inverse Covariance Estimation
We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse
covariance matrix that has a number of useful statistical and computational
properties. We show, through detailed experiments on synthetic data as well as
real-world finance and wind power data, that PseudoNet outperforms related
methods in terms of estimation error and support recovery, making it
well-suited for downstream applications where low estimation error is
important. We also show, under regularity conditions, that
PseudoNet is consistent. Our proof assumes the existence of accurate estimates
of the diagonal entries of the underlying inverse covariance matrix; we
additionally provide a two-step method to obtain these estimates, even in a
high-dimensional setting, going beyond the proofs for related methods. Unlike
other pseudolikelihood-based methods, we also show that PseudoNet does not
saturate, i.e., in high dimensions, there is no hard limit on the number of
nonzero entries in the PseudoNet estimate. We present a fast algorithm as well
as screening rules that make computing the PseudoNet estimate over a range of
tuning parameters tractable.
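As an illustration of the kind of objective such estimators minimize, the sketch below fits a CONCORD-style regularized pseudolikelihood by proximal gradient descent. The function name, step-size rule, and iteration budget are illustrative assumptions, not the authors' PseudoNet algorithm (which additionally provides screening rules and a diagonal estimation step).

```python
import numpy as np

def soft_threshold(a, t):
    """Elementwise l1 proximal operator."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def concord_pseudolikelihood(X, lam=0.1, step=None, n_iter=500):
    """Sparse inverse covariance estimate from a CONCORD-style regularized
    pseudolikelihood (illustrative sketch, not the PseudoNet algorithm):
        f(O) = -sum_i log O_ii + 0.5 * tr(O S O) + lam * ||offdiag(O)||_1
    minimized by proximal gradient descent."""
    n, p = X.shape
    S = X.T @ X / n                        # sample covariance
    if step is None:
        # conservative fixed step size based on the largest eigenvalue of S
        step = 1.0 / (np.linalg.eigvalsh(S)[-1] + 1.0)
    O = np.eye(p)
    for _ in range(n_iter):
        grad = S @ O                       # gradient of the quadratic term
        grad = 0.5 * (grad + grad.T)       # keep the iterate symmetric
        grad -= np.diag(1.0 / np.diag(O))  # gradient of -sum_i log O_ii
        O_new = O - step * grad
        O = soft_threshold(O_new, step * lam)       # shrink off-diagonals
        np.fill_diagonal(O, np.maximum(np.diag(O_new), 1e-6))  # keep O_ii > 0
    return O
```

Larger values of `lam` drive more off-diagonal entries of the estimate exactly to zero, which is the mechanism behind support recovery.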
Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods
Models with intractable likelihood functions arise in areas including network
analysis and spatial statistics, especially those involving Gibbs random
fields. Posterior parameter estimation in these settings is termed a
doubly-intractable problem because both the likelihood function and the
posterior distribution are intractable. The comparison of Bayesian models is
often based on the statistical evidence, the integral of the un-normalised
posterior distribution over the model parameters, which is rarely available in
closed form. For doubly-intractable models, estimating the evidence adds
another layer of difficulty. Consequently, the selection of the model that best
describes an observed network among a collection of exponential random graph
models for network analysis is a daunting task. Pseudolikelihoods offer a
tractable approximation to the likelihood but should be treated with caution
because they can lead to unreasonable inference. This paper presents a
method to adjust pseudolikelihoods in order to obtain a reasonable, yet
tractable, approximation to the likelihood. This allows implementation of
widely used computational methods for evidence estimation and pursuit of
Bayesian model selection of exponential random graph models for the analysis of
social networks. Empirical comparisons to existing methods show that our
procedure yields similar evidence estimates, but at a lower computational cost.
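For concreteness, the sketch below computes the plain, unadjusted maximum pseudolikelihood estimate for a toy edge-plus-triangle ERGM: a logistic regression of each dyad indicator on its change statistics. This is exactly the approximation the paper cautions about; the adjustment procedure it proposes is not shown, and all names here are illustrative.

```python
import numpy as np

def ergm_change_stats(A, i, j):
    """Change statistics for toggling edge (i, j) of an undirected graph:
    the edge count changes by 1, and the triangle count changes by the
    number of common neighbours of i and j (toy edge+triangle ERGM)."""
    tri = np.sum(A[i] * A[j])
    return np.array([1.0, tri])

def mple(A, lr=0.05, n_iter=2000):
    """Maximum pseudolikelihood estimate for the toy ERGM: logistic
    regression of each dyad on its change statistics, fit by gradient
    ascent on the logistic log-likelihood (illustrative sketch)."""
    n = A.shape[0]
    X, y = [], []
    for i in range(n):
        for j in range(i + 1, n):
            X.append(ergm_change_stats(A, i, j))
            y.append(A[i, j])
    X, y = np.array(X), np.array(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))  # dyad probabilities
        theta += lr * X.T @ (y - p) / len(y)  # log-likelihood gradient
    return theta
```

The MPLE treats dyads as conditionally independent given the rest of the graph, which is why its curvature (and hence any posterior built on it) can be badly calibrated without the kind of adjustment the paper develops.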
Residuals and goodness-of-fit tests for stationary marked Gibbs point processes
The inspection of residuals is a fundamental step in assessing how well a
parametric model fits the data. For spatial point processes, the concept of
residuals was recently proposed by Baddeley et al. (2005) as an
empirical counterpart of the {\it Campbell equilibrium} equation for marked
Gibbs point processes. The present paper focuses on stationary marked Gibbs
point processes and deals with asymptotic properties of residuals for such
processes. In particular, consistency and asymptotic normality are established
for a wide class of residuals, including the classical ones (raw
residuals, inverse residuals, Pearson residuals). Based on these asymptotic
results, we define goodness-of-fit tests with theoretically controlled Type-I
error. One of these tests extends the quadrat counting test widely used to
test the null hypothesis of a homogeneous Poisson point process.
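The classical quadrat counting test mentioned above can be sketched as follows: partition the observation window into k x k quadrats and compare the observed counts to their common mean with a Pearson chi-square statistic. This is a minimal hypothetical implementation (assuming SciPy for the p-value), not the extended test developed in the paper.

```python
import numpy as np
from scipy.stats import chi2

def quadrat_test(points, window=(1.0, 1.0), k=4):
    """Pearson chi-square quadrat counting test of the homogeneous
    Poisson hypothesis: under the null, counts in equal-area quadrats
    share a common mean, and the statistic is approximately
    chi-square with k*k - 1 degrees of freedom."""
    w, h = window
    counts = np.zeros((k, k))
    for x, y in points:
        ix = min(int(x / w * k), k - 1)   # quadrat index, clamped at edge
        iy = min(int(y / h * k), k - 1)
        counts[ix, iy] += 1
    expected = counts.sum() / k**2        # common mean under the null
    stat = np.sum((counts - expected) ** 2) / expected
    df = k**2 - 1
    return stat, chi2.sf(stat, df)        # statistic and p-value
```

A strongly clustered or strongly inhibited pattern inflates the statistic relative to the chi-square reference, which is what the test detects.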
Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation
Across a variety of scientific disciplines, sparse inverse covariance
estimation is a popular tool for capturing the underlying dependency
relationships in multivariate data. Unfortunately, most estimators are not
scalable enough to handle the sizes of modern high-dimensional data sets (often
on the order of terabytes), and assume Gaussian samples. To address these
deficiencies, we introduce HP-CONCORD, a highly scalable optimization method
for estimating a sparse inverse covariance matrix based on a regularized
pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal
gradient method uses a novel communication-avoiding linear algebra algorithm
and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving
parallel scalability on problems with up to ~819 billion parameters (1.28
million dimensions); even on a single node, HP-CONCORD demonstrates
scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to
estimate the underlying dependency structure of the brain from fMRI data, and
use the result to identify functional regions automatically. The results show
good agreement with a clustering from the neuroscience literature.
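A greatly simplified stand-in for that last step, recovering groups of variables from the support of an estimated inverse covariance matrix, might look like the following. The paper applies a proper clustering to the fMRI dimensions; this sketch merely takes connected components of the estimated dependency graph, and the function name is illustrative.

```python
def connected_components(adj):
    """Group variables using the support of a sparse inverse covariance
    estimate: nonzero off-diagonal entries define graph edges, and each
    connected component (found by union-find) is one candidate group."""
    n = len(adj)
    parent = list(range(n))

    def find(a):
        # Path-halving union-find lookup.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j]:
                parent[find(i)] = find(j)   # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

In practice one would threshold small estimated entries before building the adjacency structure, since numerical noise otherwise merges unrelated components.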