Corrupted Sensing: Novel Guarantees for Separating Structured Signals
We study the problem of corrupted sensing, a generalization of compressed
sensing in which one aims to recover a signal from a collection of corrupted or
unreliable measurements. While an arbitrary signal cannot be recovered in the
face of arbitrary corruption, tractable recovery is possible when both signal
and corruption are suitably structured. We quantify the relationship between
signal recovery and two geometric measures of structure, the Gaussian
complexity of a tangent cone and the Gaussian distance to a subdifferential. We
take a convex programming approach to disentangling signal and corruption,
analyzing both penalized programs that trade off between signal and corruption
complexity, and constrained programs that bound the complexity of signal or
corruption when prior information is available. In each case, we provide
conditions for exact signal recovery from structured corruption and stable
signal recovery from structured corruption with added unstructured noise. Our
simulations demonstrate close agreement between our theoretical recovery bounds
and the sharp phase transitions observed in practice. In addition, we provide
new interpretable bounds for the Gaussian complexity of sparse vectors,
block-sparse vectors, and low-rank matrices, which lead to sharper guarantees
of recovery when combined with our results and those in the literature.
Comment: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=671204
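As a rough illustration of the programs described above (schematic notation: measurements $y = \Phi x^\star + v^\star + z$ for a structured signal $x^\star$, structured corruption $v^\star$, and unstructured noise $z$; $\|\cdot\|_{\mathrm{sig}}$ and $\|\cdot\|_{\mathrm{cor}}$ are structure-inducing norms such as the $\ell_1$ or nuclear norm, and $\lambda$, $\tau$, $\delta$ are generic tuning parameters):

    % Penalized program: trade off signal and corruption complexity
    \min_{x,\,v}\ \|x\|_{\mathrm{sig}} + \lambda\,\|v\|_{\mathrm{cor}}
      \quad\text{subject to}\quad \|y - \Phi x - v\|_2 \le \delta
    % Constrained program: bound the corruption complexity when prior information is available
    \min_{x,\,v}\ \|x\|_{\mathrm{sig}}
      \quad\text{subject to}\quad \|v\|_{\mathrm{cor}} \le \tau,\ \ \|y - \Phi x - v\|_2 \le \delta

Setting $\delta = 0$ corresponds to the noiseless setting in which exact recovery is analyzed.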
Multivariate Stein Factors for a Class of Strongly Log-concave Distributions
We establish uniform bounds on the low-order derivatives of Stein equation
solutions for a broad class of multivariate, strongly log-concave target
distributions. These "Stein factor" bounds deliver control over Wasserstein and
related smooth function distances and are well-suited to analyzing the
computable Stein discrepancy measures of Gorham and Mackey. Our proofs are probabilistic and feature the synchronous coupling of multiple
overdamped Langevin diffusions.
Comment: 14 pages. The strong continuity argument in an earlier version did not identify an appropriate Banach space; this version does so. arXiv admin note: substantial text overlap with arXiv:1506.0303
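For reference, the Stein equation in question (up to scaling conventions) pairs a test function $h$ with a solution $u_h$ satisfying, for the target density $p$,

    \langle \nabla \log p(x), \nabla u_h(x) \rangle + \Delta u_h(x)
      \;=\; h(x) - \mathbb{E}_{Z \sim p}[h(Z)],

so uniform bounds on the low-order derivatives of $u_h$ (the Stein factors) convert smoothness of $h$ into control of the gap between sample and target expectations.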
Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences
Copy number variations in cancer cells and volatility fluctuations in stock
prices are commonly manifested as changepoints occurring at the same positions
across related data sequences. We introduce a Bayesian modeling framework,
BASIC, that employs a changepoint prior to capture the co-occurrence tendency
in data of this type. We design efficient algorithms to sample from and
maximize over the BASIC changepoint posterior and develop a Monte Carlo
expectation-maximization procedure to select prior hyperparameters in an
empirical Bayes fashion. We use the resulting BASIC framework to analyze DNA
copy number variations in the NCI-60 cancer cell lines and to identify
important events that affected the price volatility of S&P 500 stocks from 2000
to 2009.
Comment: 31 pages, 11 figures. v3: Modify synthetic data comparisons based on reviewer feedback
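A toy generative sketch of the co-occurrence structure such a prior is designed to capture (illustrative only; the parameters q and p and the Gaussian segment means below are placeholders, not the BASIC model):

    # Toy sketch: changepoints tend to co-occur across sequences because a
    # shared indicator governs each position.
    import numpy as np

    rng = np.random.default_rng(0)
    J, T = 5, 200           # number of sequences and sequence length (illustrative)
    q, p = 0.02, 0.8        # P(shared event at a position), P(a sequence joins an event)

    shared = rng.random(T) < q                  # positions at which a shared event occurs
    joins = (rng.random((J, T)) < p) & shared   # which sequences change at each shared event
    means = np.zeros((J, T))
    for j in range(J):
        level = 0.0
        for t in range(T):
            if joins[j, t]:
                level = rng.normal(0.0, 2.0)    # new segment mean after a changepoint
            means[j, t] = level
    data = means + rng.normal(size=(J, T))      # observed sequences with co-occurring changepoints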
Measuring Sample Quality with Stein's Method
To improve the efficiency of Monte Carlo estimation, practitioners are
turning to biased Markov chain Monte Carlo procedures that trade off asymptotic
exactness for computational speed. The reasoning is sound: a reduction in
variance due to more rapid sampling can outweigh the bias introduced. However,
the inexactness creates new challenges for sampler and parameter selection,
since standard measures of sample quality like effective sample size do not
account for asymptotic bias. To address these challenges, we introduce a new
computable quality measure based on Stein's method that quantifies the maximum
discrepancy between sample and target expectations over a large class of test
functions. We use our tool to compare exact, biased, and deterministic sample
sequences and illustrate applications to hyperparameter selection, convergence
rate assessment, and quantifying bias-variance tradeoffs in posterior
inference.
Comment: 17 pages, 6 figures
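In rough terms (notation ours), for a sample $x_1, \dots, x_n$ and target density $p$, such a quality measure takes the form of a Stein discrepancy

    S = \sup_{g \in \mathcal{G}} \Big| \frac{1}{n} \sum_{i=1}^{n} (\mathcal{T}_p g)(x_i) \Big|,
    \qquad (\mathcal{T}_p g)(x) \;=\; \langle g(x), \nabla \log p(x) \rangle + \langle \nabla, g(x) \rangle,

where the Stein operator $\mathcal{T}_p$ satisfies $\mathbb{E}_p[(\mathcal{T}_p g)(Z)] = 0$, so each summand would vanish in expectation under exact sampling from $p$; the choice of test-function class $\mathcal{G}$ is what makes the supremum both discriminating and computable.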
Improving Gibbs Sampler Scan Quality with DoGS
The pairwise influence matrix of Dobrushin has long been used as an
analytical tool to bound the rate of convergence of Gibbs sampling. In this
work, we use Dobrushin influence as the basis of a practical tool to certify
and efficiently improve the quality of a discrete Gibbs sampler. Our
Dobrushin-optimized Gibbs samplers (DoGS) offer customized variable selection
orders for a given sampling budget and variable subset of interest, explicit
bounds on total variation distance to stationarity, and certifiable
improvements over the standard systematic and uniform random scan Gibbs
samplers. In our experiments with joint image segmentation and object
recognition, Markov chain Monte Carlo maximum likelihood estimation, and Ising
model inference, DoGS consistently deliver higher-quality inferences with
significantly smaller sampling budgets than standard Gibbs samplers.
Comment: ICML 2017
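A simplified sketch of the kind of influence-matrix bookkeeping involved (an illustration of the idea only, not the paper's actual bound or selection rule; C, target, and budget are placeholders):

    import numpy as np

    def propagate(v, C, scan):
        """After resampling variable i, replace its discrepancy bound with a
        C-weighted combination of the current bounds (C[i, j] bounds the
        influence of variable j on variable i; diagonal entries assumed zero)."""
        v = v.copy()
        for i in scan:
            v[i] = C[i] @ v
        return v

    def greedy_scan(C, target, budget):
        """Greedily pick the next variable to resample so as to shrink the
        summed bound over the variables of interest."""
        d = C.shape[0]
        v = np.ones(d)                 # worst-case initial per-variable bounds
        scan = []
        for _ in range(budget):
            best = min(range(d), key=lambda i: propagate(v, C, [i])[target].sum())
            v = propagate(v, C, [best])
            scan.append(best)
        return scan, v[target].sum()

    # Example: scan, bound = greedy_scan(np.array([[0, .2], [.3, 0]]), target=[0], budget=5)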
Measuring Sample Quality with Kernels
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid
sampling at the cost of more biased inference. Since standard MCMC diagnostics
fail to detect these biases, researchers have developed computable Stein
discrepancy measures that provably determine the convergence of a sample to its
target distribution. This approach was recently combined with the theory of
reproducing kernels to define a closed-form kernel Stein discrepancy (KSD)
computable by summing kernel evaluations across pairs of sample points. We
develop a theory of weak convergence for KSDs based on Stein's method,
demonstrate that commonly used KSDs fail to detect non-convergence even for
Gaussian targets, and show that kernels with slowly decaying tails provably
determine convergence for a large class of target distributions. The resulting
convergence-determining KSDs are suitable for comparing biased, exact, and
deterministic sample sequences and simpler to compute and parallelize than
alternative Stein discrepancies. We use our tools to compare biased samplers,
select sampler hyperparameters, and improve upon existing KSD approaches to
one-sample hypothesis testing and sample quality improvement.
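A minimal numpy sketch of the quadratic-time computation described above, assuming a standard Gaussian target (so $\nabla \log p(x) = -x$) and an inverse multiquadric base kernel $k(x, y) = (c^2 + \|x - y\|^2)^{\beta}$, one of the slowly decaying choices; the constants c and beta are illustrative:

    import numpy as np

    def imq_ksd(X, c=1.0, beta=-0.5):
        """X: (n, d) array of sample points. Returns the KSD of X against N(0, I)."""
        n, d = X.shape
        score = -X                                  # grad log p(x) = -x for a standard Gaussian
        diff = X[:, None, :] - X[None, :, :]        # (n, n, d) pairwise differences x_i - x_j
        sq = np.sum(diff ** 2, axis=-1)             # squared pairwise distances
        s = c ** 2 + sq
        k = s ** beta                               # base kernel values k(x_i, x_j)
        gkx = 2 * beta * s[..., None] ** (beta - 1) * diff   # grad_x k(x_i, x_j)
        gky = -gkx                                           # grad_y k(x_i, x_j)
        trace = -2 * beta * (2 * (beta - 1) * s ** (beta - 2) * sq + d * s ** (beta - 1))
        # Stein kernel: k0 = <s(x), s(y)> k + <s(x), grad_y k> + <s(y), grad_x k> + tr(grad_x grad_y k)
        k0 = ((score @ score.T) * k
              + np.einsum("id,ijd->ij", score, gky)
              + np.einsum("jd,ijd->ij", score, gkx)
              + trace)
        return np.sqrt(k0.sum()) / n

    # Example: a shifted (biased) sample yields a larger discrepancy than an exact one
    rng = np.random.default_rng(0)
    X_exact = rng.normal(size=(200, 2))
    print(imq_ksd(X_exact), imq_ksd(X_exact + 0.5))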
Random Feature Stein Discrepancies
Computable Stein discrepancies have been deployed for a variety of
applications, ranging from sampler selection in posterior inference to
approximate Bayesian inference to goodness-of-fit testing. Existing
convergence-determining Stein discrepancies admit strong theoretical guarantees
but suffer from a computational cost that grows quadratically in the sample
size. While linear-time Stein discrepancies have been proposed for
goodness-of-fit testing, they exhibit avoidable degradations in testing
power---even when power is explicitly optimized. To address these shortcomings,
we introduce feature Stein discrepancies ($\Phi$SDs), a new family of quality measures that can be cheaply approximated using importance sampling. We show how to construct $\Phi$SDs that provably determine the convergence of a sample to its target and develop high-accuracy approximations---random $\Phi$SDs (R$\Phi$SDs)---which are computable in near-linear time. In our experiments with sampler selection for approximate posterior inference and goodness-of-fit testing, R$\Phi$SDs perform as well or better than quadratic-time KSDs while being orders of magnitude faster to compute.
Comment: In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018). Code available at: https://bitbucket.org/jhhuggins/random-feature-stein-discrepancie
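A heavily simplified sketch of the near-linear-time idea (this uses plain random Fourier features and is not the paper's R$\Phi$SD construction; the feature map, proposal, and feature count M are placeholders):

    import numpy as np

    def rfsd_sketch(X, score, M=100, seed=0):
        """X: (n, d) samples; score maps an (n, d) array to the (n, d) array of grad-log-density rows."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.normal(size=(M, d))                  # random Fourier frequencies (proposal draws)
        b = rng.uniform(0.0, 2.0 * np.pi, size=M)    # random phases
        S = score(X)                                 # (n, d) score evaluations
        proj = X @ W.T + b                           # (n, M) projections <w_m, x_i> + b_m
        phi = np.cos(proj)                           # scalar feature values
        dphi = -np.sin(proj)                         # cosine derivative; chain rule brings in W below
        # Averaged Stein-operator features: T[m, j] = (1/n) sum_i [phi[i, m] * S[i, j] + dphi[i, m] * W[m, j]]
        T = phi.T @ S / n + dphi.mean(axis=0)[:, None] * W
        return np.sqrt(np.mean(np.sum(T ** 2, axis=1)))   # cost is O(n * M * d), linear in n

    # Example: rfsd_sketch(X, score=lambda X: -X) for a standard Gaussian target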
Orthogonal Machine Learning: Power and Limitations
Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear regression setting popular in causal inference, we show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. Our proof relies on Stein's lemma and may be of independent interest. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for treatment effect estimation.
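For context, a minimal cross-fitted sketch of the standard first-order orthogonal (double machine learning) estimator in the partially linear model $y = \theta t + g(x) + \varepsilon$, $t = m(x) + \eta$, which the higher-order construction above refines; the nuisance learner here is a placeholder:

    import numpy as np

    def double_ml_theta(y, t, x, fit_nuisance, n_folds=2):
        """Cross-fitted residual-on-residual estimate of theta in y = theta*t + g(x) + eps.
        fit_nuisance(x_train, target_train) must return a callable predicting the target from x."""
        folds = np.array_split(np.arange(len(y)), n_folds)
        num = den = 0.0
        for k, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])
            ell_hat = fit_nuisance(x[train], y[train])   # estimate of E[y | x]
            m_hat = fit_nuisance(x[train], t[train])     # estimate of E[t | x]
            ry = y[test] - ell_hat(x[test])              # outcome residual
            rt = t[test] - m_hat(x[test])                # treatment residual
            num += ry @ rt
            den += rt @ rt
        return num / den

    def poly_fit(x_train, target):                       # placeholder nuisance learner (scalar x)
        coef = np.polyfit(x_train, target, deg=3)
        return lambda x_new: np.polyval(coef, x_new)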
Efron-Stein Inequalities for Random Matrices
This paper establishes new concentration inequalities for random matrices
constructed from independent random variables. These results are analogous to the generalized Efron-Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.
Comment: 42 pages. arXiv admin note: text overlap with arXiv:1305.061
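For orientation, the classical scalar Efron-Stein inequality that such results generalize: for independent $X_1, \dots, X_n$ and $X_i'$ an independent copy of $X_i$,

    \operatorname{Var} f(X_1, \dots, X_n) \;\le\; \frac{1}{2} \sum_{i=1}^{n}
      \mathbb{E}\big[\big(f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, X_i', \dots, X_n)\big)^2\big];

the matrix versions control analogous fluctuations of matrix-valued functions, with exchangeable pairs playing the role of the independent copies.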
Global Non-convex Optimization with Discretized Diffusions
An Euler discretization of the Langevin diffusion is known to converge to the
global minimizers of certain convex and non-convex optimization problems. We
show that this property holds for any suitably smooth diffusion and that
different diffusions are suitable for optimizing different classes of convex
and non-convex functions. This allows us to design diffusions suitable for
globally optimizing convex and non-convex functions not covered by the existing
Langevin theory. Our non-asymptotic analysis delivers computable optimization
and integration error bounds based on easily accessed properties of the
objective and chosen diffusion. Central to our approach are new explicit Stein
factor bounds on the solutions of Poisson equations. We complement these
results with improved optimization guarantees for targets other than the
standard Gibbs measure.
Comment: 19 pages, NeurIPS 2018 camera ready version
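A minimal sketch of the Euler-Maruyama discretization of the overdamped Langevin diffusion referenced above, used as a global optimizer for an objective f (the step size, inverse temperature, and double-well test objective are illustrative, not from the paper):

    import numpy as np

    def langevin_optimize(grad_f, x0, steps=10_000, eta=1e-3, beta=10.0, seed=0):
        """Iterate x <- x - eta * grad_f(x) + sqrt(2 * eta / beta) * N(0, I)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * rng.normal(size=x.shape)
        return x

    # Illustrative non-convex objective: a separable double well f(x) = sum_i (x_i^2 - 1)^2
    # with global minimizers at coordinates +/- 1.
    grad = lambda x: 4.0 * x * (x ** 2 - 1.0)
    x_final = langevin_optimize(grad, x0=np.zeros(2))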