Doubly Robust Inference when Combining Probability and Non-probability Samples with High-dimensional Data
Non-probability samples are becoming increasingly popular in survey statistics but
may suffer from selection biases that limit the generalizability of results to
the target population. We consider integrating a non-probability sample with a
probability sample which provides high-dimensional representative covariate
information of the target population. We propose a two-step approach for
variable selection and finite population inference. In the first step, we use
penalized estimating equations with folded-concave penalties to select
important variables for the sampling score of selection into the
non-probability sample and the outcome model. We show that the penalized
estimating equation approach enjoys the selection consistency property for
general probability samples. The major technical hurdle is due to the possible
dependence of the sample under the finite population framework. To overcome
this challenge, we construct martingales which enable us to apply a Bernstein
concentration inequality for martingales. In the second step, we focus on a
doubly robust estimator of the finite population mean and re-estimate the
nuisance model parameters by minimizing the asymptotic squared bias of the
doubly robust estimator. This estimating strategy mitigates the possible
first-step selection error and renders the doubly robust estimator root-n
consistent if either the sampling probability or the outcome model is correctly
specified.
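A minimal sketch of the second-step estimator may help fix ideas. The abstract does not state the formula, so the form below is an assumption: it is the doubly robust estimator commonly used when integrating a non-probability sample A with a probability sample B, namely the inverse-propensity-weighted residuals from A plus the design-weighted outcome predictions from B, divided by the population size. The function name and all argument names are hypothetical, and the fitted values `m_a`, `m_b` and `pi_a` are taken as given; the penalized variable selection and the bias-minimizing re-estimation of the nuisance parameters are not shown.

```python
import numpy as np


def doubly_robust_mean(y_a, m_a, pi_a, m_b, d_b, pop_size):
    """Doubly robust estimate of a finite population mean (illustrative sketch).

    y_a      : outcomes observed in the non-probability sample A
    m_a      : outcome-model predictions for the units in A
    pi_a     : estimated probabilities of selection into A (sampling scores)
    m_b      : outcome-model predictions for the units in probability sample B
    d_b      : design weights of the units in B
    pop_size : population size N (assumed known here)
    """
    y_a = np.asarray(y_a, dtype=float)
    m_a = np.asarray(m_a, dtype=float)
    pi_a = np.asarray(pi_a, dtype=float)
    m_b = np.asarray(m_b, dtype=float)
    d_b = np.asarray(d_b, dtype=float)

    if np.any(pi_a <= 0) or np.any(pi_a > 1):
        raise ValueError("sampling scores must lie in (0, 1]")

    # Inverse-propensity-weighted residuals from the non-probability sample.
    residual_term = np.sum((y_a - m_a) / pi_a)
    # Design-weighted predictions from the probability sample.
    prediction_term = np.sum(d_b * m_b)
    return (residual_term + prediction_term) / pop_size
```

The two terms show where the double robustness comes from. If the outcome model is correct, the residuals average out and the prediction term carries the estimate. If the sampling score is correct, the weighted residuals correct whatever bias the predictions have.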
Marginal integration for nonparametric causal inference
We consider the problem of inferring the total causal effect of a single
variable intervention on a (response) variable of interest. We propose a
certain marginal integration regression technique for a very general class of
potentially nonlinear structural equation models (SEMs) with known structure,
or at least known superset of adjustment variables: we call the procedure
S-mint regression. We easily derive that it achieves the same convergence rate
as nonparametric regression: for example, single variable intervention effects
can be estimated with convergence rate $n^{-2/5}$ assuming smoothness with
twice differentiable functions. Our result can also be seen as a major
robustness property with respect to model misspecification which goes much
beyond the notion of double robustness. Furthermore, when the structure of the
SEM is not known, we can estimate (the equivalence class of) the directed
acyclic graph corresponding to the SEM, and then proceed by using S-mint based
on these estimates. We empirically compare the S-mint regression method with
more classical approaches and argue that the former is indeed more robust, more
reliable and substantially simpler. Comment: 40 pages, 14 figures
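The marginal integration idea can be sketched as follows, under stated assumptions. With a valid adjustment set S, the intervention effect E[Y | do(X = x)] equals the regression function E[Y | X = x, S = s] integrated over the marginal distribution of S, which can be approximated by averaging a nonparametric fit over the observed values of S. The sketch uses a Nadaraya-Watson smoother with a Gaussian product kernel as a simple stand-in. It is not necessarily the smoother used in the paper, and the function names and bandwidth parameters `h_x` and `h_s` are hypothetical.

```python
import numpy as np


def nw_regression(x0, s0, X, S, Y, h_x, h_s):
    """Nadaraya-Watson estimate of E[Y | X = x0, S = s0] (Gaussian product kernel).

    X : (n,) intervention variable, S : (n, p) adjustment variables, Y : (n,) response.
    """
    w_x = np.exp(-0.5 * ((X - x0) / h_x) ** 2)
    w_s = np.exp(-0.5 * np.sum(((S - s0) / h_s) ** 2, axis=1))
    w = w_x * w_s
    return np.sum(w * Y) / np.sum(w)


def marginal_integration(x0, X, S, Y, h_x, h_s):
    """Estimate E[Y | do(X = x0)] by averaging the fit over the observed S_i.

    The empirical average over S_i approximates the integral of
    E[Y | X = x0, S = s] with respect to the marginal distribution of S.
    """
    fits = [nw_regression(x0, s_i, X, S, Y, h_x, h_s) for s_i in S]
    return float(np.mean(fits))
```

Calling `marginal_integration` on a grid of `x0` values traces out an estimated intervention-effect curve. Integrating out S leaves a function of the single variable x, which is why a univariate nonparametric rate is plausible even when S has several components.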
A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming
This work presents the concept of kernel mean embedding and kernel
probabilistic programming in the context of stochastic systems. We propose
formulations to represent, compare, and propagate uncertainties for fairly
general stochastic dynamics in a distribution-free manner. The new tools enjoy
sound theory rooted in functional analysis and wide applicability as
demonstrated in distinct numerical examples. The implication of this new
concept is a new mode of thinking about the statistical nature of uncertainty
in dynamical systems.
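As a rough illustration of comparing uncertainties through kernel mean embeddings, the sketch below computes the squared maximum mean discrepancy (MMD) between two sample sets, which is the squared RKHS distance between their empirical mean embeddings. The Gaussian kernel, the biased (V-statistic) estimator, and the names `gaussian_gram`, `mmd_squared` and `bandwidth` are assumptions made for illustration; the abstract does not specify the kernel or estimator used in the paper.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples X (n, d) and Y (m, d)."""
    k_xx = gaussian_gram(X, X, bandwidth).mean()
    k_yy = gaussian_gram(Y, Y, bandwidth).mean()
    k_xy = gaussian_gram(X, Y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

To compare propagated uncertainty in a dynamical system, one could push two sets of state samples through the same dynamics and evaluate `mmd_squared` on the resulting samples at each time step. No distributional form is assumed for either sample set.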