Nonparametric inverse probability weighted estimators based on the highly adaptive lasso
Inverse probability weighted estimators are the oldest and potentially most
commonly used class of procedures for the estimation of causal effects. By
adjusting for selection biases via a weighting mechanism, these procedures
estimate an effect of interest by constructing a pseudo-population in which
selection biases are eliminated. Despite their ease of use, these estimators
require the correct specification of a model for the weighting mechanism, are
known to be inefficient, and suffer from the curse of dimensionality. We
propose a class of nonparametric inverse probability weighted estimators in
which the weighting mechanism is estimated via undersmoothing of the highly
adaptive lasso, a nonparametric regression function proven to converge at
$n^{-1/3}$-rate to the true weighting mechanism. We demonstrate that our
estimators are asymptotically linear with variance converging to the
nonparametric efficiency bound. Unlike doubly robust estimators, our procedures
require neither derivation of the efficient influence function nor
specification of the conditional outcome model. Our theoretical developments
have broad implications for the construction of efficient inverse probability
weighted estimators in large statistical models and a variety of problem
settings. We assess the practical performance of our estimators in simulation
studies and demonstrate use of our proposed methodology with data from a
large-scale epidemiologic study.
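The pseudo-population idea can be sketched numerically: estimate the weighting mechanism with a nonparametric regression, then reweight each observation by the inverse of its estimated probability. In the sketch below, a crude bin-wise average stands in for the undersmoothed highly adaptive lasso (which is not implemented here); the simulated data-generating process and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
W = rng.uniform(-1, 1, n)                     # baseline covariate
A = rng.binomial(1, 1 / (1 + np.exp(-W)))     # treatment mechanism
Y = 2.0 * A + W + rng.normal(size=n)          # outcome; true effect = 2

# Bin-wise averaging stands in for the (undersmoothed) nonparametric fit
bins = np.quantile(W, np.linspace(0, 1, 21))
idx = np.clip(np.digitize(W, bins[1:-1]), 0, 19)
phat = np.array([A[idx == k].mean() for k in range(20)])[idx]
phat = np.clip(phat, 0.01, 0.99)

# IPW contrast: reweighting builds a pseudo-population in which
# treatment is independent of W, eliminating the selection bias
ate = np.mean(A * Y / phat) - np.mean((1 - A) * Y / (1 - phat))
```

The estimate recovers the true effect because the estimated weights undo the covariate-dependent treatment assignment.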
Robust causal inference with continuous instruments using the local instrumental variable curve
Instrumental variables are commonly used to estimate effects of a treatment
afflicted by unmeasured confounding, and in practice instruments are often
continuous (e.g., measures of distance, or treatment preference). However,
available methods for continuous instruments have important limitations: they
either require restrictive parametric assumptions for identification, or else
rely on modeling both the outcome and treatment process well (and require
modeling effect modification by all adjustment covariates). In this work we
develop the first semiparametric doubly robust estimators of the local
instrumental variable effect curve, i.e., the effect among those who would take
treatment for instrument values above some threshold and not below. In addition
to being robust to misspecification of either the instrument or
treatment/outcome processes, our approach also incorporates information about
the instrument mechanism and allows for flexible data-adaptive estimation of
effect modification. We discuss asymptotic properties under weak conditions,
and use the methods to study infant mortality effects of neonatal intensive
care units with high versus low technical capacity, using travel time as an
instrument.
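A plug-in version of the local instrumental variable curve (the Heckman-Vytlacil estimand $\mathrm{LIV}(z) = \partial_z E[Y\mid Z=z] \,/\, \partial_z E[A\mid Z=z]$) can be sketched with binned local slopes. This illustrates only the estimand, not the doubly robust semiparametric estimator developed in the paper; the data-generating process is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
U = rng.normal(size=n)                        # unmeasured confounder
Z = rng.uniform(0, 1, n)                      # continuous instrument
T = 1 / (1 + np.exp(-U))                      # latent uptake threshold
A = (Z > T).astype(int)                       # monotone treatment uptake
Y = 1.5 * A + U + rng.normal(size=n)          # homogeneous effect of 1.5

# Plug-in LIV curve: ratio of local slopes of E[Y|Z=z] and E[A|Z=z]
edges = np.linspace(0, 1, 11)
mid = (edges[:-1] + edges[1:]) / 2
idx = np.clip(np.digitize(Z, edges) - 1, 0, 9)
Ey = np.array([Y[idx == k].mean() for k in range(10)])
Ea = np.array([A[idx == k].mean() for k in range(10)])
liv = np.gradient(Ey, mid) / np.gradient(Ea, mid)  # flat at 1.5 here
```

Because the simulated effect is homogeneous, the curve is flat; effect modification by unobservables would show up as a non-constant curve over thresholds.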
Robust Learning of Fixed-Structure Bayesian Networks
We investigate the problem of learning Bayesian networks in a robust model
where an $\epsilon$-fraction of the samples are adversarially corrupted. In
this work, we study the fully observable discrete case where the structure of
the network is given. Even in this basic setting, previous learning algorithms
either run in exponential time or lose dimension-dependent factors in their
error guarantees. We provide the first computationally efficient robust
learning algorithm for this problem with dimension-independent error
guarantees. Our algorithm has near-optimal sample complexity, runs in
polynomial time, and achieves error that scales nearly-linearly with the
fraction of adversarially corrupted samples. Finally, we show on both synthetic
and semi-synthetic data that our algorithm performs well in practice.
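The dimension-dependent loss the abstract alludes to is easy to exhibit: under $\epsilon$-corruption, naive empirical frequency estimation of an edgeless (fully independent) network's parameters incurs $\ell_2$ error that grows like $\epsilon\sqrt{d}$. The toy sketch below illustrates that failure mode only, not the authors' efficient filtering algorithm; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 20000, 100, 0.1
p = np.full(d, 0.5)                           # parameters of an edgeless network
X = rng.binomial(1, p, size=(n, d)).astype(float)
k = int(eps * n)
X[:k] = 1.0                                   # adversary replaces an eps-fraction

# Naive plug-in: each coordinate is biased by about eps * (1 - p),
# so the l2 error over all d coordinates scales like eps * sqrt(d)
p_naive = X.mean(axis=0)
err_l2 = np.linalg.norm(p_naive - p)
```

Here the per-coordinate bias is small (0.05), but aggregated over 100 coordinates the error is an order of magnitude larger, which is what dimension-independent guarantees rule out.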
Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis
While estimation of the marginal (total) causal effect of a point exposure on
an outcome is arguably the most common objective of experimental and
observational studies in the health and social sciences, in recent years,
investigators have also become increasingly interested in mediation analysis.
Specifically, upon evaluating the total effect of the exposure, investigators
routinely wish to make inferences about the direct and indirect pathways of the
exposure's effect, operating through a mediator variable (or not) that occurs
subsequent to the exposure and prior to the outcome. Although powerful
semiparametric methodologies have been developed to analyze observational
studies that produce double robust and highly efficient estimates of the
marginal total causal effect, similar methods for mediation analysis are
currently lacking. Thus, this paper develops a general semiparametric framework
for obtaining inferences about so-called marginal natural direct and indirect
causal effects, while appropriately accounting for a large number of
pre-exposure confounding factors for the exposure and the mediator variables.
Our analytic framework is particularly appealing, because it gives new insights
on issues of efficiency and robustness in the context of mediation analysis. In
particular, we propose new multiply robust locally efficient estimators of the
marginal natural indirect and direct causal effects, and develop a novel double
robust sensitivity analysis framework for the assumption of ignorability of the
mediator variable.
Comment: Published at http://dx.doi.org/10.1214/12-AOS990 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
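For intuition, natural direct and indirect effects can be computed with a simple parametric plug-in (the product-of-coefficients rule under linear models with no exposure-mediator interaction). This is exactly the kind of restrictive parametric approach that the paper's multiply robust semiparametric estimators improve upon; the simulated model below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000
W = rng.normal(size=n)                        # pre-exposure confounder
A = rng.binomial(1, 1 / (1 + np.exp(-W)))     # exposure
M = 1.0 * A + 0.5 * W + rng.normal(size=n)    # mediator
Y = 2.0 * A + 1.5 * M + 1.0 * W + rng.normal(size=n)

# Mediator model: M ~ A + W; outcome model: Y ~ A + M + W
Xm = np.column_stack([np.ones(n), A, W])
a = np.linalg.lstsq(Xm, M, rcond=None)[0]
Xy = np.column_stack([np.ones(n), A, M, W])
b = np.linalg.lstsq(Xy, Y, rcond=None)[0]

nde = b[1]            # natural direct effect   (truth here: 2.0)
nie = a[1] * b[2]     # natural indirect effect (truth here: 1.0 * 1.5 = 1.5)
```

The plug-in is consistent only because both linear models are correctly specified in this simulation; misspecifying either breaks it, which motivates the multiply robust construction.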
One-step Targeted Maximum Likelihood for Time-to-event Outcomes
Current Targeted Maximum Likelihood Estimation (TMLE) methods used to analyze
time-to-event data estimate the survival probability for each time point
separately, which results in estimates that are not necessarily monotone. In
this paper, we present an extension of TMLE for observational time-to-event
data, the one-step Targeted Maximum Likelihood Estimator for the treatment-rule
specific survival curve. We construct a one-dimensional universal least
favorable submodel that targets the entire survival curve, and thereby requires
minimal extra fitting with data to achieve its goal of solving the efficient
influence curve equation. In a simulation study, we show that this method
improves on previously proposed methods in both robustness and efficiency,
while respecting the monotone decreasing nature of the survival curve.
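The monotonicity issue is concrete: any estimator built from discrete-time hazards via the product-limit form is monotone by construction, which is the property the one-step TMLE preserves while targeting the whole curve at once. The sketch below shows only the product-limit construction (essentially Kaplan-Meier), not TMLE; the simulated censoring setup is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
T = rng.exponential(5.0, n)                   # event times, S(t) = exp(-t/5)
C = rng.exponential(8.0, n)                   # independent censoring times
X = np.minimum(T, C)
D = (T <= C).astype(int)                      # event indicator

# Product-limit estimate: survival is a running product of
# (1 - hazard), so monotone nonincreasing by construction
times = np.sort(np.unique(X[D == 1]))
surv, s = [], 1.0
for t in times:
    at_risk = np.sum(X >= t)
    events = np.sum((X == t) & (D == 1))
    s *= 1 - events / at_risk
    surv.append(s)
surv = np.array(surv)
```

Estimating each time point's survival separately (e.g., by separate regressions) offers no such guarantee, which is the defect the targeted one-step update removes.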
Efficient Estimation of Quantiles in Missing Data Models
We propose a novel targeted maximum likelihood estimator (TMLE) for quantiles
in semiparametric missing data models. Our proposed estimator is locally
efficient, $\sqrt{n}$-consistent, asymptotically normal, and doubly robust,
under regularity conditions. We use Monte Carlo simulation to compare our
proposed method to existing estimators. The TMLE outperforms all competitors,
with variance up to three times smaller than that of the inverse probability
weighted (IPW) estimator, and up to two times smaller than that of the
the augmented IPW. This research is motivated by a causal inference research
question with highly variable treatment assignment probabilities, and a heavy
tailed, highly variable outcome. Estimation of causal effects on the mean is a
hard problem in such scenarios because the information bound is generally
small. In our application, the efficiency bound for estimating the effect on
the mean is possibly infinite. This rules out $\sqrt{n}$-consistent inference
and reduces the power for testing the hypothesis of no treatment effect on the
mean. In our simulations, using the effect on the median allows us to test a
location-shift hypothesis with 30% more power. This allows us to make claims
about the effectiveness of treatment that would have been hard to make for the
effect on the mean. We provide R code to implement the proposed estimators.
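As a baseline for intuition, here is the simple IPW estimator of a median under outcome missingness at random: reweight the observed outcomes by inverse response probabilities and read off the weighted median. The paper's TMLE improves on this baseline; the heavy-tailed data-generating process and response model below are illustrative assumptions (and the response probability is taken as known rather than estimated).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
W = rng.normal(size=n)
Y = W + rng.standard_t(df=3, size=n)          # heavy-tailed outcome, median 0
pi = 1 / (1 + np.exp(-W))                     # response probability, known here
R = rng.binomial(1, pi)                       # Y observed only when R = 1

# Weighted median of the observed outcomes, with weights 1/pi:
# solves the IPW estimating equation for the 0.5-quantile
ys, ws = Y[R == 1], 1 / pi[R == 1]
order = np.argsort(ys)
cw = np.cumsum(ws[order]) / ws.sum()
med_ipw = ys[order][np.searchsorted(cw, 0.5)]
```

The median remains well estimated despite the heavy tails, whereas a mean in this setting would inherit the outcome's high variability.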
Statistical Inference for Data-adaptive Doubly Robust Estimators with Survival Outcomes
The consistency of doubly robust estimators relies on consistent estimation
of at least one of two nuisance regression parameters. In moderate to large
dimensions, the use of flexible data-adaptive regression estimators may aid in
achieving this consistency. However, $\sqrt{n}$-consistency of doubly robust
estimators is not guaranteed if one of the nuisance estimators is inconsistent.
In this paper we present a doubly robust estimator for survival analysis with
the novel property that it converges to a Gaussian variable at $\sqrt{n}$-rate
for a large class of data-adaptive estimators of the nuisance parameters, under
the only assumption that at least one of them is consistently estimated at a
an $n^{1/4}$-rate. This result is achieved through adaptation of recent ideas in
semiparametric inference, which amount to: (i) Gaussianizing (i.e., making
asymptotically linear) a drift term that arises in the asymptotic analysis of
the doubly robust estimator, and (ii) using cross-fitting to avoid entropy
conditions on the nuisance estimators. We present the formula of the asymptotic
variance of the estimator, which allows computation of doubly robust confidence
intervals and p-values. We illustrate the finite-sample properties of the
estimator in simulation studies, and demonstrate its use in a phase III
clinical trial for estimating the effect of a novel therapy for the treatment
of HER2-positive breast cancer.
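The cross-fitting device is generic, and it is easiest to see for the point-treatment doubly robust (AIPW) estimator: nuisances are fit on one fold and the influence-function-based estimator is evaluated on the held-out fold, avoiding entropy conditions on the nuisance estimators. Below is a minimal sketch with crude bin-wise nuisance fits standing in for data-adaptive learners; the paper's survival-specific estimator is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-W)))
Y = 1.0 * A + W + rng.normal(size=n)          # true ATE = 1

def fit_nuisances(Wtr, Atr, Ytr, Wte):
    # Crude data-adaptive stand-ins: bin-wise means fit on the training fold
    bins = np.quantile(Wtr, np.linspace(0, 1, 11))
    itr = np.clip(np.digitize(Wtr, bins[1:-1]), 0, 9)
    ite = np.clip(np.digitize(Wte, bins[1:-1]), 0, 9)
    eh = np.array([Atr[itr == k].mean() for k in range(10)])
    m1 = np.array([Ytr[(itr == k) & (Atr == 1)].mean() for k in range(10)])
    m0 = np.array([Ytr[(itr == k) & (Atr == 0)].mean() for k in range(10)])
    return np.clip(eh[ite], 0.05, 0.95), m1[ite], m0[ite]

# Two-fold cross-fitting: evaluate the AIPW influence function on the
# fold that was NOT used to fit the nuisances
folds = np.arange(n) % 2
psi = np.zeros(n)
for f in (0, 1):
    tr, te = folds != f, folds == f
    eh, m1, m0 = fit_nuisances(W[tr], A[tr], Y[tr], W[te])
    psi[te] = (m1 - m0
               + A[te] * (Y[te] - m1) / eh
               - (1 - A[te]) * (Y[te] - m0) / (1 - eh))
ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)             # plug-in standard error
```

The empirical variance of the influence values gives doubly robust confidence intervals directly, which is the role of the asymptotic variance formula in the paper.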
An informational approach to the global optimization of expensive-to-evaluate functions
In many global optimization problems motivated by engineering applications,
the number of function evaluations is severely limited by time or cost. To
ensure that each evaluation contributes to the localization of good candidates
for the role of global minimizer, a sequential choice of evaluation points is
usually carried out. In particular, when Kriging is used to interpolate past
evaluations, the uncertainty associated with the lack of information on the
function can be expressed and used to compute a number of criteria accounting
for the interest of an additional evaluation at any given point. This paper
introduces minimizer entropy as a new Kriging-based criterion for the
sequential choice of points at which the function should be evaluated. Based on
\emph{stepwise uncertainty reduction}, it accounts for the informational gain
on the minimizer expected from a new evaluation. The criterion is approximated
using conditional simulations of the Gaussian process model behind Kriging, and
then inserted into an algorithm similar in spirit to the \emph{Efficient Global
Optimization} (EGO) algorithm. An empirical comparison is carried out between
our criterion and \emph{expected improvement}, one of the reference criteria in
the literature. Experimental results indicate major evaluation savings over
EGO. Finally, the method, which we call IAGO (for Informational Approach to
Global Optimization), is extended to robust optimization problems, where both
the factors to be tuned and the function evaluations are corrupted by noise.
Comment: Accepted for publication in the Journal of Global Optimization. (This
is the revised version, with additional details on computational problems, and
some grammatical changes.)
Robust Bayesian Regression with Synthetic Posterior
Although linear regression models are fundamental tools in statistical
science, the estimation results can be sensitive to outliers. While several
robust methods have been proposed in frequentist frameworks, statistical
inference is not necessarily straightforward. We here propose a Bayesian
approach to robust inference on linear regression models using synthetic
posterior distributions based on the $\gamma$-divergence, which enables us to
naturally assess the uncertainty of the estimation through the posterior
distribution. We also consider the use of shrinkage priors for the regression
coefficients to carry out robust Bayesian variable selection and estimation
simultaneously. We develop an efficient posterior computation algorithm by
adopting the Bayesian bootstrap within Gibbs sampling. The performance of the
proposed method is illustrated through simulation studies and applications to
famous datasets.
Comment: 23 pages, 5 figures.
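The robustness mechanism of divergence-based losses is easy to see in a frequentist sketch: the estimating equation of a $\gamma$-type loss reduces to weighted least squares with weights that decay exponentially in the squared residual, so gross outliers are effectively ignored. The IRLS loop below illustrates only that point estimate, not the paper's synthetic posterior or its Bayesian-bootstrap-within-Gibbs sampler; the values of $\gamma$ and $\sigma$ are fixed, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)
y[:20] += 10.0                                # 10% gross outliers

X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS, badly biased here
beta = beta_ols.copy()
gamma, sigma = 0.5, 0.5
for _ in range(50):                           # IRLS for the gamma-type loss
    r = y - X @ beta
    w = np.exp(-gamma * r ** 2 / (2 * sigma ** 2))  # exponential down-weighting
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
```

The synthetic-posterior approach wraps this robust loss in a (pseudo-)posterior, so the same down-weighting yields uncertainty quantification rather than only a point estimate.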
Entropy balancing is doubly robust
Covariate balance is a conventional key diagnostic for methods that estimate
causal effects from observational studies. Recently, there has been emerging
interest in directly incorporating covariate balance into the
estimation. We study a recently proposed entropy maximization method called
Entropy Balancing (EB), which exactly matches the covariate moments for the
different experimental groups in its optimization problem. We show EB is doubly
robust with respect to linear outcome regression and logistic propensity score
regression, and it reaches the asymptotic semiparametric variance bound when
both regressions are correctly specified. This is surprising to us because
there is no attempt to model the outcome or the treatment assignment in the
original proposal of EB. Our theoretical results and simulations suggest that
EB is a very appealing alternative to the conventional weighting estimators
that estimate the propensity score by maximum likelihood.
Comment: 23 pages, 6 figures, Journal of Causal Inference 201
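Entropy balancing has a convenient convex dual: the balancing weights are exponential tilts $w_i \propto \exp(\lambda^\top x_i)$ of the control sample, and $\lambda$ solves an unconstrained log-sum-exp minimization. A minimal Newton-iteration sketch for the ATT (with a simulated linear/logistic design; all values are illustrative) shows exact moment matching and, in this example where both implicit models hold, recovery of the treatment effect:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4000
X = rng.normal(size=(n, 2))
ps = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
A = rng.binomial(1, ps)
Y = X[:, 0] + X[:, 1] + 1.0 * A + rng.normal(size=n)  # constant effect: ATT = 1

Xt, Xc, Yc = X[A == 1], X[A == 0], Y[A == 0]
target = Xt.mean(axis=0)
Z = Xc - target                               # controls, centered at treated means

# Dual problem: minimize log sum_i exp(Z_i @ lam); the optimal weights
# w_i = exp(Z_i @ lam) / sum_j exp(Z_j @ lam) match treated moments exactly
lam = np.zeros(2)
for _ in range(50):
    u = np.exp(Z @ lam)
    w = u / u.sum()
    grad = w @ Z                              # remaining covariate imbalance
    H = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
    lam -= np.linalg.solve(H, grad)

att = Y[A == 1].mean() - w @ Yc               # entropy-balancing ATT estimate
```

The exponential-tilt form is what links EB to logistic propensity scores, and the exact moment matching is what links it to linear outcome regression, which together yield the double robustness shown in the paper.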