
    Nonparametric inverse probability weighted estimators based on the highly adaptive lasso

    Inverse probability weighted estimators are the oldest and perhaps the most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse probability weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at $n^{-1/3}$-rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse probability weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large-scale epidemiologic study.
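
    As a concrete illustration of the reweighting idea (a minimal sketch only: a generic cross-validated logistic regression stands in for the paper's undersmoothed highly adaptive lasso, which is what the efficiency theory actually requires):

        import numpy as np
        from sklearn.linear_model import LogisticRegressionCV

        def ipw_ate(X, A, Y, clip=1e-3):
            """Horvitz-Thompson-style IPW estimate of E[Y(1)] - E[Y(0)]."""
            # Estimate the weighting mechanism (propensity score) P(A = 1 | X);
            # a flexible stand-in here, not the highly adaptive lasso itself.
            g = LogisticRegressionCV(cv=5).fit(X, A).predict_proba(X)[:, 1]
            g = np.clip(g, clip, 1 - clip)  # guard against extreme weights
            # Reweight outcomes to form the pseudo-population and contrast arms.
            return np.mean(A * Y / g - (1 - A) * Y / (1 - g))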

    Robust causal inference with continuous instruments using the local instrumental variable curve

    Instrumental variables are commonly used to estimate effects of a treatment afflicted by unmeasured confounding, and in practice instruments are often continuous (e.g., measures of distance, or treatment preference). However, available methods for continuous instruments have important limitations: they either require restrictive parametric assumptions for identification, or else rely on modeling both the outcome and treatment process well (and require modeling effect modification by all adjustment covariates). In this work we develop the first semiparametric doubly robust estimators of the local instrumental variable effect curve, i.e., the effect among those who would take treatment for instrument values above some threshold and not below. In addition to being robust to misspecification of either the instrument or treatment/outcome processes, our approach also incorporates information about the instrument mechanism and allows for flexible data-adaptive estimation of effect modification. We discuss asymptotic properties under weak conditions, and use the methods to study infant mortality effects of neonatal intensive care units with high versus low technical capacity, using travel time as an instrument.
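
    To fix intuition for the estimand, here is a naive plug-in sketch (not the paper's doubly robust estimator: a simple Wald-type contrast with no covariate adjustment, dichotomizing the continuous instrument at each threshold):

        import numpy as np

        def local_iv_curve(Z, A, Y, thresholds):
            """Wald-type estimate of the local IV effect at each threshold z:
            outcome contrast over treatment-uptake contrast when the
            continuous instrument Z is dichotomized at z."""
            effects = []
            for z in thresholds:
                hi = Z >= z
                num = Y[hi].mean() - Y[~hi].mean()   # intention-to-treat contrast
                den = A[hi].mean() - A[~hi].mean()   # compliance contrast
                effects.append(num / den)
            return np.array(effects)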

    Robust Learning of Fixed-Structure Bayesian Networks

    We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.
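
    For context, the non-robust baseline in this setting is plain frequency estimation of the conditional probability tables; the sketch below shows that baseline (hypothetical helper names; the paper's contribution is to replace these raw counts with a polynomial-time filtering step that downweights corrupted samples):

        import numpy as np
        from collections import defaultdict

        def fit_cpts(samples, parents):
            """Empirical conditional probability tables for a fixed-structure
            binary Bayesian network: P(X_i = 1 | parent configuration), with
            Laplace smoothing. Not robust: an epsilon-fraction of adversarial
            rows can bias these counts."""
            cpts = {}
            for i in range(samples.shape[1]):
                counts = defaultdict(lambda: [1.0, 1.0])  # [zeros, ones]
                for row in samples:
                    counts[tuple(row[j] for j in parents[i])][row[i]] += 1
                cpts[i] = {k: c[1] / (c[0] + c[1]) for k, c in counts.items()}
            return cpts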

    Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis

    While estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the direct and indirect pathways of the exposure's effect, operating through or around a mediator variable that occurs subsequent to the exposure and prior to the outcome. Although powerful semiparametric methodologies have been developed to analyze observational studies that produce double robust and highly efficient estimates of the marginal total causal effect, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing because it gives new insights on issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust, locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.

    Comment: Published at http://dx.doi.org/10.1214/12-AOS990 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
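
    For orientation, the estimands in question are the natural direct and indirect effects; the decomposition and identification formula below are the standard textbook forms (this paper's exact notation and assumptions may differ):

        \mathrm{NDE} = E\{Y(1, M(0))\} - E\{Y(0, M(0))\}, \qquad
        \mathrm{NIE} = E\{Y(1, M(1))\} - E\{Y(1, M(0))\},

    where, under sequential ignorability given pre-exposure covariates $X$, the mediation functional is identified as

        E\{Y(a, M(a'))\} = \iint E(Y \mid A = a, M = m, X = x)\, dP(m \mid A = a', X = x)\, dP(x).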

    One-step Targeted Maximum Likelihood for Time-to-event Outcomes

    Current Targeted Maximum Likelihood Estimation (TMLE) methods used to analyze time-to-event data estimate the survival probability at each time point separately, which results in estimates that are not necessarily monotone. In this paper, we present an extension of TMLE for observational time-to-event data: the one-step Targeted Maximum Likelihood Estimator for the treatment-rule-specific survival curve. We construct a one-dimensional universal least favorable submodel that targets the entire survival curve, and thereby requires minimal extra fitting of the data to achieve its goal of solving the efficient influence curve equation. Through a simulation study, we show that this method improves on previously proposed methods in both robustness and efficiency, while respecting the monotone decreasing nature of the survival curve.
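
    A small illustration of why hazard-based construction, as opposed to targeting each time point's survival probability separately, cannot violate monotonicity (a sketch of the constraint only, unrelated to the TMLE update step itself):

        import numpy as np

        def survival_from_hazards(hazards):
            """Discrete-time survival curve S(t) = prod_{k <= t} (1 - h_k).
            Since each factor lies in [0, 1], the curve is monotone
            nonincreasing by construction, whereas separate per-time-point
            estimates of S(t) carry no such guarantee."""
            h = np.clip(np.asarray(hazards, dtype=float), 0.0, 1.0)
            return np.cumprod(1.0 - h)

        # Even noisy hazard estimates yield a monotone curve:
        print(survival_from_hazards([0.05, 0.12, 0.03, 0.20]))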

    Efficient Estimation of Quantiles in Missing Data Models

    We propose a novel targeted maximum likelihood estimator (TMLE) for quantiles in semiparametric missing data models. Our proposed estimator is locally efficient, $\sqrt{n}$-consistent, asymptotically normal, and doubly robust, under regularity conditions. We use Monte Carlo simulation to compare our proposed method to existing estimators. The TMLE is superior to all competitors, with variance up to three times smaller than that of the inverse probability weighted estimator (IPW), and up to two times smaller than that of the augmented IPW. This research is motivated by a causal inference research question with highly variable treatment assignment probabilities and a heavy-tailed, highly variable outcome. Estimation of causal effects on the mean is a hard problem in such scenarios because the information bound is generally small. In our application, the efficiency bound for estimating the effect on the mean is possibly infinite. This rules out $\sqrt{n}$-consistent inference and reduces the power for testing the hypothesis of no treatment effect on the mean. In our simulations, using the effect on the median allows us to test a location-shift hypothesis with 30% more power. This allows us to make claims about the effectiveness of treatment that would have been hard to make for the effect on the mean. We provide R code to implement the proposed estimators.
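
    For reference, the IPW competitor for a quantile takes only a few lines (a hypothetical sketch assuming known or pre-estimated observation probabilities; the paper's TMLE adds the targeting step that yields local efficiency and double robustness):

        import numpy as np

        def ipw_quantile(Y, R, prob_observed, q=0.5):
            """q-th quantile of the inverse-probability-weighted outcome
            distribution: observed outcomes (R == 1) are upweighted by
            1 / P(observed | covariates)."""
            Yo, w = Y[R == 1], 1.0 / prob_observed[R == 1]
            order = np.argsort(Yo)
            cum = np.cumsum(w[order]) / w.sum()
            return Yo[order][np.searchsorted(cum, q)]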

    Statistical Inference for Data-adaptive Doubly Robust Estimators with Survival Outcomes

    The consistency of doubly robust estimators relies on consistent estimation of at least one of two nuisance regression parameters. In moderate to large dimensions, the use of flexible data-adaptive regression estimators may aid in achieving this consistency. However, $n^{1/2}$-consistency of doubly robust estimators is not guaranteed if one of the nuisance estimators is inconsistent. In this paper we present a doubly robust estimator for survival analysis with the novel property that it converges to a Gaussian variable at $n^{1/2}$-rate for a large class of data-adaptive estimators of the nuisance parameters, under the sole assumption that at least one of them is consistently estimated at an $n^{1/4}$-rate. This result is achieved through adaptation of recent ideas in semiparametric inference, which amount to: (i) Gaussianizing (i.e., making asymptotically linear) a drift term that arises in the asymptotic analysis of the doubly robust estimator, and (ii) using cross-fitting to avoid entropy conditions on the nuisance estimators. We present the formula for the asymptotic variance of the estimator, which allows computation of doubly robust confidence intervals and p-values. We illustrate the finite-sample properties of the estimator in simulation studies, and demonstrate its use in a phase III clinical trial for estimating the effect of a novel therapy for the treatment of HER2-positive breast cancer.
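
    The cross-fitting device generalizes beyond survival outcomes; the sketch below shows it for the simpler point-treatment mean E[Y(1)] (an illustrative analogue with stand-in learners, not the paper's survival estimator):

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
        from sklearn.model_selection import KFold

        def crossfit_aipw(X, A, Y, n_folds=5, clip=1e-3):
            """Cross-fitted AIPW estimate of E[Y(1)]: nuisances are fit on
            training folds and evaluated on held-out folds, which is what
            removes entropy (Donsker) conditions on the data-adaptive
            nuisance estimators."""
            psi = np.zeros(len(Y))
            for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
                g = GradientBoostingClassifier().fit(X[train], A[train])
                Q = GradientBoostingRegressor().fit(
                    X[train][A[train] == 1], Y[train][A[train] == 1])
                ps = np.clip(g.predict_proba(X[test])[:, 1], clip, 1.0)
                Qhat = Q.predict(X[test])
                psi[test] = Qhat + A[test] * (Y[test] - Qhat) / ps
            return psi.mean()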

    An informational approach to the global optimization of expensive-to-evaluate functions

    In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each evaluation contributes to the localization of good candidates for the role of global minimizer, a sequential choice of evaluation points is usually carried out. In particular, when Kriging is used to interpolate past evaluations, the uncertainty associated with the lack of information on the function can be expressed and used to compute a number of criteria accounting for the interest of an additional evaluation at any given point. This paper introduces minimizer entropy as a new Kriging-based criterion for the sequential choice of points at which the function should be evaluated. Based on stepwise uncertainty reduction, it accounts for the informational gain on the minimizer expected from a new evaluation. The criterion is approximated using conditional simulations of the Gaussian process model behind Kriging, and then inserted into an algorithm similar in spirit to the Efficient Global Optimization (EGO) algorithm. An empirical comparison is carried out between our criterion and expected improvement, one of the reference criteria in the literature. Experimental results indicate major evaluation savings over EGO. Finally, the method, which we call IAGO (for Informational Approach to Global Optimization), is extended to robust optimization problems, where both the factors to be tuned and the function evaluations are corrupted by noise.

    Comment: Accepted for publication in the Journal of Global Optimization. (This is the revised version, with additional details on computational problems, and some grammatical changes.)
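
    For comparison, the expected-improvement criterion that IAGO is benchmarked against has a closed form under the GP model (a sketch using scikit-learn's GP as the Kriging interpolator):

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor

        def expected_improvement(gp, X_cand, f_min):
            """EI(x) = (f_min - mu)*Phi(z) + sigma*phi(z), z = (f_min - mu)/sigma:
            the expected amount by which an evaluation at x improves on the
            best value observed so far, under the Kriging (GP) model."""
            mu, sigma = gp.predict(X_cand, return_std=True)
            sigma = np.maximum(sigma, 1e-12)
            z = (f_min - mu) / sigma
            return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

        # Usage sketch: fit a GP to past evaluations, then pick the candidate
        # maximizing EI.
        # gp = GaussianProcessRegressor().fit(X_obs, y_obs)
        # x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y_obs.min()))]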

    Robust Bayesian Regression with Synthetic Posterior

    Although linear regression models are fundamental tools in statistical science, their estimation results can be sensitive to outliers. While several robust methods have been proposed in frequentist frameworks, statistical inference under them is not necessarily straightforward. We here propose a Bayesian approach to robust inference on linear regression models using synthetic posterior distributions based on the $\gamma$-divergence, which enables us to naturally assess the uncertainty of the estimation through the posterior distribution. We also consider the use of shrinkage priors for the regression coefficients to carry out robust Bayesian variable selection and estimation simultaneously. We develop an efficient posterior computation algorithm by adopting the Bayesian bootstrap within Gibbs sampling. The performance of the proposed method is illustrated through simulation studies and applications to well-known datasets.

    Comment: 23 pages, 5 figures
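
    The robustness mechanism can be illustrated with a point-estimate sketch (hypothetical code: fixed scale, divergence normalizing term omitted; the paper instead samples a full synthetic posterior via the Bayesian bootstrap within Gibbs):

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        def gamma_type_fit(X, y, gamma=0.5, sigma=1.0):
            """Each observation contributes pdf(residual)^gamma to the
            objective, so gross outliers (near-zero density values) are
            smoothly downweighted relative to least squares."""
            def neg_objective(beta):
                return -np.sum(norm.pdf(y - X @ beta, scale=sigma) ** gamma)
            beta0 = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS warm start
            return minimize(neg_objective, beta0, method="Nelder-Mead").x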

    Entropy balancing is doubly robust

    Covariate balance is a conventional key diagnostic for methods used to estimate causal effects from observational studies. Recently, there has been emerging interest in directly incorporating covariate balance into the estimation. We study a recently proposed entropy maximization method called Entropy Balancing (EB), which exactly matches the covariate moments of the different experimental groups in its optimization problem. We show that EB is doubly robust with respect to linear outcome regression and logistic propensity score regression, and that it reaches the asymptotic semiparametric variance bound when both regressions are correctly specified. This is surprising because there is no attempt to model the outcome or the treatment assignment in the original proposal of EB. Our theoretical results and simulations suggest that EB is a very appealing alternative to the conventional weighting estimators that estimate the propensity score by maximum likelihood.

    Comment: 23 pages, 6 figures, Journal of Causal Inference 201
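
    The EB optimization is small enough to sketch via its dual: weights on the control units take an exponential-tilting form, with the tilt chosen so weighted control moments exactly match the treated moments (a minimal sketch under these conventions; normalization handled implicitly):

        import numpy as np
        from scipy.optimize import minimize

        def entropy_balancing_weights(X_control, target_means):
            """Solve min sum w_i log w_i s.t. sum w_i x_i = target_means and
            sum w_i = 1, via the dual: w_i proportional to exp(lambda' x_i),
            with lambda minimizing the convex log-sum-exp objective below."""
            def dual(lam):
                return np.log(np.sum(np.exp(X_control @ lam))) - target_means @ lam
            lam = minimize(dual, np.zeros(X_control.shape[1]), method="BFGS").x
            w = np.exp(X_control @ lam)
            return w / w.sum()

        # Usage: w = entropy_balancing_weights(X[A == 0], X[A == 1].mean(axis=0));
        # then w @ X[A == 0] reproduces the treated covariate means exactly.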