
    Identification and Efficient Estimation of the Natural Direct Effect Among the Untreated

    The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable were set to the level it would have taken in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the nonparametric model, particularly when the intermediate variable is continuous or high dimensional. In this paper we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss-based estimator (TMLE), a locally efficient, double robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high-dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated, and the indirect effect among the treated.
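As a rough illustration of the kind of statistical parameter involved, the sketch below assumes (our assumption, not a restatement of the paper's conditions) that, with no unmeasured confounding of the exposure A and the intermediate variable M given baseline covariates W, the NDE among the untreated is identified by E[E[Y | A=1, M, W] | A=0] - E[Y | A=0]. It computes a simple plug-in estimate using a linear working model for the outcome regression. It is not the TMLE from the abstract, and the function name `nde_untreated_plugin` and the simulated data are hypothetical.

```python
import numpy as np

def nde_untreated_plugin(W, M, A, Y):
    """Plug-in estimate of E[ E[Y | A=1, M, W] | A=0 ] - E[Y | A=0].

    Uses a linear working model for E[Y | A=1, M, W]; this is a simple
    substitution estimator, NOT the TMLE described in the abstract.
    """
    X = np.column_stack([np.ones(len(Y)), M, W])
    treated, untreated = A == 1, A == 0
    # Fit the outcome regression among treated units only
    beta, *_ = np.linalg.lstsq(X[treated], Y[treated], rcond=None)
    # Predict outcomes under treatment for untreated units at their observed M and W,
    # then subtract the observed mean outcome among the untreated
    return (X[untreated] @ beta).mean() - Y[untreated].mean()

# Hypothetical simulation; the true direct effect among the untreated is 2
rng = np.random.default_rng(1)
n = 50_000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * W)))
M = 0.5 * A + W + rng.normal(size=n)
Y = 1 + 2 * A + 1.5 * M + W + rng.normal(size=n)
estimate = nde_untreated_plugin(W, M, A, Y)
```

Because the linear working model is correctly specified in this simulation, the estimate should land near 2; with a misspecified outcome regression this plug-in would be biased, which is one motivation for a double robust estimator.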

    Online Targeted Learning

    We consider the case in which data arrive sequentially and can be viewed as a sample of independent and identically distributed observations from a fixed data-generating distribution. The goal is to estimate a particular pathwise differentiable target parameter of this data-generating distribution, which is known to belong to a particular semiparametric statistical model. We want our estimator to be asymptotically efficient, but we also want it to be computable by updating the current estimator with each new block of data, without revisiting past data, so that it is much faster than recomputing a fixed estimator each time new data arrive. We refer to such an estimator as an online estimator. Online estimators can also be applied to a large fixed database by dividing the data set into many subsets and imposing an ordering on these subsets. The current literature provides such online estimators for parametric models, based on variations of the stochastic gradient descent algorithm. In this paper we propose a new online one-step estimator, which is proven to be asymptotically efficient under regularity conditions. This estimator takes as input online estimators of the relevant part of the data-generating distribution and of the nuisance parameter required for efficient estimation of the target parameter. These could be online stochastic gradient descent estimators based on large parametric models, as developed in the current literature, but we also propose other online data-adaptive estimators that do not rely on the specification of a particular parametric model. We also present a targeted version of this online one-step estimator that presumably minimizes the one-step correction and may therefore be more robust in finite samples.
These online one-step estimators are not substitution estimators and might therefore be unstable in finite samples if the target parameter is borderline identifiable. Therefore we also develop an online targeted minimum loss-based estimator (online TMLE), which updates the current initial estimator of the relevant part of the data-generating distribution with each new block of data and estimates the target parameter with the corresponding plug-in estimator. This online substitution estimator is also proven to be asymptotically efficient under the same regularity conditions required for asymptotic normality of the online one-step estimator. The online one-step estimator, the targeted online one-step estimator, and the online TMLE are demonstrated for estimation of a causal effect of a binary treatment on an outcome based on a dynamic database that is regularly updated, a common scenario in the analysis of electronic medical record databases. Finally, we extend these online estimators to a group sequential adaptive design in which certain components of the data-generating experiment are continuously fine-tuned based on past data, and the updated data-generating distribution is then used to generate the next block of data.
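A minimal sketch of the online one-step idea for a simpler target, the treatment-specific mean E[Y(1)] = E[E[Y | A=1, W]] with a binary covariate W. Each new block is scored with the influence-curve-based (AIPW) term evaluated at nuisance estimates built only from earlier blocks, and the nuisance estimates are then updated from running counts, so past data are never revisited. The class name, the count-based nuisance estimators, and the use of the first block purely for initialization are our assumptions for illustration, not the paper's construction.

```python
import numpy as np

class OnlineOneStepTSM:
    """Hypothetical online one-step estimator of E[Y(1)] with a binary covariate W."""

    def __init__(self):
        self.n_w = np.zeros(2)        # number of observations with W = w
        self.n_aw = np.zeros(2)       # number with A = 1 and W = w
        self.sum_y_aw = np.zeros(2)   # sum of Y over A = 1, W = w
        self.sum_terms = 0.0          # running sum of one-step terms
        self.n_terms = 0              # number of observations that contributed terms

    def update(self, w, a, y):
        w = np.asarray(w, dtype=int)
        a = np.asarray(a, dtype=float)
        y = np.asarray(y, dtype=float)
        # Score the new block with nuisance estimates from PAST blocks only
        if np.all(self.n_aw > 0):
            g = self.n_aw / self.n_w          # estimate of P(A = 1 | W = w)
            q1 = self.sum_y_aw / self.n_aw    # estimate of E[Y | A = 1, W = w]
            terms = q1[w] + a / g[w] * (y - q1[w])
            self.sum_terms += terms.sum()
            self.n_terms += len(terms)
        # Then fold the new block into the running counts; old data are never revisited
        np.add.at(self.n_w, w, 1)
        np.add.at(self.n_aw, w, a)
        np.add.at(self.sum_y_aw, w, a * y)

    def estimate(self):
        return self.sum_terms / self.n_terms

# Hypothetical data stream: g(W) = 0.3 + 0.4 W and E[Y(1)] = 1 + 2 * 0.5 + 1 = 3
rng = np.random.default_rng(0)
online = OnlineOneStepTSM()
for _ in range(200):
    w = rng.integers(0, 2, 1000)
    a = rng.binomial(1, 0.3 + 0.4 * w)
    y = 1 + 2 * w + a + rng.normal(size=1000)
    online.update(w, a, y)
psi_online = online.estimate()
```

Only the first block is spent on initialization here; the paper's regularity conditions, its targeted one-step variant, and the online TMLE are not reproduced.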

    Balancing Score Adjusted Targeted Minimum Loss-based Estimation

    Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects, including the average treatment effect and the effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent even when the estimated propensity score is not consistent for the true propensity score, provided it converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with the balancing score property that is additionally locally efficient and doubly robust. In simulation studies, we investigate the new estimator's performance relative to other estimators, including another TMLE, a propensity score matching estimator, an inverse probability of treatment weighted estimator, and a regression-based estimator.
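The balancing score property can be illustrated with a small, hypothetical simulation. Below, the true propensity score depends on W only through the indicator 1{W >= 2}; the "estimated" score assigns the wrong probabilities (0.1 and 0.9) but is a one-to-one function of that indicator, so it is still a balancing score. Stratifying on it recovers E[Y(1)], whereas the unadjusted treated mean does not. This is a plain stratification estimator, not the balancing-score-adjusted TMLE, and `stratified_tsm` is a hypothetical helper.

```python
import numpy as np

def stratified_tsm(score, a, y):
    """Estimate E[Y(1)] by nonparametric adjustment (stratification) for a discrete score."""
    score, a, y = np.asarray(score), np.asarray(a), np.asarray(y, dtype=float)
    estimate = 0.0
    for level in np.unique(score):
        in_stratum = score == level
        treated = in_stratum & (a == 1)
        # Mean outcome among treated units in this stratum, weighted by the stratum's share
        estimate += in_stratum.mean() * y[treated].mean()
    return estimate

# Hypothetical simulation: the true propensity score depends on W only through 1{W >= 2}
rng = np.random.default_rng(2)
n = 200_000
w = rng.integers(0, 4, n)
g_true = np.where(w >= 2, 0.7, 0.2)
a = rng.binomial(1, g_true)
y = w + a + rng.normal(size=n)  # so E[Y(1)] = E[W] + 1 = 2.5

# A deliberately wrong "estimated propensity score": inconsistent for g_true,
# but a one-to-one function of 1{W >= 2}, hence still a balancing score
score_hat = np.where(w >= 2, 0.9, 0.1)

naive = y[a == 1].mean()                     # ignores confounding; biased upward
adjusted = stratified_tsm(score_hat, a, y)   # close to 2.5
```

Within each stratum of the balancing score, treatment is independent of W in this simulation, which is why the inconsistent score still removes the confounding bias.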

    Online Cross-Validation-Based Ensemble Learning

    Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that scale to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach, including online estimation of the optimal ensemble of candidate online estimators. We illustrate the practical performance of our methods using simulations and a real data example in which we make streaming predictions of infectious disease incidence using data from a large database.
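The core of online cross-validation can be sketched as follows, under simplifying assumptions: each candidate is evaluated on an incoming batch before it trains on that batch, its loss is accumulated, and the candidate with the lowest cumulative loss is selected. The two toy candidates (a running mean and a linear model updated by one stochastic gradient step per batch), the squared-error loss, and all names are hypothetical; the sketch covers only selection of a single candidate, not the optimal ensemble.

```python
import numpy as np

class RunningMean:
    """Candidate 1: predicts the running mean of y, ignoring x."""
    def __init__(self):
        self.total, self.count = 0.0, 0
    def predict(self, x):
        mean = self.total / self.count if self.count else 0.0
        return np.full(len(x), mean)
    def update(self, x, y):
        self.total += y.sum()
        self.count += len(y)

class OnlineLinearSGD:
    """Candidate 2: y ~ b0 + b1 * x, one stochastic gradient step per batch."""
    def __init__(self, lr=0.1):
        self.coef = np.zeros(2)
        self.lr = lr
    def predict(self, x):
        return self.coef[0] + self.coef[1] * x
    def update(self, x, y):
        resid = self.predict(x) - y
        # Gradient of the mean of 0.5 * (prediction - y)^2 with respect to (b0, b1)
        grad = np.array([resid.mean(), (resid * x).mean()])
        self.coef -= self.lr * grad

def online_cv_select(batches, candidates):
    """Score every candidate on each batch BEFORE it trains on that batch, then update."""
    cum_loss = np.zeros(len(candidates))
    for x, y in batches:
        for j, cand in enumerate(candidates):
            cum_loss[j] += np.mean((cand.predict(x) - y) ** 2)
        for cand in candidates:
            cand.update(x, y)
    return int(np.argmin(cum_loss)), cum_loss

# Hypothetical stream in which y depends linearly on x
rng = np.random.default_rng(3)
batches = []
for _ in range(100):
    x = rng.normal(size=200)
    batches.append((x, 2 * x + rng.normal(scale=0.5, size=200)))
best, losses = online_cv_select(batches, [RunningMean(), OnlineLinearSGD()])
```

Because each batch is scored before the candidates see it, the cumulative loss behaves like an out-of-sample risk estimate without any held-out set or revisiting of old batches.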

    Research and Social Work on Queer People with Experiences of Racism


    Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods

    The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a library of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different databases. While the SL has been shown to perform well in a number of settings, it has not been thoroughly evaluated in the large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied the SL and evaluated its ability to predict treatment assignment in three electronic healthcare databases. We considered a library of algorithms that included both nonparametric and parametric models. We also considered a novel strategy for prediction modeling that combines the SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Performance was assessed using three metrics: the negative log-likelihood, the area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
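The selection step at the heart of Super Learner can be sketched with NumPy alone. This is the so-called discrete Super Learner (a cross-validation selector that picks one candidate by cross-validated negative log-likelihood), not the full weighted ensemble, and it omits the hdPS variable selection step. The two candidate learners (a marginal treatment rate and a main-terms logistic regression fit by Newton-Raphson), the helper names, and the simulated covariates are assumptions for illustration.

```python
import numpy as np

def fit_marginal(X, y):
    """Candidate 1: ignore covariates and predict the overall treatment rate."""
    p = y.mean()
    return lambda X_new: np.full(len(X_new), p)

def fit_logistic(X, y, n_iter=25):
    """Candidate 2: main-terms logistic regression fit by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, Xd.T @ (y - p))
    return lambda X_new: 1 / (1 + np.exp(-np.column_stack([np.ones(len(X_new)), X_new]) @ beta))

def neg_log_lik(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def cv_select(X, y, learners, k=5, seed=0):
    """Discrete Super Learner: choose the learner with the lowest k-fold CV risk."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k   # balanced random fold labels
    risks = np.zeros(len(learners))
    for fold in range(k):
        train, valid = folds != fold, folds == fold
        for j, fit in enumerate(learners):
            predict = fit(X[train], y[train])
            risks[j] += neg_log_lik(y[valid], predict(X[valid])) / k
    return int(np.argmin(risks)), risks

# Hypothetical treatment-assignment data driven by three covariates
rng = np.random.default_rng(4)
n = 5_000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + X @ np.array([1.0, -1.0, 0.5])))))
best, risks = cv_select(X, A, [fit_marginal, fit_logistic])
```

The full Super Learner would go one step further and combine the candidates' cross-validated predictions with a metalearner instead of picking a single winner.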

    Group Testing for Case Identification with Correlated Responses

    This article examines group testing procedures where units within a group (or pool) may be correlated. The expected number of tests per unit (i.e., efficiency) of hierarchical- and matrix-based procedures is derived for a class of models of exchangeable binary random variables. The effect of the arrangement of correlated units within pools on efficiency is then examined. In general, when correlated units are arranged in the same pool, the expected number of tests per unit decreases, sometimes substantially, relative to arrangements that ignore information about correlation.
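For two-stage hierarchical (Dorfman) testing with a perfectly accurate assay, a pool of size k costs one test plus k retests when the pool is positive, so the expected number of tests per unit is 1/k + 1 - P(all k units negative). The sketch below assumes, purely for illustration, a beta-binomial model for exchangeable correlated units within a pool; the paper's class of exchangeable models and its matrix-based procedures are not reproduced, and the function name is hypothetical.

```python
def dorfman_tests_per_unit(k, p, rho=0.0):
    """Expected tests per unit for two-stage (Dorfman) pooling with pool size k.

    Assumes a perfectly accurate assay. With rho > 0, the k units in a pool are
    modeled as exchangeable beta-binomial with marginal prevalence p and
    pairwise correlation rho (an illustrative choice of exchangeable model).
    """
    if rho == 0.0:
        p_all_negative = (1 - p) ** k
    else:
        # Beta-binomial parameters: alpha + beta = 1/rho - 1 and alpha / (alpha + beta) = p
        s = 1 / rho - 1
        alpha, beta = p * s, (1 - p) * s
        # P(all k negative) = B(alpha, beta + k) / B(alpha, beta) = prod_i (beta + i) / (s + i)
        p_all_negative = 1.0
        for i in range(k):
            p_all_negative *= (beta + i) / (s + i)
    # One pooled test per pool, plus k individual retests if the pool tests positive
    return 1 / k + (1 - p_all_negative)

independent = dorfman_tests_per_unit(5, 0.05)
correlated = dorfman_tests_per_unit(5, 0.05, rho=0.2)
```

With prevalence 0.05 and pools of five, the independent case gives about 0.426 tests per unit and an intra-pool correlation of 0.2 gives about 0.365, in line with the abstract's observation that placing correlated units in the same pool lowers the expected number of tests.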

    ltmle: An R Package Implementing Targeted Minimum Loss-Based Estimation for Longitudinal Data

    In recent years, targeted minimum loss-based estimation methodology has been used to develop estimators of parameters in longitudinal data structures (Gruber and van der Laan 2012; Petersen, Schwab, Gruber, Blaser, Schomaker, and van der Laan 2014; Schnitzer, Moodie, van der Laan, Platt, and Klein 2013). These methods are implemented in the ltmle package for R. The ltmle package provides methods to estimate intervention-specific means and measures of association, including the average treatment effect, the causal odds ratio, and the causal risk ratio, as well as parameters of a longitudinal working marginal structural model. The package allows for multiple time point treatments, time-varying covariates, and right censoring of the outcome. In this paper we describe the usage of the ltmle package and provide examples.
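ltmle itself is an R package, so its interface is not shown here. As a rough Python analogue of what a targeted minimum loss-based estimator does in the simplest single-time-point case, the sketch below estimates the intervention-specific mean E[Y(1)] for a binary outcome: an initial logistic outcome regression, a logistic propensity score model, and a one-parameter logistic fluctuation along the clever covariate A/g(W), followed by the plug-in mean. The function names, the main-terms working models, the 0.01 bound on g, and the simulated data are all assumptions; this is not the ltmle implementation and handles no time-varying treatments or censoring.

```python
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

def logistic_fit(X, y, offset=None, n_iter=25):
    """Newton-Raphson logistic regression; X must already contain any intercept column."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(offset + X @ beta)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def tmle_tsm(W, A, Y):
    """Hypothetical single-time-point TMLE of E[Y(1)] for binary Y."""
    n = len(Y)
    ones = np.ones(n)
    # Initial outcome regression Q(A, W) = P(Y = 1 | A, W), main-terms logistic model
    XQ = np.column_stack([ones, A, W])
    bQ = logistic_fit(XQ, Y)
    Q_AW = expit(XQ @ bQ)                                 # at the observed treatment
    Q_1W = expit(np.column_stack([ones, ones, W]) @ bQ)   # with A set to 1
    # Propensity score g(W) = P(A = 1 | W), bounded below by 0.01
    Xg = np.column_stack([ones, W])
    g = np.clip(expit(Xg @ logistic_fit(Xg, A)), 0.01, 1.0)
    # Targeting step: fit epsilon in logit Q* = logit Q + epsilon * A / g(W)
    eps = logistic_fit((A / g)[:, None], Y, offset=logit(Q_AW))[0]
    # Updated predictions under A = 1 use the clever covariate 1 / g(W)
    Q_1W_star = expit(logit(Q_1W) + eps / g)
    return Q_1W_star.mean()

# Hypothetical simulation with correctly specified working models
rng = np.random.default_rng(5)
n = 20_000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.3 + W @ np.array([0.5, -0.5])))
Y = rng.binomial(1, expit(-0.5 + A + W @ np.array([1.0, 0.5])))
psi_hat = tmle_tsm(W, A, Y)

# Monte Carlo approximation of the true E[Y(1)] = E[expit(0.5 + W b)]
W_big = rng.normal(size=(1_000_000, 2))
psi_true = expit(0.5 + W_big @ np.array([1.0, 0.5])).mean()
```

The targeting step is what distinguishes TMLE from a plain plug-in: the fluctuation makes the updated fit solve the efficient influence curve estimating equation, which underlies the double robustness mentioned throughout these abstracts.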

    The Effect of Cross-Border E-Commerce on China’s International Trade: An Empirical Study Based on Transaction Cost Analysis

    Reducing transaction costs by means of policy intervention could generate comparative advantages and contribute to the growth of international trade. Chinese government agencies have introduced a number of policies in support of rapidly growing cross-border e-commerce to promote China’s international trade. However, while the previous literature has focused on the impact of cross-border e-commerce on trade distance and consumer welfare, it has not empirically verified the precise effect of these policies on the growth of international trade. To address this gap, this paper investigates the impact of cross-border e-commerce on international trade in the context of China, mainly from the perspective of transaction cost economics in conjunction with the traditional comparative advantage model, analyzing information cost, negotiation cost, transportation cost, tariffs, and middlemen cost separately. First, the new theoretical model suggests that cross-border e-commerce may promote international trade only when the negative impact of tariff and transportation costs is offset. Second, our results show that cross-border e-commerce has a positive effect on the growth of China’s international trade in each year. However, the positive effect does not grow over time, possibly because of weak implementation of favorable trade policies as well as the contraction of global trade.