Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
We consider challenges that arise in the estimation of the mean outcome under
an optimal individualized treatment strategy defined as the treatment rule that
maximizes the population mean outcome, where the candidate treatment rules are
restricted to depend on baseline covariates. We prove a necessary and
sufficient condition for the pathwise differentiability of the optimal value, a
key condition needed to develop a regular and asymptotically linear (RAL)
estimator of the optimal value. The stated condition is slightly more general
than the previous condition implied in the literature. We then describe an
approach to obtain root-n rate confidence intervals for the optimal value
even when the parameter is not pathwise differentiable. We provide conditions
under which our estimator is RAL and asymptotically efficient when the mean
outcome is pathwise differentiable. We also outline an extension of our
approach to a multiple time point problem. All of our results are supported by
simulations. Comment: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Evaluating the Impact of Treating the Optimal Subgroup
Suppose we have a binary treatment used to influence an outcome. Given data
from an observational or controlled study, we wish to determine whether or not
there exists some subset of observed covariates in which the treatment is more
effective than the standard practice of no treatment. Furthermore, we wish to
quantify the improvement in population mean outcome that will be seen if this
subgroup receives treatment and the rest of the population remains untreated.
We show that this problem is surprisingly challenging given how often it is an
(at least implicit) study objective. Blindly applying standard techniques fails
to yield any apparent asymptotic results, while using existing techniques to
confront the non-regularity does not necessarily help at distributions where
there is no treatment effect. Here we describe an approach to estimate the
impact of treating the subgroup which benefits from treatment that is valid in
a nonparametric model and is able to deal with the case where there is no
treatment effect. The approach is a slight modification of an approach that
recently appeared in the individualized medicine literature
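To make the target quantity concrete, the following is a minimal sketch, assuming a single discrete covariate and a naive plug-in estimator; the function name `subgroup_impact` and the toy data are hypothetical and do not come from the paper. It estimates the covariate-specific treatment effect by a difference of stratum means, keeps the strata where that effect is positive, and averages the positive effects over the whole sample. This is exactly the kind of standard technique the abstract warns about, so it illustrates the parameter only and offers no valid inference when there is no treatment effect.

```python
from collections import defaultdict


def subgroup_impact(data):
    """Naive plug-in estimate of E[max(b(W), 0)] for a discrete covariate W.

    data: list of (w, a, y) tuples with binary treatment a in {0, 1}.
    b(w) is estimated by the difference in stratum-specific sample means.
    Returns the estimated subgroup and the estimated gain in mean outcome.
    """
    sums = defaultdict(lambda: [0.0, 0.0])   # w -> [sum of y for a=0, a=1]
    counts = defaultdict(lambda: [0, 0])     # w -> [count for a=0, a=1]
    for w, a, y in data:
        sums[w][a] += y
        counts[w][a] += 1

    n = len(data)
    subgroup = set()
    impact = 0.0
    for w in sums:
        n0, n1 = counts[w]
        if n0 == 0 or n1 == 0:
            continue  # effect not estimable in this stratum; skipped here
        blip = sums[w][1] / n1 - sums[w][0] / n0
        if blip > 0:
            subgroup.add(w)
            impact += blip * (n0 + n1) / n   # weight by stratum frequency
    return subgroup, impact


# Hypothetical toy data: treatment helps the "young" stratum only.
data = [
    ("young", 1, 3.0), ("young", 1, 5.0), ("young", 0, 2.0), ("young", 0, 2.0),
    ("old", 1, 1.0), ("old", 1, 1.0), ("old", 0, 2.0), ("old", 0, 4.0),
]
subgroup, impact = subgroup_impact(data)
print(subgroup, impact)
```

In the toy data the estimated effect is positive only in the "young" stratum, which makes up half the sample, so the estimated gain is half of that stratum's effect.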
Super-Learning of an Optimal Dynamic Treatment Rule
We consider the estimation of an optimal dynamic two time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to depend only on a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric, beyond possible knowledge about the treatment and censoring mechanisms. We propose data adaptive estimators of this optimal dynamic regime which are defined by sequential loss-based learning under both the blip function and weighted classification frameworks. Rather than a priori selecting an estimation framework and algorithm, we propose combining estimators from both frameworks using a super-learning based cross-validation selector that seeks to minimize an appropriate cross-validated risk. One of the proposed risks directly measures the performance of the mean outcome under the optimal rule. The resulting selector is guaranteed to asymptotically perform as well as the best convex combination of candidate algorithms in terms of loss-based dissimilarity under conditions. We offer simulation results to support our theoretical findings. This work expands upon that of an earlier technical report (van der Laan, 2013) with new results and simulations, and is accompanied by a work which develops inference for the mean outcome under the optimal rule (van der Laan and Luedtke, 2014)
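The idea of choosing among candidate rule estimators by a cross-validated measure of the mean outcome can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the procedure of the paper: it uses a single time point, a randomized treatment with known probability 0.5, an inverse-probability-weighted value estimate, and a discrete selector that picks one candidate rather than a convex combination. All names (`ipw_value`, `stratified`, `cv_select`) and the toy data are hypothetical.

```python
def ipw_value(rule, data, g=0.5):
    """Inverse-probability-weighted mean outcome under `rule`, with known
    treatment probability g (assumed equal for both arms)."""
    return sum((a == rule(w)) * y / g for w, a, y in data) / len(data)


def treat_all(train):
    return lambda w: 1


def treat_none(train):
    return lambda w: 0


def stratified(train):
    """Treat the covariate strata whose treated mean exceeds the untreated mean."""
    treat = set()
    for w in {obs[0] for obs in train}:
        y1 = [y for ww, a, y in train if ww == w and a == 1]
        y0 = [y for ww, a, y in train if ww == w and a == 0]
        if y1 and y0 and sum(y1) / len(y1) > sum(y0) / len(y0):
            treat.add(w)
    return lambda w: int(w in treat)


def cv_select(candidates, data, k=5):
    """Fit each candidate on the training folds, score its rule on the
    validation fold, and return the candidate with the largest average value."""
    scores = {}
    for name, fit in candidates.items():
        fold_values = []
        for fold in range(k):
            train = [obs for i, obs in enumerate(data) if i % k != fold]
            valid = [obs for i, obs in enumerate(data) if i % k == fold]
            fold_values.append(ipw_value(fit(train), valid))
        scores[name] = sum(fold_values) / k
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: treatment helps when w = 1 and harms when w = -1.
combos = [(-1, 0), (-1, 1), (1, 0), (1, 1)]
data = [(w, a, 1.0 + a * w) for i in range(40) for (w, a) in [combos[i % 4]]]

candidates = {"treat_all": treat_all, "treat_none": treat_none,
              "stratified": stratified}
best, scores = cv_select(candidates, data)
print(best, scores)
```

Scoring candidates by the estimated mean outcome under their rule, rather than by a prediction loss, mirrors the abstract's remark that one of the proposed risks directly measures the performance of the mean outcome under the rule.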
Targeted Learning of the Mean Outcome Under an Optimal Dynamic Treatment Rule
We consider estimation of and inference for the mean outcome under the optimal dynamic two time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to depend only on a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism. This contrasts with the current literature that relies on parametric assumptions. We establish that the mean of the counterfactual outcome under the optimal dynamic treatment is a pathwise differentiable parameter under conditions, and develop a targeted minimum loss-based estimator (TMLE) of this target parameter. We establish asymptotic linearity and statistical inference for this estimator under specified conditions. In a sequentially randomized trial the statistical inference relies upon a second-order difference between the estimator of the optimal dynamic treatment and the optimal dynamic treatment being asymptotically negligible, which may be a problematic condition when the rule is based on multivariate time-dependent covariates. To avoid this condition, we also develop targeted minimum loss-based estimators and statistical inference for data adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment. In particular, we develop a novel cross-validated TMLE approach that provides asymptotic inference under minimal conditions, avoiding the need for any empirical process conditions. We offer simulation results to support our theoretical findings. This work expands upon that of earlier technical reports (van der Laan, 2013; van der Laan and Luedtke, 2014) with new results and simulations, and is accompanied by a work which explores the estimation of the optimal rule (Luedtke and van der Laan, 2014)
Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy
We consider challenges that arise in the estimation of the value of an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular asymptotically linear (RAL) estimator of this parameter. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-n rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. In particular, we develop an estimator that, when properly standardized, converges to a normal limiting distribution. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We outline an extension of our approach to a multiple time point problem in the appendix. All of our results are supported by simulations
Targeted Learning of an Optimal Dynamic Treatment, and Statistical Inference for its Mean Outcome
Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to only respond to a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism, while still providing statistical inference for the mean outcome under the optimal rule. This contrasts with the current literature that relies on parametric assumptions. For the sake of presentation, we first consider the case that the treatment/censoring is only assigned at a single time-point, and subsequently, we cover the multiple time-point case. We characterize the optimal dynamic treatment as a statistical target parameter in the nonparametric statistical model, and we propose highly data adaptive estimators of this optimal dynamic regimen, utilizing sequential loss-based super-learning of sequentially defined (so called) blip-functions, based on newly proposed loss-functions. We also propose a cross-validation selector (among candidate estimators of the optimal dynamic regimens) based on a cross-validated targeted minimum loss-based estimator of the mean outcome under the candidate regimen, thereby aiming directly to select the candidate estimator that maximizes the mean outcome. We also establish that the mean of the counterfactual outcome under the optimal dynamic treatment is a pathwise differentiable parameter under assumptions, and develop a targeted minimum loss-based estimator (TMLE) of this target parameter. We establish asymptotic linearity and statistical inference based on this targeted minimum loss-based estimator under specified conditions. In a sequentially randomized trial the statistical inference essentially only relies upon a second-order difference between the estimator of the optimal dynamic treatment and the optimal dynamic treatment being asymptotically negligible, which may be a problematic condition when the rule is based on multivariate time-dependent covariates. To avoid this condition, we also develop targeted minimum loss-based estimators and statistical inference for data adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment. In particular, we develop a novel cross-validated TMLE approach that provides asymptotic inference under minimal conditions, avoiding the need for any empirical process conditions. For the sake of presentation, in the main part of the article we focus on two time-point interventions, but the results are generalized to general multiple time-point interventions in the appendix
An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions
We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely
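For orientation, the classical maximum mean discrepancy statistic that these tests generalize can itself be written as a U-statistic. The sketch below shows only that classical two-sample form with a Gaussian kernel, assuming scalar observations and a fixed bandwidth; it is not the test proposed in the paper, and the function names and toy samples are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel on scalars with a fixed, user-chosen bandwidth."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased U-statistic estimate of the squared maximum mean discrepancy
    between the distributions that generated xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal (i == j), which makes the
    # estimate unbiased.
    xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    xy = sum(kernel(x, y) for x in xs for y in ys)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)


# Hypothetical toy samples from clearly different locations.
xs = [0.0, 0.1, 0.2]
ys = [5.0, 5.1, 5.2]
stat = mmd2_unbiased(xs, ys)
print(stat)
```

Because the diagonal terms are dropped, the statistic can be slightly negative when the two samples come from the same distribution; it is close to zero in that case and clearly positive when the distributions differ.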
Computerizing Efficient Estimation of a Pathwise Differentiable Target Parameter
Frangakis et al. (2015) proposed a numerical method for computing the efficient influence function of a parameter in a nonparametric model at a specified distribution and observation (provided such an influence function exists). Their approach is based on the assumption that the efficient influence function is given by the directional derivative of the target parameter mapping in the direction of a perturbation of the data distribution defined as the convex line from the data distribution to a point mass at the observation. In our discussion paper (Luedtke et al., 2015), we propose a regularization of this procedure and establish the validity of this method in great generality. In this article we propose a generalization of the latter regularized numerical delta method for computing the efficient influence function for general statistical models, and formally establish its validity under appropriate regularity conditions. Our proposed method consists of applying the regularized numerical delta method for nonparametrically-defined target parameters proposed in Luedtke et al. (2015) to the nonparametrically-defined maximum likelihood mapping that maps a data distribution (normally the empirical distribution) into its Kullback-Leibler projection onto the model. This method formalizes the notion that an algorithm for computing a maximum likelihood estimator also yields an algorithm for computing the efficient influence function at a user-supplied data distribution. We generalize this method to a minimum loss-based mapping. We also show how the method extends to compute the higher-order efficient influence function at an observation pair for higher-order pathwise differentiable target parameters. Finally, we propose a new method for computing the efficient influence function as a whole curve by applying the maximum likelihood mapping to a perturbation of the data distribution with score equal to an initial gradient of the pathwise derivative. We demonstrate each method with a variety of examples
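The directional-derivative idea can be illustrated with a minimal sketch, assuming a distribution with finitely many support points so that no regularization is needed; the regularized procedure and the maximum likelihood mapping discussed in the abstract are not implemented here. The parameter (the variance) and all function names are hypothetical choices made for illustration. The sketch mixes the distribution with a point mass at the observation and takes a forward difference in the mixing weight.

```python
def variance(p):
    """Variance of a discrete distribution given as {support point: mass}."""
    mean = sum(x * m for x, m in p.items())
    return sum((x - mean) ** 2 * m for x, m in p.items())


def perturb(p, obs, eps):
    """Point on the convex line from p to a point mass at obs:
    (1 - eps) * p + eps * delta_obs."""
    q = {x: (1.0 - eps) * m for x, m in p.items()}
    q[obs] = q.get(obs, 0.0) + eps
    return q


def numerical_influence(psi, p, obs, eps=1e-6):
    """Forward-difference approximation of the derivative of psi along the
    path from p toward the point mass at obs, evaluated at eps = 0."""
    return (psi(perturb(p, obs, eps)) - psi(p)) / eps


# Hypothetical example: p puts mass 1/2 on 0 and on 2 (mean 1, variance 1).
p = {0.0: 0.5, 2.0: 0.5}
obs = 3.0
approx = numerical_influence(variance, p, obs)
exact = (obs - 1.0) ** 2 - 1.0  # known influence function of the variance
print(approx, exact)
```

For the variance, the known influence function at an observation is the squared deviation from the mean minus the variance, which gives a closed form to compare the numerical derivative against.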