    Matching the Efficiency Gains of the Logistic Regression Estimator While Avoiding its Interpretability Problems, in Randomized Trials

    Adjusting for prognostic baseline variables can lead to improved power in randomized trials. For binary outcomes, a logistic regression estimator is commonly used for such adjustment. This has resulted in substantial efficiency gains in practice, e.g., gains equivalent to reducing the required sample size by 20-28% were observed in a recent survey of traumatic brain injury trials. Robinson and Jewell (1991) proved that the logistic regression estimator is guaranteed to have equal or better asymptotic efficiency compared to the unadjusted estimator (which ignores baseline variables). Unfortunately, the logistic regression estimator has the following dangerous vulnerabilities: it is only interpretable when the treatment effect is identical within every stratum of baseline covariates; also, it is inconsistent under model misspecification, which is virtually guaranteed when the baseline covariates are continuous or categorical with many levels. An open problem was whether there exists an equally powerful, covariate-adjusted estimator with no such vulnerabilities, i.e., one that (i) is interpretable and consistent without requiring any model assumptions, and (ii) matches the efficiency gains of the logistic regression estimator. Such an estimator would provide the best of both worlds: interpretability and consistency under no model assumptions (like the unadjusted estimator) and power gains from covariate adjustment (that match the logistic regression estimator). We prove a new asymptotic result showing that, surprisingly, there are simple estimators satisfying the above properties. We argue that these rarely used estimators have substantial advantages over the more commonly used logistic regression estimator for covariate adjustment in randomized trials with binary outcomes. Though our focus is binary outcomes and logistic regression models, our results extend to a large class of generalized linear models.

    Censoring Unbiased Regression Trees and Ensembles

    This paper proposes a novel approach to building regression trees and ensemble learning in survival analysis. By first extending the theory of censoring unbiased transformations, we construct observed data estimators of full data loss functions in cases where responses can be right censored. This theory is used to construct two specific classes of methods for building regression trees and regression ensembles that respectively make use of Buckley-James and doubly robust estimating equations for a given full data risk function. For the particular case of squared error loss, we further show how to implement these algorithms using existing software (e.g., CART, random forests) by making use of a related form of response imputation. Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.


    In randomized clinical trials with baseline variables that are prognostic for the primary outcome, there is potential to improve precision and reduce sample size by appropriately adjusting for these variables. A major challenge is that there are multiple statistical methods to adjust for baseline variables, but little guidance on which is best to use in a given context. The choice of method can have important consequences. For example, one commonly used method leads to uninterpretable estimates if there is any treatment effect heterogeneity, which would jeopardize the validity of trial conclusions. We give practical guidance on how to avoid this problem, while retaining the advantages of covariate adjustment. This can be achieved by using simple (but less well-known) standardization methods from the recent statistics literature. We discuss these methods and give software in R and Stata implementing them. A data example from a recent stroke trial is used to illustrate these methods.
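
    As a rough illustration of what a standardization estimator for a binary outcome can look like, the sketch below fits a logistic working model, predicts each participant's outcome probability under both arms, and averages the predictions to estimate a marginal risk difference. It is not the authors' R or Stata software. The function names (fit_logistic, standardized_risk_difference), the simulated data, and the choice of a single baseline covariate are all assumptions made for this example.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. This is a hypothetical
    helper written for the sketch, not a library routine.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def standardized_risk_difference(a, w, y):
    """Estimate the marginal risk difference by standardization.

    a: 0/1 treatment indicator, w: baseline covariate, y: 0/1 outcome.
    """
    n = len(y)
    ones = np.ones(n)
    # Working model: logit P(Y=1 | A, W) = b0 + b1*A + b2*W
    beta = fit_logistic(np.column_stack([ones, a, w]), y)
    # Predict for every participant as if assigned to each arm in turn.
    lin1 = np.column_stack([ones, ones, w]) @ beta
    lin0 = np.column_stack([ones, np.zeros(n), w]) @ beta
    risk1 = (1.0 / (1.0 + np.exp(-lin1))).mean()
    risk0 = (1.0 / (1.0 + np.exp(-lin0))).mean()
    return risk1 - risk0


# Simulated trial (assumed data-generating values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=n)                      # prognostic baseline covariate
a = rng.integers(0, 2, size=n)              # 1:1 randomization
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * a + 1.0 * w)))
y = rng.binomial(1, p_true)

unadjusted = y[a == 1].mean() - y[a == 0].mean()
adjusted = standardized_risk_difference(a, w, y)
print("unadjusted risk difference:", unadjusted)
print("standardized risk difference:", adjusted)
```

    Both numbers target the same marginal risk difference; the standardized version uses the baseline covariate to correct for chance imbalance between arms, whereas the treatment coefficient of the logistic model itself is a conditional log odds ratio.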

    Efficient estimation of subgroup treatment effects using multi-source data

    Investigators often use multi-source data (e.g., multi-center trials, meta-analyses of randomized trials, pooled analyses of observational cohorts) to learn about the effects of interventions in subgroups of some well-defined target population. Such a target population can correspond to one of the data sources of the multi-source data or an external population in which the treatment and outcome information may not be available. We develop and evaluate methods for using multi-source data to estimate subgroup potential outcome means and treatment effects in a target population. We consider identifiability conditions and propose doubly robust estimators that, under mild conditions, are non-parametrically efficient and allow for nuisance functions to be estimated using flexible data-adaptive methods (e.g., machine learning techniques). We also show how to construct confidence intervals and simultaneous confidence bands for the estimated subgroup treatment effects. We examine the properties of the proposed estimators in simulation studies and compare performance against alternative estimators. We also conclude that our methods work well when the sample size of the target population is much larger than the sample size of the multi-source data. We illustrate the proposed methods in a meta-analysis of randomized trials for schizophrenia.

    Assessing model performance for counterfactual predictions

    Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when the prediction question is explicitly counterfactual. However, estimating and evaluating counterfactual prediction models is challenging because one does not observe the full set of potential outcomes for all individuals. Here, we discuss how to tailor a model to a counterfactual estimand, how to assess the model's performance, and how to perform model and tuning parameter selection. We also provide identifiability results for measures of performance for a potentially misspecified counterfactual prediction model based on training and test data from the same (factual) source population. Last, we illustrate the methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.


    Adaptive enrichment designs involve preplanned rules for modifying patient enrollment criteria based on data accrued in an ongoing trial. These designs may be useful when it is suspected that a subpopulation, e.g., defined by a biomarker or risk score measured at baseline, may benefit more from treatment than the complementary subpopulation. We compare two types of such designs, for the case of two subpopulations that partition the overall population. The first type starts by enrolling the subpopulation where it is suspected the new treatment is most likely to work, and then may expand inclusion criteria if there is early evidence of a treatment benefit. The second type starts by enrolling from the overall population and then may selectively restrict enrollment if sufficient evidence accrues that the treatment is not benefiting a subpopulation. We construct two-stage designs of each type that guarantee strong control of the familywise Type I error rate, asymptotically. We then compare performance of the designs from each type under different scenarios; the scenarios mimic key features of a completed non-inferiority trial of HIV treatments. Performance criteria include power, sample size, Type I error, estimator bias, and confidence interval coverage probability.


    We consider the problem of designing a randomized trial for comparing two treatments versus a common control in two disjoint subpopulations. The subpopulations could be defined in terms of a biomarker or disease severity measured at baseline. The goal is to determine which treatments benefit which subpopulations. We develop a new class of adaptive enrichment designs tailored to solving this problem. Adaptive enrichment designs involve a preplanned rule for modifying enrollment based on accruing data in an ongoing trial. The proposed designs have preplanned rules for stopping accrual of treatment-by-subpopulation combinations, either for efficacy or futility. The motivation for this adaptive feature is that interim data may indicate that a subpopulation, such as those with lower disease severity at baseline, is unlikely to benefit from a particular treatment while uncertainty remains for the other treatment and/or subpopulation. We optimize these adaptive designs to have the minimum expected sample size under power and Type I error constraints. We compare the performance of the optimized adaptive design versus an optimized non-adaptive (single stage) design. Our approach is demonstrated in simulation studies that mimic features of a completed trial of a medical device for treating heart failure. The optimized adaptive design has 25% smaller expected sample size compared to the optimized non-adaptive design; however, the cost is that the optimized adaptive design has 8% greater maximum sample size. Open-source software that implements the trial design optimization is provided, allowing users to investigate the tradeoffs in using the proposed adaptive versus standard designs.