
    Quantile regression modeling of latent trajectory features with longitudinal data

    Quantile regression has demonstrated promising utility in longitudinal data analysis. Existing work focuses primarily on modeling cross-sectional outcomes, yet outcome trajectories often carry more substantive information in practice. In this work, we develop a trajectory quantile regression framework designed to robustly and flexibly investigate how latent individual trajectory features relate to observed subject characteristics. The proposed models are built under multilevel modeling with the usual parametric assumptions lifted or relaxed. We derive our estimation procedure by transforming the problem into quantile regression with perturbed responses and adapting a bias-correction technique originally developed for covariate measurement error. We establish desirable asymptotic properties of the proposed estimator, including uniform consistency and weak convergence. Extensive simulation studies confirm the validity of the proposed method as well as its robustness. An application to the DURABLE trial uncovers sensible scientific findings and illustrates the practical value of our proposals.
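    As background for the estimation procedure the abstract describes, here is a minimal sketch of fitting a linear quantile regression by minimizing the standard check loss. This is the textbook estimator, not the authors' perturbed-response, bias-corrected procedure; the data and the Nelder-Mead optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return float(np.sum(u * (tau - (u < 0))))

def quantile_reg(X, y, tau):
    """Fit a linear tau-th quantile regression (intercept + slopes)
    by direct minimization of the check loss."""
    X1 = np.column_stack([np.ones(len(y)), X])
    obj = lambda b: check_loss(y - X1 @ b, tau)
    b0 = np.zeros(X1.shape[1])
    # Nelder-Mead handles the nondifferentiable loss on small problems
    return minimize(obj, b0, method="Nelder-Mead").x
```

For a noiseless line, the minimized loss should be near zero at any quantile level.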

    Controlling Cumulative Adverse Risk in Learning Optimal Dynamic Treatment Regimens

    A dynamic treatment regimen (DTR) is one of the most important tools for tailoring treatment in personalized medicine. For many diseases, such as cancer and type 2 diabetes mellitus (T2D), more aggressive treatments can achieve higher efficacy but may also increase risk. However, few methods for estimating DTRs account for both cumulative benefit and cumulative risk. In this work, we propose a general statistical learning framework to learn optimal DTRs that maximize the reward outcome while controlling the cumulative adverse risk below a pre-specified threshold. We convert this constrained optimization problem into an unconstrained one using a Lagrange function, which we then solve either with backward learning algorithms or simultaneously over all stages by constructing a novel multistage ramp loss. Theoretically, we establish Fisher consistency of the proposed method and obtain non-asymptotic convergence rates for both reward and risk outcomes under the estimated DTRs. The finite-sample performance of the proposed method is demonstrated via simulation studies and an application to a two-stage clinical trial for T2D patients.
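    The constrained-to-unconstrained conversion can be illustrated in a toy one-stage setting: maximize reward subject to risk staying below a threshold by penalizing risk with a Lagrange multiplier. The rule set, reward/risk values, and multiplier grid below are hypothetical; the paper's actual method operates over multiple stages with a ramp loss.

```python
def lagrangian_best_rule(rules, reward_of, risk_of, tau, lambdas):
    """Pick the rule maximizing the Lagrangian reward - lam * risk,
    raising the penalty lam until the risk constraint risk <= tau holds."""
    for lam in lambdas:
        best = max(rules, key=lambda d: reward_of(d) - lam * risk_of(d))
        if risk_of(best) <= tau:
            return best, lam
    return None, None

# hypothetical one-stage example: the aggressive rule earns more
# reward but exceeds the risk threshold tau = 2
reward = {"aggressive": 10.0, "mild": 6.0}
risk = {"aggressive": 5.0, "mild": 1.0}
chosen, lam = lagrangian_best_rule(
    ["aggressive", "mild"], reward.get, risk.get, tau=2.0, lambdas=[0.0, 1.0, 2.0]
)
```

With no penalty the aggressive rule wins but violates the risk cap; increasing the multiplier shifts the optimum to the feasible mild rule.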

    An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests

    With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery by identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that directly estimate individualized treatment rules (ITRs) by maximizing the expected clinical reward using, for example, support vector machines (SVMs) or decision trees. The latter enjoy great popularity in clinical practice due to their interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve the single-tree decision rule with an ensemble algorithm, ITR random forests. Our final decision rule is an average over single decision trees, yielding a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can combine our model with their own judgment and experience. Performance of the ITR forest and tree methods is assessed through simulations and applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. The ITR forest and tree methods are implemented in the statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
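    Directly maximizing expected clinical reward typically rests on estimating the value of a candidate rule. A minimal sketch of the standard inverse-probability-weighted value estimator for an RCT follows; the 1:1 randomization probability and the toy arrays are illustrative assumptions, not the article's exact reward function.

```python
import numpy as np

def ipw_value(reward, treatment, rule, p_treat=0.5):
    """Inverse-probability-weighted estimate of the mean reward the
    population would receive if everyone followed `rule`."""
    prop = np.where(treatment == 1, p_treat, 1.0 - p_treat)  # P(observed arm)
    agree = (treatment == rule).astype(float)  # keep patients whose arm matches the rule
    return float(np.mean(agree * reward / prop))

# toy data: 4 patients, 1:1 randomization
reward = np.array([1.0, 2.0, 3.0, 4.0])
treatment = np.array([1, 0, 1, 0])
rule = np.array([1, 1, 1, 1])  # candidate rule: treat everyone
v = ipw_value(reward, treatment, rule)
```

A tree or forest ITR can then be compared against alternatives by this value estimate.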

    A General Framework for Treatment Effect Estimators Considering Patient Adherence

    Randomized controlled trials remain the gold standard for evaluating the efficacy and safety of a new treatment. Ideally, patients adhere to their treatments for the duration of the study, and the resulting data can be analyzed unambiguously for efficacy and safety outcomes. However, some patients may discontinue the study treatment due to intercurrent events, leaving observations that are missing or that do not reflect the randomly assigned treatment. Frequently, an intent-to-treat analysis (or a modification thereof) is performed to estimate the treatment effect for all randomized patients regardless of the occurrence of intercurrent events. Alternatively, clinicians may be more interested in the efficacy and safety for those who can adhere to the study treatment. The naive per-protocol analysis may provide a biased estimate of the treatment difference because the observed adherent populations may not be comparable between the two treatments. In this article, we propose two methods, based on the counterfactual framework, for estimating the treatment difference for those who can adhere to one or both treatments. Theoretical derivations and a simulation study show that the proposed methods provide consistent estimators of the treatment difference for the adherent population of interest. A real data example comparing two basal insulins for patients with type 1 diabetes illustrates the proposed methods. Supplementary materials for this article are available online.

    Robust Alternatives to ANCOVA for Estimating the Treatment Effect via a Randomized Comparative Study

    In comparing two treatments via a randomized clinical trial, the analysis of covariance (ANCOVA) technique is often used to estimate an overall treatment effect. ANCOVA is generally perceived as more efficient than its simple two-sample counterpart. Unfortunately, when the ANCOVA model is nonlinear, the resulting estimator is generally not consistent. Recently, various nonparametric alternatives to ANCOVA, such as augmentation methods, have been proposed to estimate the treatment effect while adjusting for covariates. However, the properties of these alternatives have not been studied in the presence of treatment allocation imbalance. In this article, we take a different approach, exploring how to improve the precision of the naive two-sample estimate even when the observed distributions of baseline covariates differ between the two groups. Specifically, we derive a bias-adjusted estimation procedure constructed from a conditional inference principle via relevant ancillary statistics of the observed covariates. This estimator is shown to be asymptotically equivalent to an augmentation estimator in the unconditional setting. We use data from a clinical trial evaluating a combination treatment for cardiovascular disease to illustrate our findings.
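    As a concrete illustration of covariate adjustment of the naive two-sample estimate, here is one standard augmentation-type estimator that corrects for baseline imbalance using a pooled linear working model. This is a sketch under that working-model assumption, not the conditional-inference procedure the article derives.

```python
import numpy as np

def augmented_diff(y, a, X):
    """Covariate-adjusted difference in means: subtract from the naive
    two-sample difference a correction proportional to the observed
    imbalance in baseline covariate means between arms."""
    y1, y0 = y[a == 1], y[a == 0]
    X1, X0 = X[a == 1], X[a == 0]
    naive = y1.mean() - y0.mean()
    # pooled least-squares slope of centered y on centered X
    Xc = X - X.mean(axis=0)
    beta = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)[0]
    return float(naive - (X1.mean(axis=0) - X0.mean(axis=0)) @ beta)
```

When the covariates happen to be perfectly balanced between arms, the correction vanishes and the estimator reduces to the naive difference.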

    Fast Approximation of the Shapley Values Based on Order-of-Addition Experimental Designs

    The Shapley value originated in econometrics as a way to fairly distribute both gains and costs among players in a coalition game. In recent decades, its application has extended to other areas such as marketing, engineering, and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation for interpretable machine learning, node importance in social networks, attribution models, and more. However, the Shapley value can be very expensive to compute. Specifically, in a d-player coalition game, calculating a Shapley value requires evaluating d! or 2^d marginal contribution values, depending on whether one takes the permutation or the combination formulation of the Shapley value. Hence it becomes infeasible to calculate the Shapley value when d is reasonably large. A common remedy is to take a random sample of the permutations as a surrogate for the complete list. We find that an advanced sampling scheme can be designed to yield much more accurate estimates of the Shapley value than simple random sampling (SRS). Our sampling scheme is based on combinatorial structures from the field of design of experiments (DOE), particularly order-of-addition experimental designs, which study how the ordering of components affects the output. We show that the obtained estimates are unbiased and can sometimes deterministically recover the original Shapley value. Both theoretical and simulation results show that our DOE-based sampling scheme outperforms SRS in estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real data analyses are conducted for the C. elegans nervous system and the 9/11 terrorist network.
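    For reference, the simple-random-sampling baseline the abstract compares against can be sketched as follows: average each player's marginal contribution over randomly sampled orderings. The additive toy game is an illustrative assumption; for an additive game every ordering recovers the exact Shapley values.

```python
import random

def shapley_srs(d, value_fn, n_perms=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over n_perms randomly sampled player orderings (the SRS baseline)."""
    rng = random.Random(seed)
    players = list(range(d))
    phi = [0.0] * d
    for _ in range(n_perms):
        rng.shuffle(players)
        coalition = set()
        prev = value_fn(frozenset(coalition))  # v(empty set)
        for p in players:  # add players one at a time in sampled order
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            phi[p] += cur - prev  # marginal contribution of p
            prev = cur
    return [v / n_perms for v in phi]

# toy additive game: v(S) = sum of weights, so Shapley values equal the weights
weights = [1.0, 2.0, 3.0]
est = shapley_srs(3, lambda S: sum(weights[i] for i in S))
```

Each sampled permutation costs d evaluations of the value function, so the total cost is n_perms * d rather than d!.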

    scRAA: the development of a robust and automatic annotation procedure for single-cell RNA sequencing data

    A critical task in single-cell RNA sequencing (scRNA-Seq) data analysis is identifying cell types in heterogeneous tissues. While the majority of classification methods demonstrate high performance on scRNA-Seq annotation problems, a robust and accurate solution is needed to generate reliable inputs for downstream analyses such as marker gene identification, differential expression, and pathway analysis. Because it is hard to establish a universally good metric, no single classification method works well in all scenarios. In addition, reference and query data in cell classification are usually from different experimental batches, and failing to account for batch effects may lead to misleading conclusions. To overcome this bottleneck, we propose a robust ensemble approach to classify cells, combined with a batch correction method applied between reference and query data. We simulated four scenarios ranging from simple to complex batch effects and accounting for varying cell-type proportions, and further tested our approach on lung and pancreas data. We found improved prediction accuracy and robust performance across the simulation scenarios and the real data. Incorporating batch effect correction between reference and query together with the ensemble approach improves cell-type prediction accuracy while maintaining robustness, as demonstrated through simulated and real scRNA-Seq data.
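    The ensemble idea can be as simple as a majority vote across base classifiers. This is a minimal sketch only; scRAA's actual combination rule and batch correction are more involved, and the classifier outputs and cell-type labels below are made up.

```python
import numpy as np

def ensemble_vote(predictions):
    """Majority-vote cell-type call per cell across classifiers;
    ties fall back to the first classifier's call, else the
    alphabetically first winner."""
    preds = np.asarray(predictions)  # shape: (n_classifiers, n_cells)
    calls = []
    for j in range(preds.shape[1]):
        labels, counts = np.unique(preds[:, j], return_counts=True)
        winners = set(labels[counts == counts.max()])
        calls.append(preds[0, j] if preds[0, j] in winners else min(winners))
    return calls

# three hypothetical classifiers annotating three cells
votes = [["T", "B", "NK"],
         ["T", "B", "B"],
         ["T", "NK", "NK"]]
consensus = ensemble_vote(votes)
```

Disagreement among the base classifiers (cells 2 and 3 here) is resolved by the majority, which is where the robustness of an ensemble comes from.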