    Competing risks regression for clustered survival data via the marginal additive subdistribution hazards model

    A population-averaged additive subdistribution hazards model is proposed to assess the marginal effects of covariates on the cumulative incidence function and to analyze correlated failure time data subject to competing risks. This approach extends the population-averaged additive hazards model by accommodating potentially dependent censoring due to competing events other than the event of interest. Assuming an independent working correlation structure, an estimating equations approach is outlined to estimate the regression coefficients, and a new sandwich variance estimator is proposed. The proposed sandwich variance estimator accounts for correlations both between failure times and between censoring times, and is robust to misspecification of the unknown dependency structure within each cluster. We further develop goodness-of-fit tests to assess the adequacy of the additive structure of the subdistribution hazards, for the overall model and for each covariate. Simulation studies are conducted to investigate the performance of the proposed methods in finite samples. We illustrate our methods using data from the STrategies to Reduce Injuries and Develop confidence in Elders (STRIDE) trial.
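    In outline, and using generic notation rather than the paper's exact formulation, the additive subdistribution hazards structure links covariates to the cumulative incidence function for the event of interest as follows:

```latex
% Subdistribution hazard for the event of interest (cause 1), defined from the
% cumulative incidence function F_1(t | X) = P(T <= t, cause = 1 | X), with an
% additive covariate effect (generic notation; cluster i, individual j):
\lambda_1(t \mid X_{ij})
  = -\frac{d}{dt}\,\log\bigl\{1 - F_1(t \mid X_{ij})\bigr\}
  = \lambda_{10}(t) + \beta^\top X_{ij}
```

    Here lambda_10(t) is an unspecified baseline subdistribution hazard and beta carries the marginal (population-averaged) covariate effects, estimated from the estimating equations with an independent working correlation.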

    Maximin optimal cluster randomized designs for assessing treatment effect heterogeneity

    Cluster randomized trials (CRTs) are studies where treatment is randomized at the cluster level but outcomes are typically collected at the individual level. When CRTs are employed in pragmatic settings, baseline population characteristics may moderate treatment effects, leading to what is known as heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs can enable an understanding of how interventions may impact subpopulation outcomes. While closed-form sample size formulas have recently been proposed, assuming known intracluster correlation coefficients (ICCs) for both the covariate and outcome, guidance on optimal cluster randomized designs to ensure maximum power with pre-specified HTE analyses has not yet been developed. We derive new design formulas to determine the cluster size and number of clusters to achieve the locally optimal design (LOD) that minimizes the variance for estimating the HTE parameter given a budget constraint. Because LODs are based on covariate and outcome ICC values that are usually unknown, we further develop the maximin design for assessing HTE, identifying the combination of design resources that maximizes the relative efficiency of the HTE analysis in the worst-case scenario. In addition, given that the analysis of the average treatment effect is often of primary interest, we also establish optimal designs that accommodate multiple objectives by combining considerations for studying both the average and heterogeneous treatment effects. We illustrate our methods using the context of the Kerala Diabetes Prevention Program CRT, and provide an R Shiny app to facilitate calculation of optimal designs under a wide range of design parameters. Comment: 25 pages, 6 figures, 5 tables, 3 appendices; clarified phrasing, typos corrected.
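    As a rough, self-contained illustration of the budget-constrained and maximin ideas (not the paper's formulas), the sketch below searches candidate cluster sizes under a budget constraint. The budget, unit costs, ICC range, and the placeholder variance function var_hte are all assumptions; the paper's closed-form HTE variance, which involves both the covariate and outcome ICCs, would take the placeholder's place.

```r
# Sketch of a budget-constrained locally optimal and maximin design search.
# var_hte() is only a stand-in design-effect-style variance (assumption); the
# actual closed-form HTE variance would replace it. Budget, costs, and the ICC
# range are illustrative.
var_hte <- function(n_clusters, m, rho) {
  (1 + (m - 1) * rho) / (n_clusters * m)
}

budget <- 50000; cost_per_cluster <- 500; cost_per_subject <- 50
m_grid <- 2:100
n_of_m <- function(m) floor(budget / (cost_per_cluster + m * cost_per_subject))

# Locally optimal design (LOD): the smallest achievable variance for a fixed ICC
lod_var <- function(rho) min(sapply(m_grid, function(m) var_hte(n_of_m(m), m, rho)))

# Maximin design: choose the cluster size whose worst-case relative efficiency
# (LOD variance / candidate variance) over the assumed ICC range is largest
rho_grid <- seq(0.01, 0.10, by = 0.01)
worst_re <- sapply(m_grid, function(m) {
  min(sapply(rho_grid, function(rho) lod_var(rho) / var_hte(n_of_m(m), m, rho)))
})
maximin_m <- m_grid[which.max(worst_re)]
maximin_n <- n_of_m(maximin_m)
```

    The locally optimal design minimizes the variance at a single assumed ICC value, while the maximin design guards against the least favorable ICC in the assumed range.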

    preference: An R Package for Two-Stage Clinical Trial Design Accounting for Patient Preference

    The consideration of a patient's treatment preference may be essential in determining how a patient will respond to a particular treatment. While traditional clinical trials are unable to capture these effects, the two-stage randomized preference design provides an important tool for researchers seeking to understand the role of patient preferences. In addition to the treatment effect, these designs seek to estimate the role of preferences through tests of selection and preference effects. The R package preference facilitates the use of the two-stage preference design by providing the necessary tools to design and analyze these studies. To aid in the design, functions are provided to estimate the required sample size and to compute the study power when the sample size is fixed. In addition, analysis functions are provided to determine the significance of each effect using either raw data or summary statistics. The package can accommodate either an unstratified or a stratified preference design. The functionality of the package is demonstrated using data from a study evaluating two management methods in women found to have an atypical Pap smear.

    Group sequential two-stage preference designs

    The two-stage preference design (TSPD) enables inference for treatment efficacy while allowing for the incorporation of patient preference for treatment. It can provide unbiased estimates of selection and preference effects, where a selection effect occurs when patients who prefer one treatment respond differently than those who prefer another, and a preference effect is the difference in response caused by an interaction between the patient's preference and the actual treatment they receive. One potential barrier to adopting the TSPD in practice, however, is the relatively large sample size required to estimate selection and preference effects with sufficient power. To address this concern, we propose a group sequential two-stage preference design (GS-TSPD), which combines the TSPD with sequential monitoring for early stopping. In the GS-TSPD, pre-planned sequential monitoring allows investigators to conduct repeated hypothesis tests on accumulated data prior to full enrollment and to assess eligibility for early trial termination without inflating type I error rates. Thus, the procedure allows investigators to terminate the study when there is sufficient evidence of treatment, selection, or preference effects during an interim analysis, thereby reducing the expected design resources. To formalize such a procedure, we verify the independent increments assumption for testing the selection and preference effects and apply group sequential stopping boundaries from the approximate sequential density functions. Simulations are then conducted to investigate the operating characteristics of our proposed GS-TSPD compared to the traditional TSPD. We demonstrate the applicability of the design using a study of Hepatitis C treatment modality. Comment: 27 pages, 7 tables, 5 figures, 4 appendices; under review at Statistics in Medicine.
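    For intuition about the sequential monitoring component only (not the GS-TSPD derivations themselves), the sketch below computes two-sided O'Brien-Fleming-type boundaries for equally spaced looks under the canonical correlation structure implied by independent increments; the number of looks and the alpha level are illustrative assumptions.

```r
# Sketch: two-sided O'Brien-Fleming-type boundaries for K equally spaced looks,
# using the canonical joint distribution implied by independent increments,
# Corr(Z_j, Z_k) = sqrt(t_j / t_k) for t_j <= t_k.
library(mvtnorm)

K     <- 3                                  # number of analyses (illustrative)
alpha <- 0.05                               # overall two-sided type I error
info  <- (1:K) / K                          # equally spaced information fractions
corr  <- outer(info, info, function(a, b) sqrt(pmin(a, b) / pmax(a, b)))

# Overall type I error if the boundary at look k is c / sqrt(info_k)
type1 <- function(c_val) {
  b <- c_val / sqrt(info)
  1 - pmvnorm(lower = -b, upper = b, sigma = corr)[1]
}

# Solve for the constant c giving exactly alpha, then report the boundaries
c_star     <- uniroot(function(c_val) type1(c_val) - alpha, c(1.5, 4))$root
boundaries <- c_star / sqrt(info)
```

    Early boundaries are wide and the final boundary is close to the fixed-sample critical value, which is why this type of monitoring controls the type I error rate without a large efficiency cost.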

    The use of multiple imputation in molecular epidemiologic studies assessing interaction effects

    Background: In molecular epidemiologic studies, biospecimen data are collected on only a proportion of subjects eligible for study. This leads to a missing data problem. Missing data methods, however, are not typically incorporated into analyses. Instead, complete-case (CC) analyses are performed, which result in biased and inefficient estimates. Methods: Through simulations, we characterized the bias that results from CC methods when interaction effects are estimated, as this is a major aim of many molecular epidemiologic studies. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist. Results: CC analyses were shown to result in considerable bias, while MI reduced bias and increased efficiency over CC methods under specific conditions. It improved estimation even with minimal auxiliary information, except when extreme values of the covariate were more likely to be missing. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach. Conclusions: Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Under NMAR, we recommend MI as a tool to improve performance over CC when strong auxiliary data are available. MI, with the missing data mechanism specified, is another alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI’s ability to account for the uncertainty of these assumptions.
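    As a generic illustration of the standard MI workflow referred to here (the data frame, variable names, number of imputations, and analysis model are assumptions, not the study's actual analysis), the mice package can be used along these lines:

```r
# Sketch: standard multiple imputation followed by an interaction analysis.
# 'dat', the variable names, the number of imputations, and the analysis model
# are all illustrative assumptions.
library(mice)

imp  <- mice(dat, m = 20, seed = 123, printFlag = FALSE)              # impute missing values
fits <- with(imp, glm(y ~ biomarker * exposure, family = binomial))   # fit in each imputed data set
summary(pool(fits))                                                   # pool with Rubin's rules
```

    Pooling with Rubin's rules combines the within- and between-imputation variability, which is how MI propagates the uncertainty due to the missing values.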

    The handling of missing data in molecular epidemiologic studies

    Background: Molecular epidemiologic studies face a missing data problem, as biospecimen data are often collected on only a proportion of subjects eligible for study. Methods: We investigated all molecular epidemiologic studies published in CEBP in 2009 to characterize the prevalence of missing data and to elucidate how the issue was addressed. We considered multiple imputation (MI), a missing data technique that is readily available and easy to implement, as a possible solution. Results: While the majority of studies had missing data, only 16% compared subjects with and without missing data. Furthermore, 95% of the studies with missing data performed a complete-case (CC) analysis, a method known to yield biased and inefficient estimates. Conclusions: Missing data methods are not customarily incorporated into the analyses of molecular epidemiologic studies. Barriers may include a lack of awareness that missing data exist, particularly when availability of data is part of the inclusion criteria; the need for specialized software; and a perception that the CC approach is the gold standard. Standard MI is a reasonable solution that is valid when the data are missing at random (MAR). If the data are not missing at random (NMAR), we recommend MI over CC when strong auxiliary data are available. MI, with the missing data mechanism specified, is another alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI’s ability to account for the uncertainty of these assumptions. Impact: Missing data methods are underutilized, which can deleteriously affect the interpretation of results.

    Bayesian pathway analysis over brain network mediators for survival data

    Technological advancements in noninvasive imaging facilitate the construction of whole-brain interconnected networks, known as brain connectivity. Existing approaches to analyzing brain connectivity frequently disaggregate the entire network into a vector of unique edges or summary measures, leading to a substantial loss of information. Motivated by the need to explore the effect mechanism among genetic exposure, brain connectivity, and time to disease onset, we propose an integrative Bayesian framework to model the effect pathway among these components while quantifying the mediating role of brain networks. To accommodate the biological architecture of brain connectivity constructed along white matter fiber tracts, we develop a structural modeling framework that includes a symmetric matrix-variate accelerated failure time model and a symmetric matrix response regression to characterize the effect paths. We further impose within-graph sparsity and between-graph shrinkage to identify informative network configurations and eliminate the interference of noisy components. Extensive simulations confirm the superiority of our method compared with existing alternatives. By applying the proposed method to the landmark Alzheimer's Disease Neuroimaging Initiative study, we obtain neurobiologically plausible insights that may inform future intervention strategies.
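    Schematically, and using generic notation rather than the authors' exact specification, the two effect paths can be written as a symmetric matrix response regression for the network mediator and a matrix-variate accelerated failure time model for the onset time:

```latex
% Generic schematic of the pathway: exposure x_i -> network M_i -> onset time T_i
% (illustrative notation; V = number of brain regions, <.,.>_F = Frobenius inner product)
\begin{align*}
  M_i &= x_i\,\Gamma + E_i, \qquad M_i,\ \Gamma,\ E_i \in \mathbb{R}^{V \times V}\ \text{symmetric},\\
  \log T_i &= \alpha_0 + \langle B, M_i \rangle_F + \alpha_1 x_i + \varepsilon_i
\end{align*}
```

    Sparsity and shrinkage priors on the entries of Gamma and B would then play the role of the within-graph sparsity and between-graph shrinkage described above.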

    The use of complete-case and multiple imputation-based analyses in molecular epidemiology studies that assess interaction effects

    Background: In molecular epidemiology studies, biospecimen data are collected, often with the purpose of evaluating the synergistic role between a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates. Methods: Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist. Results: CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate was related to both the covariate and the outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach. Conclusions: Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR, we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.
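    To make the missingness scenarios concrete (a simulation sketch under assumed parameter values and variable names, not the paper's settings), NMAR missingness in the biomarker that depends on both the covariate and the outcome can be generated as follows; a data set like this one could then be passed to the imputation sketch shown after the earlier abstract:

```r
# Sketch: simulating a biomarker that is missing not at random (NMAR).
# All parameter values and variable names are illustrative assumptions.
set.seed(1)
n         <- 2000
exposure  <- rbinom(n, 1, 0.5)
biomarker <- rnorm(n)
y         <- rbinom(n, 1, plogis(-1 + 0.5 * biomarker + 0.3 * exposure +
                                   0.4 * biomarker * exposure))

# Missingness depends on the (unobserved) biomarker itself and on the outcome,
# mirroring the "covariate and outcome" missingness scenario described above.
p_miss        <- plogis(-1.5 + 0.8 * biomarker + 0.5 * y)
biomarker_obs <- ifelse(rbinom(n, 1, p_miss) == 1, NA, biomarker)
dat           <- data.frame(y, exposure, biomarker = biomarker_obs)
```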

    The Association Between Self-Reported Major Life Events and the Presence of Uterine Fibroids

    Uterine fibroids are the most common benign tumors in reproductive-age women. Factors associated with the condition, such as psychosocial stress, are still being elucidated. This paper explores the association between major life events (MLE) stress and fibroids.

    Sample size estimation in educational intervention trials with subgroup heterogeneity in only one arm

    We present closed-form sample size and power formulas motivated by the study of a psycho-social intervention in which the experimental group has the intervention delivered in teaching subgroups while the control group receives usual care. This situation differs from the usual cluster randomized trial since subgroup heterogeneity exists in only one arm. We take this modification into consideration and present formulas for the situation in which a continuous outcome is compared both at a single point in time and longitudinally over time. In addition, we present the optimal combination of parameters, such as the number of subgroups and the number of time points, for minimizing sample size and maximizing power subject to constraints such as the maximum number of measurements that can be taken (i.e., a proxy for cost).
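    For orientation, a sketch of the single-time-point comparison only, assuming n subjects per arm (a multiple of the subgroup size m in the intervention arm), a common variance sigma^2, intraclass correlation rho within intervention subgroups, no clustering in the control arm, and a detectable mean difference Delta:

```latex
% Variance of the treatment effect estimate when only the intervention arm is
% clustered (subgroups of size m, ICC rho), and the resulting per-arm sample size:
\operatorname{Var}(\bar{Y}_E - \bar{Y}_C)
  = \frac{\sigma^2\{1 + (m-1)\rho\}}{n} + \frac{\sigma^2}{n}
  = \frac{\sigma^2\{2 + (m-1)\rho\}}{n},
\qquad
n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2 \{2 + (m-1)\rho\}}{\Delta^2}
```

    Only the intervention arm contributes a design effect; the longitudinal formulas and the optimization over the number of subgroups and time points described above extend this basic calculation.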