Competing risks regression for clustered survival data via the marginal additive subdistribution hazards model
A population-averaged additive subdistribution hazards model is proposed to
assess the marginal effects of covariates on the cumulative incidence function
and to analyze correlated failure time data subject to competing risks. This
approach extends the population-averaged additive hazards model by
accommodating potentially dependent censoring due to competing events other
than the event of interest. Assuming an independent working correlation
structure, an estimating equations approach is outlined to estimate the
regression coefficients and a new sandwich variance estimator is proposed. The
proposed sandwich variance estimator accounts for correlations both among the
failure times and among the censoring times, and is robust to
misspecification of the unknown dependency structure within each cluster. We
further develop goodness-of-fit tests to assess the adequacy of the additive
structure of the subdistribution hazards for the overall model and each
covariate. Simulation studies are conducted to investigate the performance of
the proposed methods in finite samples. We illustrate our methods using data
from the STrategies to Reduce Injuries and Develop confidence in Elders
(STRIDE) trial.
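As a point of reference, the additive subdistribution hazard structure being tested can be sketched as below; the notation (covariates Z_ij for member j of cluster i, baseline hazard lambda_0) is an assumption for exposition, not the paper's exact specification.

```latex
% Sketch of an additive subdistribution hazards model (notation assumed):
% the subdistribution hazard for the event of interest is linked to the
% cumulative incidence function F_1 via
% \lambda_1(t \mid Z) = -\,\mathrm{d}\log\{1 - F_1(t \mid Z)\}/\mathrm{d}t.
\begin{equation*}
  \lambda_1(t \mid Z_{ij}) \;=\; \lambda_0(t) \;+\; \beta^{\top} Z_{ij}.
\end{equation*}
```

Under a form like this, covariates shift the subdistribution hazard additively, which is the structure the goodness-of-fit tests examine for the overall model and for each covariate.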
Maximin optimal cluster randomized designs for assessing treatment effect heterogeneity
Cluster randomized trials (CRTs) are studies where treatment is randomized at
the cluster level but outcomes are typically collected at the individual level.
When CRTs are employed in pragmatic settings, baseline population
characteristics may moderate treatment effects, leading to what is known as
heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE
analyses in CRTs can enable an understanding of how interventions may impact
subpopulation outcomes. While closed-form sample size formulas have recently
been proposed, assuming known intracluster correlation coefficients (ICCs) for
both the covariate and outcome, guidance on optimal cluster randomized designs
to ensure maximum power with pre-specified HTE analyses has not yet been
developed. We derive new design formulas to determine the cluster size and
number of clusters to achieve the locally optimal design (LOD) that minimizes
variance for estimating the HTE parameter given a budget constraint. Because
the LODs are based on covariate- and outcome-ICC values that are usually
unknown, we further develop the maximin design for assessing HTE, identifying
the combination of design resources that maximizes the relative efficiency of
the HTE analysis in the worst-case scenario. In addition, because the analysis
of the
average treatment effect is often of primary interest, we also establish
optimal designs to accommodate multiple objectives by combining considerations
for studying both the average and heterogeneous treatment effects. We
illustrate our methods using the context of the Kerala Diabetes Prevention
Program CRT, and provide an R Shiny app to facilitate calculation of optimal
designs under a wide range of design parameters.
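To make the optimization concrete, a plausible form of the budget-constrained problem and its maximin extension is sketched below; the cost and efficiency symbols are assumptions for illustration, not the paper's exact expressions.

```latex
% Hypothetical locally optimal design (LOD) problem: choose the number of
% clusters n and cluster size m to minimize the HTE estimator's variance
% under a budget B, with per-cluster cost c and per-subject cost s.
\begin{equation*}
  (n^{*}, m^{*}) \;=\; \arg\min_{n,\,m}\;
  \operatorname{Var}\!\left(\hat{\beta}_{\mathrm{HTE}};\, n, m, \rho_x, \rho_y\right)
  \quad \text{s.t.} \quad n\,(c + s\,m) \le B.
\end{equation*}
% Maximin extension when the covariate- and outcome-ICCs (rho_x, rho_y)
% are only known to lie in a plausible region R: maximize the worst-case
% relative efficiency over that region.
\begin{equation*}
  (n^{\dagger}, m^{\dagger}) \;=\; \arg\max_{n,\,m}\;
  \min_{(\rho_x,\,\rho_y)\,\in\,\mathcal{R}}
  \operatorname{RE}\!\left(n, m;\, \rho_x, \rho_y\right).
\end{equation*}
```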
preference: An R Package for Two-Stage Clinical Trial Design Accounting for Patient Preference
The consideration of a patient's treatment preference may be essential in determining how a patient will respond to a particular treatment. While traditional clinical trials are unable to capture these effects, the two-stage randomized preference design provides an important tool for researchers seeking to understand the role of patient preferences. In addition to the treatment effect, these designs seek to estimate the role of preferences through testing of selection and preference effects. The R package preference facilitates the use of two-stage clinical trials by providing the necessary tools to design and analyze these studies. To aid in the design, functions are provided to estimate the required sample size and to estimate the study power when a sample size is fixed. In addition, analysis functions are provided to determine the significance of each effect using either raw data or summary statistics. The package is able to incorporate either an unstratified or stratified preference design. The functionality of the package is demonstrated using data from a study evaluating two management methods in women found to have an atypical Pap smear.
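A hypothetical usage sketch follows; the function names and arguments here are illustrative assumptions rather than the package's confirmed API, so the package documentation should be consulted for the actual calls.

```r
# Hypothetical sketch of the two-stage preference design workflow.
# Function names and arguments are illustrative assumptions, not the
# confirmed API of the 'preference' package.
library(preference)

# Design: sample size for 80% power to detect assumed preference
# (delta_pi), selection (delta_nu), and treatment (delta_tau) effects,
# with preference rate phi and outcome variance sigma2.
ss <- overall_sample_size(power = 0.80, phi = 0.5, sigma2 = 1,
                          delta_pi = 0.2, delta_nu = 0.2, delta_tau = 0.4)

# Design: power for each effect when the total sample size is fixed.
pw <- overall_power(N = 400, phi = 0.5, sigma2 = 1,
                    delta_pi = 0.2, delta_nu = 0.2, delta_tau = 0.4)
```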
Group sequential two-stage preference designs
The two-stage preference design (TSPD) enables the inference for treatment
efficacy while allowing for incorporation of patient preference to treatment.
It can provide unbiased estimates for selection and preference effects, where a
selection effect occurs when patients who prefer one treatment respond
differently than those who prefer another, and a preference effect is the
difference in response caused by an interaction between the patient's
preference and the actual treatment they receive. One potential barrier to
adopting TSPD in practice, however, is the relatively large sample size
required to estimate selection and preference effects with sufficient power. To
address this concern, we propose a group sequential two-stage preference design
(GS-TSPD), which combines TSPD with sequential monitoring for early stopping.
In the GS-TSPD, pre-planned sequential monitoring allows investigators to
conduct repeated hypothesis tests on accumulated data prior to full enrollment
to assess whether the study qualifies for early termination without inflating
type I error rates. Thus, the procedure allows investigators to terminate the
study when an interim analysis shows sufficient evidence of treatment,
selection, or preference effects, thereby reducing the expected use of design
resources. To formalize such a procedure, we verify the independent
increments assumption for testing the selection and preference effects and
apply group sequential stopping boundaries from the approximate sequential
density functions. Simulations are then conducted to investigate the operating
characteristics of our proposed GS-TSPD compared to the traditional TSPD. We
demonstrate the applicability of the design using a study of Hepatitis C
treatment modality.
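To illustrate the sequential-monitoring mechanics, the base-R sketch below simulates interim z-statistics with independent increments and applies O'Brien-Fleming-type boundaries; it is a generic group sequential illustration under assumed parameters, not the paper's exact GS-TSPD procedure.

```r
# Generic group sequential illustration with independent increments:
# K equally spaced looks, O'Brien-Fleming-type boundaries
# c_k = c * sqrt(K / k). Not the paper's exact GS-TSPD procedure.
set.seed(1)
K <- 4; c_final <- 2.024          # approximate OBF constant, K = 4, alpha = 0.05
bounds <- c_final * sqrt(K / seq_len(K))

one_trial <- function(theta = 0.3, n_per_look = 50) {
  # independent z-statistic increments accumulate at each look
  incr <- rnorm(K, mean = theta * sqrt(n_per_look), sd = 1)
  z <- cumsum(incr) / sqrt(seq_len(K))
  which(abs(z) >= bounds)[1]      # first look that crosses; NA if none
}

stops <- replicate(5000, one_trial())
mean(!is.na(stops))               # empirical probability of rejection at some look
```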
The use of multiple imputation in molecular epidemiologic studies assessing interaction effects
Background: In molecular epidemiologic studies, biospecimen data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not typically incorporated into analyses. Instead, complete-case (CC) analyses are performed, which result in biased and inefficient estimates.
Methods: Through simulations, we characterized the bias that results from CC methods when interaction effects are estimated, as this is a major aim of many molecular epidemiologic studies. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.
Results: CC analyses were shown to result in considerable bias while MI reduced bias and increased efficiency over CC methods under specific conditions. It improved estimation even with minimal auxiliary information, except when extreme values of the covariate were more likely to be missing. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.
Conclusions: Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Under NMAR we recommend MI as a tool to improve performance over CC when strong auxiliary data are available. MI, with the missing data mechanism specified, is another alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI’s ability to account for the uncertainty of these assumptions.
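Since standard MI is recommended under MAR, a minimal worked sketch may help; the example below uses the R package mice on a made-up data frame with an interaction model, and the variable names and parameter values are assumptions for illustration.

```r
# Minimal multiple-imputation workflow with the 'mice' package, using a
# made-up data frame (biomarker subject to missingness, exposure, and a
# binary outcome) and an interaction model for illustration.
library(mice)

set.seed(42)
n <- 500
dat <- data.frame(exposure  = rbinom(n, 1, 0.4),
                  biomarker = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.5 * dat$exposure +
                             0.4 * dat$biomarker +
                             0.6 * dat$exposure * dat$biomarker))
dat$biomarker[runif(n) < 0.3] <- NA   # 30% of biomarker values missing

imp  <- mice(dat, m = 20, printFlag = FALSE)   # impute m complete data sets
fits <- with(imp, glm(y ~ exposure * biomarker, family = binomial))
summary(pool(fits))                            # combine by Rubin's rules
```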
The handling of missing data in molecular epidemiologic studies
Background: Molecular epidemiologic studies face a missing data problem as biospecimen data are often collected on only a proportion of subjects eligible for study.
Methods: We investigated all molecular epidemiologic studies published in CEBP in 2009 to characterize the prevalence of missing data and to elucidate how the issue was addressed. We considered multiple imputation (MI), a missing data technique that is readily available and easy to implement, as a possible solution.
Results: While the majority of studies had missing data, only 16% compared subjects with and without missing data. Furthermore, 95% of the studies with missing data performed a complete-case (CC) analysis, a method known to yield biased and inefficient estimates.
Conclusions: Missing data methods are not customarily being incorporated into the analyses of molecular epidemiologic studies. Barriers may include a lack of awareness that missing data exists, particularly when availability of data is part of the inclusion criteria; the need for specialized software; and a perception that the CC approach is the gold standard. Standard MI is a reasonable solution that is valid when the data are missing at random (MAR). If the data are not missing at random (NMAR) we recommend MI over CC when strong auxiliary data are available. MI, with the missing data mechanism specified, is another alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI’s ability to account for the uncertainty of these assumptions.
Impact: Missing data methods are underutilized, which can deleteriously affect the interpretation of results.
Bayesian pathway analysis over brain network mediators for survival data
Technological advancements in noninvasive imaging facilitate the construction
of whole brain interconnected networks, known as brain connectivity. Existing
approaches to analyze brain connectivity frequently disaggregate the entire
network into a vector of unique edges or summary measures, leading to a
substantial loss of information. Motivated by the need to explore the effect
mechanism among genetic exposure, brain connectivity and time to disease onset,
we propose an integrative Bayesian framework to model the effect pathway
between each of these components while quantifying the mediating role of brain
networks. To accommodate the biological architectures of brain connectivity
constructed along white matter fiber tracts, we develop a structural modeling
framework that includes a symmetric matrix-variate accelerated failure time
model and a symmetric matrix response regression to characterize the effect
paths. We further impose within-graph sparsity and between-graph shrinkage to
identify informative network configurations and eliminate the interference of
noisy components. Extensive simulations confirm the superiority of our method
compared with existing alternatives. By applying the proposed method to the
landmark Alzheimer's Disease Neuroimaging Initiative study, we obtain
neurobiologically plausible insights that may inform future intervention
strategies.
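Schematically, the effect pathway can be written as a pair of linked models; the notation below (genetic exposure X, symmetric connectivity matrix M, onset time T) is assumed for illustration rather than taken from the paper.

```latex
% Schematic mediation pathway (notation assumed): a symmetric matrix
% response regression for connectivity given exposure, and a matrix-variate
% accelerated failure time model for time to disease onset, where
% \langle B, M \rangle = \mathrm{tr}(B^{\top} M) is the matrix inner product.
\begin{align*}
  M      &= X\,\Gamma + E, \qquad M = M^{\top}, \\
  \log T &= \alpha + \beta X + \langle B, M \rangle + \varepsilon.
\end{align*}
```

In a formulation like this, the within-graph sparsity and between-graph shrinkage described above would act on the coefficient matrices Γ and B to select informative network configurations.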
The use of complete-case and multiple imputation-based analyses in molecular epidemiology studies that assess interaction effects
Background: In molecular epidemiology studies biospecimen data are collected, often with the purpose of evaluating the synergistic role between a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates.
Methods: Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.
Results: CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate related to both the covariate and outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.
Conclusions: Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.
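A toy base-R simulation in the spirit of the comparison above (all parameters assumed) shows how a complete-case analysis distorts the interaction estimate when missingness of the biomarker depends on both the biomarker and the outcome:

```r
# Toy simulation (assumed parameters): complete-case (CC) estimates of an
# exposure-by-biomarker interaction when missingness depends on both the
# biomarker and the outcome, compared with the full-data estimates.
set.seed(7)
est <- replicate(1000, {
  n <- 400
  x <- rbinom(n, 1, 0.5)                      # exposure
  b <- rnorm(n)                               # biomarker
  y <- rnorm(n, mean = 1 + 0.5 * x + 0.5 * b + 0.5 * x * b)
  full <- unname(coef(lm(y ~ x * b))["x:b"])  # estimate with no missingness
  miss <- runif(n) < plogis(-2 + b + y)       # missingness depends on b and y
  b_cc <- ifelse(miss, NA, b)                 # lm() drops incomplete rows
  cc <- unname(coef(lm(y ~ x * b_cc))["x:b_cc"])
  c(full = full, cc = cc)
})
rowMeans(est)   # average full-data vs CC estimates of the true value 0.5
```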
Sample size estimation in educational intervention trials with subgroup heterogeneity in only one arm
We present closed-form sample size and power formulas motivated by the study of a psychosocial intervention in which the experimental group has the intervention delivered in teaching subgroups while the control group receives usual care. This situation differs from the usual cluster randomized trial since subgroup heterogeneity exists in only one arm. We take this modification into consideration and present formulas for comparing a continuous outcome both at a single point in time and longitudinally over time. In addition, we present the optimal combination of parameters, such as the number of subgroups and number of time points, for minimizing sample size and maximizing power subject to constraints such as the maximum number of measurements that can be taken (i.e., a proxy for cost).
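To see how one-arm heterogeneity alters the usual calculation, a standard design-effect argument gives a variance of the form below for the single-time-point comparison; the symbols (k teaching subgroups of size m with intracluster correlation rho in the intervention arm, N_c independent controls) are assumptions for illustration, not the paper's exact notation.

```latex
% Variance of the treatment-control mean difference when clustering exists
% only in the intervention arm (symbols assumed): k subgroups of size m with
% intracluster correlation rho versus N_c independent controls.
\begin{equation*}
  \operatorname{Var}\!\left(\bar{Y}_T - \bar{Y}_C\right)
  \;=\; \frac{\sigma^{2}\left[1 + (m-1)\rho\right]}{k\,m}
  \;+\; \frac{\sigma^{2}}{N_c}.
\end{equation*}
```

The design effect 1 + (m-1)rho inflates only the intervention-arm term, which is why the usual cluster randomized trial formulas do not apply directly.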
The Association Between Self-Reported Major Life Events and the Presence of Uterine Fibroids
Uterine fibroids are the most common benign tumors in reproductive-age women. Factors associated with the condition, such as psychosocial stress, are still being elucidated. This paper explores the association between major life events (MLE) stress and fibroids.