STATISTICAL METHODS FOR DATA FROM CASE-COHORT STUDIES

Abstract

In epidemiological studies and disease prevention trials, interest often lies in the relationship between a disease endpoint and an exposure of interest. When the event is rare and/or some covariate information is expensive to collect for the entire cohort, case-cohort designs are widely used to reduce the financial cost of the study while achieving the same study goals. The case-cohort sampling scheme entails drawing a random sample of individuals from the full cohort, called the sub-cohort, and including all the cases. When the event rate is not low but resources are limited, the generalized case-cohort design is more appropriate: only a fraction of the cases is sampled along with the sub-cohort. In this dissertation, we consider two aspects of case-cohort studies. One concerns statistical methods for the analysis of recurrent events; the other concerns power and sample size calculation for an interaction test. Many methods for the analysis of data from case-cohort studies have been proposed in the literature. However, most of these methods are for either a single event or multiple events of different types on the same subject. There has not been much work on recurrent event data under the case-cohort sampling scheme. Valid statistical methods that take into account the correlation between events from the same individual need to be developed. In this dissertation, the first two topics are related to recurrent events. We consider modeling the recurrent events using rate models under the original and generalized case-cohort designs. The first topic considers the multiplicative rates model and the second topic considers the additive rates model. For both types of rate model, we propose a weighted estimating equation approach to parameter estimation under both sampling designs. We show that the proposed estimators are consistent and asymptotically normally distributed.
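The two sampling schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimator developed in the dissertation: the function name `case_cohort_sample`, the Bernoulli sub-cohort sampling, and the inverse-inclusion-probability weights are all hypothetical choices made here for clarity.

```python
import random


def case_cohort_sample(is_case, subcohort_prob, case_prob=1.0, seed=0):
    """Draw a (generalized) case-cohort sample from a full cohort.

    is_case        : list of bool, True if the subject experienced the event
    subcohort_prob : probability that any subject enters the sub-cohort
    case_prob      : probability of sampling a case that is outside the
                     sub-cohort (1.0 = original design, < 1.0 = generalized)

    Returns the selected indices and one weight per selected subject.
    The weights are inverse inclusion probabilities (an assumption made
    for this sketch, not necessarily the weights used in the dissertation).
    """
    rng = random.Random(seed)
    selected, weights = [], []
    for i, case in enumerate(is_case):
        in_subcohort = rng.random() < subcohort_prob
        # Cases missed by the sub-cohort get a second chance of selection.
        extra_case = case and not in_subcohort and rng.random() < case_prob
        if in_subcohort or extra_case:
            if case:
                incl_prob = subcohort_prob + (1 - subcohort_prob) * case_prob
            else:
                incl_prob = subcohort_prob
            selected.append(i)
            weights.append(1.0 / incl_prob)
    return selected, weights


# Hypothetical cohort: 10,000 subjects with a 2% event rate.
rng = random.Random(1)
cohort = [rng.random() < 0.02 for _ in range(10_000)]

# Original design: 10% sub-cohort plus every case.
idx, w = case_cohort_sample(cohort, subcohort_prob=0.1)

# Generalized design: 10% sub-cohort plus half of the remaining cases.
idx_gen, w_gen = case_cohort_sample(cohort, subcohort_prob=0.1, case_prob=0.5)
```

Under the original design every case is retained with weight 1, while each non-case in the sub-cohort stands in for roughly `1 / subcohort_prob` cohort members. Weights of this kind are what a weighted estimating equation would use to correct for the biased sampling.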
We conduct simulation studies to examine the performance of the proposed estimators in finite samples, and the estimators perform well. For the multiplicative rates model, we apply the proposed method to assess the relationship between prior measles infection and acute lower respiratory infections (ALRI) in a double-blinded randomized clinical trial conducted in Brazil. For the additive rates model, we apply the proposed method to study the effect of FEV1 on the recurrence of pulmonary exacerbation in patients with cystic fibrosis. In the third topic, we address another aspect of the case-cohort design. All previous work in the literature on sample size and power calculation for case-cohort data concerns a dichotomized main effect. However, in certain situations, one may be interested in the association between a covariate and a time-to-event response in different biomarker groups, where the biomarker may be expensive to measure. We extend the existing idea for a single binary main effect to the interaction between the covariate and the dichotomized biomarker in the presence of a rare event. We propose different power formulas based on the simplification of a generalized log-rank test for the case-cohort design. A cost efficiency formula comparing the case-cohort design to a simple random sample is derived. We also examine the performance of bounds based on the same test. Simulation studies are conducted to illustrate the efficiency of the case-cohort design. We illustrate the use of the formulas with information from the pooled databases of Lung Adjuvant Cisplatin Evaluation (LACE) and Cancer and Leukemia Group B (CALGB) 9633.

Doctor of Philosophy
