7 research outputs found

    Scalable Semi-parametric Methods in Biostatistics

    Individualized health, or precision medicine, is an emerging approach to disease prevention and treatment guided by each person's individual characteristics: genome, medical imaging, family history, environment, and lifestyle. Achieving this goal requires efficient and scalable statistical technologies to decipher the connection between this information and health outcomes. In this thesis, we present statistical methods in support of individualized health. In Part I, the primary goal is to provide flexible and efficient estimation of the latent etiology distribution given imperfect measurements. We parameterize the latent etiologic state as a multivariate binary variable, where each binary node represents the presence or absence of an etiologic agent. The multivariate binary measurements are assumed to be conditionally independent given the latent state, and their relation to it is parameterized by the true positive rates and false positive rates of the measurements. External information on the true positive rates, extracted from previous literature, is summarized by Beta prior distributions and used to improve model identifiability. Expert knowledge of the competition mechanism among etiologic agents is translated into a sparse correlation structure for the latent state. A scalable Markov chain Monte Carlo algorithm is proposed for approximating the exact posterior distribution, and a variational Bayesian algorithm is developed for faster, more scalable estimation in large-scale problems. We demonstrate the model using data from the motivating Pneumonia Etiology Research for Child Health (PERCH) study, which aims to provide a comprehensive estimate of the etiology distribution of childhood pneumonia in developing countries. In Part II, the key objective is to improve the efficiency of survival regression estimators by incorporating external information on population-level survival rates. The accelerated failure time (AFT) model and the Cox proportional hazards model are considered. For each model, the first estimating equation is based on the benchmark semi-parametric estimator (the partial-likelihood estimator for Cox and the log-rank estimator for AFT); additional estimating equations are then formed from the auxiliary survival information. The estimating equations are transformed by applying the functional delta method to a set of over-identifying moment conditions. Finally, parameter estimation and model diagnostics are carried out within the standard generalized method of moments (GMM) framework. We show that the new GMM-based estimators are asymptotically and empirically more efficient than the benchmark estimators. These new estimators are applied to a recent retrospective study on the prognosis of pancreatic cancer.
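
The Part I measurement model in the abstract above can be illustrated with a small sketch. It assumes a handful of etiologic agents, so that all latent binary states can be enumerated, and it takes the true positive rates, false positive rates, and prior as fixed inputs; the thesis instead places priors on these quantities and fits the model by MCMC or variational Bayes. The function name and arguments are hypothetical and not taken from the thesis.

```python
from itertools import product


def posterior_latent_states(measurements, tpr, fpr, prior):
    """Posterior over all 2^J latent binary states for one subject.

    measurements: tuple of 0/1 test results, one per etiologic agent.
    tpr, fpr: per-agent true and false positive rates (treated as known here).
    prior: function mapping a latent state tuple to an unnormalised prior weight
           (a hypothetical stand-in for the sparse correlation structure).
    """
    weights = {}
    for state in product((0, 1), repeat=len(measurements)):
        likelihood = 1.0
        # Conditional independence: multiply per-agent measurement probabilities.
        for m, present, t, f in zip(measurements, state, tpr, fpr):
            p_positive = t if present else f
            likelihood *= p_positive if m == 1 else 1.0 - p_positive
        weights[state] = prior(state) * likelihood
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# Two agents, a flat prior, and a positive test for the first agent only.
post = posterior_latent_states((1, 0), tpr=[0.9, 0.8], fpr=[0.1, 0.2],
                               prior=lambda state: 1.0)
print(max(post, key=post.get))  # (1, 0)
```

Enumerating every state is only feasible for small numbers of agents, which is one reason scalable approximate algorithms are needed in practice.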

    Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

    Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed
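
For readers unfamiliar with penalised Cox regression, the objective that such models minimise can be sketched as follows. This is a minimal illustration under stated assumptions: an elastic-net penalty, the Breslow form of the partial likelihood, and no special handling of tied event times. The abstract says only that the models were "penalised", so the penalty form, the function name, and the `lam` and `alpha` parameters are assumptions rather than details of the ePCR or reference models.

```python
import math


def penalised_cox_loss(beta, X, time, event, lam=0.1, alpha=0.5):
    """Elastic-net penalised negative log partial likelihood (illustrative).

    beta: coefficient list; X: list of covariate rows; time: follow-up times;
    event: 1 if death observed, 0 if right-censored.
    lam and alpha are hypothetical tuning parameters (strength and L1/L2 mix).
    """
    n = len(time)
    # Linear predictor for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    neg_log_pl = 0.0
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at subject i's event time.
        risk_sum = sum(math.exp(eta[j]) for j in range(n) if time[j] >= time[i])
        neg_log_pl -= eta[i] - math.log(risk_sum)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return neg_log_pl / n + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


# With all coefficients at zero, each event contributes log(size of its risk set).
loss = penalised_cox_loss([0.0], [[1.0], [2.0], [3.0]], [1, 2, 3], [1, 1, 0])
```

An ensemble in this spirit would fit several such models, for example on different cohorts or penalty settings, and average their risk scores; how ePCR combines its members is not described in the abstract.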

    Cenozoic Depositional Evolution and Stratal Patterns in the Western Pearl River Mouth Basin, South China Sea: Implications for Hydrocarbon Exploration

    Investigating the depositional evolution and stratal stacking patterns in continental rift basins is critical not only to better understand the mechanism of basin fills but also to reveal the enrichment regularity of hydrocarbon reservoirs. The Pearl River Mouth Basin (PRMB) is a petroliferous continental rift basin located on the northern continental shelf of the South China Sea. In this study, the depositional evolution and stacking pattern of the Zhu III Depression, western PRMB, were studied through the integration of 3D seismic data, core data, and well logs. Five types of depositional systems formed from the Eocene to the Miocene: fan delta, meandering river delta, tidal flat, lacustrine system, and neritic shelf system. The representative depositional systems changed from the proximal fan delta and lacustrine system in the Eocene–early Oligocene, to the tidal flat and fan delta in the late Oligocene, and then to the neritic shelf system in the Miocene. The stratal stacking pattern varied in time and space, with a total of six types of slope break belts developed. The diversity of sequence architecture results from the combined effect of tectonic activities, sediment supply, sea/lake level changes, and geomorphic conditions. In addition, our results suggest that the types of traps are closely associated with stratal stacking patterns. Structural traps developed in the regions of tectonic slope breaks, whereas lithological traps occurred within sedimentary slope breaks. This study highlights the diversity and complexity of sequence architecture in the continental rift basin, and the proposed hydrocarbon distribution patterns are applicable to reservoir prediction in the PRMB and other continental rift basins.

    Predicting survival time for metastatic castration resistant prostate cancer: An iterative imputation approach [version 1; referees: 2 approved, 1 approved with reservations]

    In this paper, we present our winning method for survival time prediction in the 2015 Prostate Cancer DREAM Challenge, a recent crowdsourced competition focused on risk and survival time predictions for patients with metastatic castration-resistant prostate cancer (mCRPC). We are interested in using a patient's covariates to predict his or her time until death after initiating standard therapy. We propose an iterative algorithm to multiply impute right-censored survival times and use ensemble learning methods to characterize the dependence of these imputed survival times on possibly many covariates. We show that by iterating over imputation and ensemble learning steps, we guide imputation with patient covariates and, subsequently, optimize the accuracy of survival time prediction. This method is generally applicable to time-to-event prediction problems in the presence of right-censoring. We demonstrate the proposed method's performance with training and validation results from the DREAM Challenge and compare its accuracy with existing methods.
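
The alternation between imputation and learning described above can be sketched in a heavily simplified form. The version below is not the authors' algorithm: it performs a single deterministic imputation rather than multiple imputation, and a one-covariate least-squares line stands in for the ensemble learner. All function names are hypothetical. It only illustrates the loop structure, in which predictions from the current fit replace the imputed times of censored patients, subject to the constraint that a censored patient survived at least until the censoring time.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b * x (stand-in for an ensemble learner)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def iterative_impute(x, time, event, n_iter=20):
    """Alternate between fitting a predictor and re-imputing censored times.

    x: one covariate per patient; time: observed or censoring time;
    event: 1 if death observed, 0 if right-censored. Requires n_iter >= 1.
    """
    imputed = list(time)  # censored patients start at their censoring time
    for _ in range(n_iter):
        intercept, slope = fit_line(x, imputed)
        for i, observed in enumerate(event):
            if not observed:
                # Survival is known to exceed the censoring time, so never go below it.
                imputed[i] = max(time[i], intercept + slope * x[i])
    return imputed, (intercept, slope)


# The third patient is censored early; the loop raises the imputed survival time.
times, model = iterative_impute([0, 1, 2, 3], [2.0, 4.0, 1.0, 8.0], [1, 1, 0, 1])
```

A multiple-imputation version would instead draw several plausible survival times per censored patient from a conditional distribution and combine the resulting predictions.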