5,028 research outputs found

    An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling

    Full text link
    We present an applied study in cancer genomics for integrating data and inferences from laboratory experiments on cancer cell lines with observational data obtained from human breast cancer studies. The biological focus is on improving understanding of transcriptional responses of tumors to changes in the pH level of the cellular microenvironment. The statistical focus is on connecting experimentally defined biomarkers of such responses to clinical outcome in observational studies of breast cancer patients. Our analysis exemplifies a general strategy for accomplishing this kind of integration across contexts. The statistical methodologies employed here draw heavily on Bayesian sparse factor models for identifying, modularizing and correlating with clinical outcome these signatures of aggregate changes in gene expression. By projecting patterns of biological response linked to specific experimental interventions into observational studies where such responses may be evidenced via variation in gene expression across samples, we are able to define biomarkers of clinically relevant physiological states and outcomes that are rooted in the biology of the original experiment. Through this approach we identify microenvironment-related prognostic factors capable of predicting long term survival in two independent breast cancer datasets. These results suggest possible directions for future laboratory studies, as well as indicate the potential for therapeutic advances though targeted disruption of specific pathway components.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS261 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Interpretable prognostic modeling of endometrial cancer

    Get PDF
    Endometrial carcinoma (EC) is one of the most common gynecological cancers in the world. In this work we apply Cox proportional hazards (CPH) and optimal survival tree (OST) algorithms to the retrospective prognostic modeling of disease-specific survival in 842 EC patients. We demonstrate that linear CPH models are preferred for the EC risk assessment based on clinical features alone, while interpretable, non-linear OST models are favored when patient profiles can be supplemented with additional biomarker data. We show how visually interpretable tree models can help generate and explore novel research hypotheses by studying the OST decision path structure, in which L1 cell adhesion molecule expression and estrogen receptor status are correctly indicated as important risk factors in the p53 abnormal EC subgroup. To aid further clinical adoption of advanced machine learning techniques, we stress the importance of quantifying model discrimination and calibration performance in the development of explainable clinical prediction models.Peer reviewe

    Clinical prediction modelling in oral health: A review of study quality and empirical examples of model development

    Get PDF
    Background Substantial efforts have been made to improve the reproducibility and reliability of scientific findings in health research. These efforts include the development of guidelines for the design, conduct and reporting of preclinical studies (ARRIVE), clinical trials (ROBINS-I, CONSORT), observational studies (STROBE), and systematic reviews and meta-analyses (PRISMA). In recent years, the use of prediction modelling has increased in the health sciences. Clinical prediction models use information at the individual patient level to estimate the probability of a health outcome(s). Such models offer the potential to assist in clinical decision-making and to improve medical care. Guidelines such as PROBAST (Prediction model Risk Of Bias Assessment Tool) have been recently published to further inform the conduct of prediction modelling studies. Related guidelines for the reporting of these studies, such as TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) instrument, have also been developed. Since the early 2000s, oral health prediction models have been used to predict the risk of various types of oral conditions, including dental caries, periodontal diseases and oral cancers. However, there is a lack of information on the methodological quality and reporting transparency of the published oral health prediction modelling studies. As a consequence, and due to the unknown quality and reliability of these studies, it remains unclear to what extent it is possible to generalise their findings and to replicate their derived models. Moreover, there remains a need to demonstrate the conduct of prediction modelling studies in oral health field following the contemporary guidelines. This doctoral project addresses these issues using two systematic reviews and two empirical analyses. This thesis is the first comprehensive and systematic project reviewing the study quality and demonstrating the use of registry data and longitudinal cohorts to develop clinical prediction models in oral health. Aims • To identify and examine the quality of existing prediction modelling studies in the major fields of oral health.• To demonstrate the conduct and reporting of a prediction modelling study following current guidelines, incorporating machine learning algorithms and accounting for multiple sources of biases. Methods As one of the most prevalent oral conditions, chronic periodontitis was chosen as the exemplar pathology for the first part of this thesis. A systematic review was conducted to investigate the existing prediction models for the incidence and progression of this condition. Based upon this initial overview, a more comprehensive critical review was conducted to assess the methodological quality and completeness of reporting for prediction modelling studies in the field of oral health. The risk of bias in the existing literature was assessed using the PROBAST criteria, and the quality of study reporting was measured in accordance with the TRIPOD guidelines. Following these two reviews, this research project demonstrated the conduct and reporting of a clinical prediction modelling study using two empirical examples. Two types of analyses that are commonly used for two different types of outcome data were adopted: survival analysis for censored outcomes and logistic regression analysis for binary outcomes. Models were developed to 1) predict the three- and five-year disease-specific survival of patients with oral and pharyngeal cancers, based on 21,154 cases collected by a large cancer registry program in the US, the Surveillance, Epidemiology and End Results (SEER) program, and 2) to predict the occurrence of acute and persistent pain following root canal treatment, based on the electronic dental records of 708 adult patients collected by the National Practice-Based Research Network. In these two case studies, all prediction models were developed in five steps: (i) framing the research question; (ii) data acquisition and pre-processing; (iii) model generation; (iv) model validation and performance evaluation; and (v) model presentation and reporting. In accordance with the PROBAST recommendations, the risk of bias during the modelling process was reduced in the following aspects: • In the first case study, three types of biases were taken into account: (i) bias due to missing data was reduced by adopting compatible methods to conduct imputation; (ii) bias due to unmeasured predictors was tested by sensitivity analysis; and (iii) bias due to the initial choice of modelling approach was addressed by comparing tree-based machine learning algorithms (survival tree, random survival forest and conditional inference forest) with the traditional statistical model (Cox regression). • In the second case study, the following strategies were employed: (i) missing data were addressed by multiple imputation with missing indicator methods; (ii) a multilevel logistic regression approach was adopted for model development in order to fit Table of Contents xi the hierarchical structure of the data; (iii) model complexity was reduced using the Least Absolute Shrinkage and Selection Operator (LASSO) for predictor selection; and (iv) the models’ predictive performance was evaluated comprehensively by using the Area Under the Precision Recall Curve (AUPRC) in addition to the Area Under the Receiver Operating Characteristic curve (AUROC); (v) finally, and most importantly, given the existing criticism in the research community concerning the gender-based and racial bias in risk prediction models, we compared the models’ predictive performance built with different sets of predictors (including a clinical set, a sociodemographic set and a combination of both, the ‘general’ set). Results The first and second review studies indicated that, in the field of oral health, the popularity of multivariable prediction models has increased in recent years. Bias and variance are two components of the uncertainty (e.g., the mean squared error) in model estimation. However, the majority of the existing studies did not account for various sources of bias, such as measurement error and inappropriate handling of missing data. Moreover, non-transparent reporting and lack of reproducibility of the models were also identified in the existing oral health prediction modelling studies. These findings provided motivation to conduct two case studies aimed at demonstrating adherence to the contemporary guidelines and to best practice. In the third study, comparable predictive capabilities between Cox regression and the non-parametric tree-based machine learning algorithms were observed for predicting the survival of patients with oral and pharyngeal cancers. For example, the C-index for a Cox model and a random survival forest in predicting three-year survival were 0.82 and 0.84, respectively. A novelty of this study was the development of an online calculator designed to provide an open and transparent estimation of patients’ survival probability for up to five years after diagnosis. This calculator has clinical translational potential and could aid in patient stratification and treatment planning, at least in the context of ongoing research. In addition, the transparent reporting of this study was achieved by following the TRIPOD checklist and sharing all data and codes. In the fourth study, LASSO regression suggested that pre-treatment clinical factors were important in the development of one-week and six-month postoperative pain following root canal treatment. Among all the developed multilevel logistic models, models with a clinical set of predictors yielded similar predictive performance to models with a general set of predictors, while the models with sociodemographic predictors showed the weakest predictive ability. For example, for predicting one-week postoperative pain, the AUROC for models with clinical, sociodemographic and general predictors were 0.82, 0.68 and 0,84, respectively, and the AUPRC were 0.66, 0.40 and 0.72, respectively. Conclusion The significance of this research project is twofold. First, prediction models have been developed for potential clinical use in the context of various oral conditions. Second, this research represents the first attempt to standardise the conduct of this type of studies in oral health research. This thesis presents three conclusions: 1) Adherence to contemporary best practice guidelines such as PROBAST and TRIPOD is limited in the field of oral health research. In response, this PhD project disseminates these guidelines and leverages their advantages to develop effective prediction models for use in dentistry and oral health. 2) Use of appropriate procedures, accounting for and adapting to multiple sources of bias in model development, produces predictive tools of increased reliability and accuracy that hold the potential to be implemented in clinical practice. Therefore, for future prediction modelling research, it is important that data analysts work towards eliminating bias, regardless of the areas in which the models are employed. 3) Machine learning algorithms provide alternatives to traditional statistical models for clinical prediction purposes. Additionally, in the presence of clinical factors, sociodemographic characteristics contribute less to the improvement of models’ predictive performance or to providing cogent explanations of the variance in the models, regardless of the modelling approach. Therefore, it is timely to reconsider the use of sociodemographic characteristics in clinical prediction modelling research. It is suggested that this is a proportionate and evidence based strategy aimed at reducing biases in healthcare risk prediction that may be derived from gender and racial characteristics inherent in sociodemographic data sets.Thesis (Ph.D.) -- University of Adelaide, School of Public Health, 202

    Use of radiomic data to improve imputation of HPV (p16) status in oropharyngeal cancer

    Get PDF
    The incidence of oropharyngeal cancer has been steadily increasing during the past decades. This increase is linked with human papillomavirus, one of the most common sexually transmitted diseases in Canada and worldwide. Recent studies have shown the importance of using p16 testing to assess the HPV status of all oropharyngeal cancer patients on diagnostic. However, that practice was not common during early 2000, making historical data flawed. Many imputation models have been built to retroactively predict the HPV status of oropharyngeal cancer patients that were not tested. This models are based on clinical data, which is easy to store and analyze. However, recent advancements in the field of radiomics have enabled the use of CT scans obtained from patients to build models for cancer behavior. In this study, we take a novel approach to HPV status imputation by building machine learning models that utilize not only clinical data but also imaging features, aiming to show a significant improvement over classical models. The increase of performance between state of the art clinical models and our models will be assessed through the use of the RADCURE dataset from the Princess MargaretOutgoin
    • …
    corecore