An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling
We present an applied study in cancer genomics for integrating data and
inferences from laboratory experiments on cancer cell lines with observational
data obtained from human breast cancer studies. The biological focus is on
improving understanding of transcriptional responses of tumors to changes in
the pH level of the cellular microenvironment. The statistical focus is on
connecting experimentally defined biomarkers of such responses to clinical
outcome in observational studies of breast cancer patients. Our analysis
exemplifies a general strategy for accomplishing this kind of integration
across contexts. The statistical methodologies employed here draw heavily on
Bayesian sparse factor models for identifying, modularizing and correlating
with clinical outcome these signatures of aggregate changes in gene expression.
By projecting patterns of biological response linked to specific experimental
interventions into observational studies where such responses may be evidenced
via variation in gene expression across samples, we are able to define
biomarkers of clinically relevant physiological states and outcomes that are
rooted in the biology of the original experiment. Through this approach we
identify microenvironment-related prognostic factors capable of predicting
long-term survival in two independent breast cancer datasets. These results suggest
possible directions for future laboratory studies, as well as indicate the
potential for therapeutic advances through targeted disruption of specific
pathway components. Comment: Published at http://dx.doi.org/10.1214/09-AOAS261 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Anatomic Fat Depots and Coronary Plaque Among Human Immunodeficiency Virus-Infected and Uninfected Men in the Multicenter AIDS Cohort Study.
Methods. In a cross-sectional substudy of the Multicenter AIDS Cohort Study, noncontrast cardiac computed tomography (CT) scanning for coronary artery calcium (CAC) scoring was performed on all men, and, for men with normal renal function, coronary CT angiography (CTA) was performed. Associations of fat depots (visceral adipose tissue [VAT], abdominal subcutaneous adipose tissue [aSAT], and thigh subcutaneous adipose tissue [tSAT]) with coronary plaque presence and extent were assessed with logistic and linear regression adjusted for age, race, cardiovascular disease (CVD) risk factors, body mass index (BMI), and human immunodeficiency virus (HIV) parameters. Results. Among HIV-infected men (n = 597) but not HIV-uninfected men (n = 343), having greater VAT was positively associated with noncalcified plaque presence (odds ratio [OR] = 1.04, P < .05), with a significant interaction (P < .05) by HIV serostatus. HIV-infected men had lower median aSAT and tSAT and greater median VAT among men with BMI <25 and 25-29.9 kg/m². Among HIV-infected men, VAT was positively associated with presence of coronary plaque on CTA after adjustment for CVD risk factors (OR = 1.04, P < .05), but not after additional adjustment for BMI. There was an inverse association between aSAT and extent of total plaque among HIV-infected men, but not among HIV-uninfected men. Lower tSAT was associated with greater CAC and total plaque score extent regardless of HIV serostatus. Conclusions. The presence of greater amounts of VAT and lower SAT may contribute to increased risk for coronary artery disease among HIV-infected persons.
Narrowed Gaps and Persistent Challenges: Examining Rural-Nonrural Disparities in Postsecondary Outcomes over Time
Empirical studies have concluded that rural students experience lower rates of college enrollment and degree completion compared to their nonrural peers, but this literature needs to be expanded and updated for a continually changing context. This article examines the rural-nonrural disparities in students’ postsecondary trajectories, influences, and outcomes. By comparing results to past research using similar national data and an identical design, we are able to examine change over time. Results show narrowed gaps from the 1990s into the 2000s, but with rural students still facing persistent challenges and experiencing lower average rates of college enrollment and degree completion.
Interpretable prognostic modeling of endometrial cancer
Endometrial carcinoma (EC) is one of the most common gynecological cancers in the world. In this work we apply Cox proportional hazards (CPH) and optimal survival tree (OST) algorithms to the retrospective prognostic modeling of disease-specific survival in 842 EC patients. We demonstrate that linear CPH models are preferred for EC risk assessment based on clinical features alone, while interpretable, non-linear OST models are favored when patient profiles can be supplemented with additional biomarker data. We show how visually interpretable tree models can help generate and explore novel research hypotheses by studying the OST decision path structure, in which L1 cell adhesion molecule expression and estrogen receptor status are correctly indicated as important risk factors in the p53 abnormal EC subgroup. To aid further clinical adoption of advanced machine learning techniques, we stress the importance of quantifying model discrimination and calibration performance in the development of explainable clinical prediction models. Peer reviewed
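The abstract above stresses quantifying model discrimination for survival models. As an illustration only, the sketch below computes Harrell's concordance index in plain Python. The function name `concordance_index` and the toy data are hypothetical, pairs with tied observed times are skipped as a simplification, and nothing here is taken from the authors' own pipeline.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the subject who
    fails earlier was also assigned the higher predicted risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is also not comparable
        # when the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(round(c, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Calibration, which the abstract also mentions, needs a separate check of predicted against observed survival probabilities.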
Clinical prediction modelling in oral health: A review of study quality and empirical examples of model development
Background Substantial efforts have been made to improve the reproducibility and reliability of scientific findings in health research. These efforts include the development of guidelines for the design, conduct and reporting of preclinical studies (ARRIVE), clinical trials (ROBINS-I, CONSORT), observational studies (STROBE), and systematic reviews and meta-analyses (PRISMA). In recent years, the use of prediction modelling has increased in the health sciences. Clinical prediction models use information at the individual patient level to estimate the probability of a health outcome(s). Such models offer the potential to assist in clinical decision-making and to improve medical care. Guidelines such as PROBAST (Prediction model Risk Of Bias Assessment Tool) have been recently published to further inform the conduct of prediction modelling studies. Related guidelines for the reporting of these studies, such as the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) instrument, have also been developed. Since the early 2000s, oral health prediction models have been used to predict the risk of various types of oral conditions, including dental caries, periodontal diseases and oral cancers. However, there is a lack of information on the methodological quality and reporting transparency of the published oral health prediction modelling studies. As a consequence, and due to the unknown quality and reliability of these studies, it remains unclear to what extent it is possible to generalise their findings and to replicate their derived models. Moreover, there remains a need to demonstrate the conduct of prediction modelling studies in the oral health field following the contemporary guidelines. This doctoral project addresses these issues using two systematic reviews and two empirical analyses.
This thesis is the first comprehensive and systematic project reviewing the study quality and demonstrating the use of registry data and longitudinal cohorts to develop clinical prediction models in oral health. Aims • To identify and examine the quality of existing prediction modelling studies in the major fields of oral health. • To demonstrate the conduct and reporting of a prediction modelling study following current guidelines, incorporating machine learning algorithms and accounting for multiple sources of bias. Methods As one of the most prevalent oral conditions, chronic periodontitis was chosen as the exemplar pathology for the first part of this thesis. A systematic review was conducted to investigate the existing prediction models for the incidence and progression of this condition. Based upon this initial overview, a more comprehensive critical review was conducted to assess the methodological quality and completeness of reporting for prediction modelling studies in the field of oral health. The risk of bias in the existing literature was assessed using the PROBAST criteria, and the quality of study reporting was measured in accordance with the TRIPOD guidelines. Following these two reviews, this research project demonstrated the conduct and reporting of a clinical prediction modelling study using two empirical examples. Two types of analyses that are commonly used for two different types of outcome data were adopted: survival analysis for censored outcomes and logistic regression analysis for binary outcomes.
Models were developed to 1) predict the three- and five-year disease-specific survival of patients with oral and pharyngeal cancers, based on 21,154 cases collected by a large cancer registry program in the US, the Surveillance, Epidemiology and End Results (SEER) program, and 2) to predict the occurrence of acute and persistent pain following root canal treatment, based on the electronic dental records of 708 adult patients collected by the National Practice-Based Research Network. In these two case studies, all prediction models were developed in five steps: (i) framing the research question; (ii) data acquisition and pre-processing; (iii) model generation; (iv) model validation and performance evaluation; and (v) model presentation and reporting. In accordance with the PROBAST recommendations, the risk of bias during the modelling process was reduced in the following aspects: • In the first case study, three types of biases were taken into account: (i) bias due to missing data was reduced by adopting compatible methods to conduct imputation; (ii) bias due to unmeasured predictors was tested by sensitivity analysis; and (iii) bias due to the initial choice of modelling approach was addressed by comparing tree-based machine learning algorithms (survival tree, random survival forest and conditional inference forest) with the traditional statistical model (Cox regression). 
• In the second case study, the following strategies were employed: (i) missing data were addressed by multiple imputation with missing indicator methods; (ii) a multilevel logistic regression approach was adopted for model development in order to fit the hierarchical structure of the data; (iii) model complexity was reduced using the Least Absolute Shrinkage and Selection Operator (LASSO) for predictor selection; (iv) the models’ predictive performance was evaluated comprehensively by using the Area Under the Precision Recall Curve (AUPRC) in addition to the Area Under the Receiver Operating Characteristic curve (AUROC); and (v) finally, and most importantly, given the existing criticism in the research community concerning gender-based and racial bias in risk prediction models, we compared the predictive performance of models built with different sets of predictors (a clinical set, a sociodemographic set and a combination of both, the ‘general’ set). Results The first and second review studies indicated that, in the field of oral health, the popularity of multivariable prediction models has increased in recent years. Bias and variance are two components of the uncertainty (e.g., the mean squared error) in model estimation. However, the majority of the existing studies did not account for various sources of bias, such as measurement error and inappropriate handling of missing data. Moreover, non-transparent reporting and lack of reproducibility of the models were also identified in the existing oral health prediction modelling studies. These findings provided motivation to conduct two case studies aimed at demonstrating adherence to the contemporary guidelines and to best practice. In the third study, comparable predictive capabilities between Cox regression and the non-parametric tree-based machine learning algorithms were observed for predicting the survival of patients with oral and pharyngeal cancers.
For example, the C-indices for a Cox model and a random survival forest in predicting three-year survival were 0.82 and 0.84, respectively. A novelty of this study was the development of an online calculator designed to provide an open and transparent estimation of patients’ survival probability for up to five years after diagnosis. This calculator has clinical translational potential and could aid in patient stratification and treatment planning, at least in the context of ongoing research. In addition, the transparent reporting of this study was achieved by following the TRIPOD checklist and sharing all data and code. In the fourth study, LASSO regression suggested that pre-treatment clinical factors were important in the development of one-week and six-month postoperative pain following root canal treatment. Among all the developed multilevel logistic models, models with a clinical set of predictors yielded similar predictive performance to models with a general set of predictors, while the models with sociodemographic predictors showed the weakest predictive ability. For example, for predicting one-week postoperative pain, the AUROCs for models with clinical, sociodemographic and general predictors were 0.82, 0.68 and 0.84, respectively, and the AUPRCs were 0.66, 0.40 and 0.72, respectively. Conclusion The significance of this research project is twofold. First, prediction models have been developed for potential clinical use in the context of various oral conditions. Second, this research represents the first attempt to standardise the conduct of this type of study in oral health research. This thesis presents three conclusions: 1) Adherence to contemporary best practice guidelines such as PROBAST and TRIPOD is limited in the field of oral health research. In response, this PhD project disseminates these guidelines and leverages their advantages to develop effective prediction models for use in dentistry and oral health.
2) Use of appropriate procedures, accounting for and adapting to multiple sources of bias in model development, produces predictive tools of increased reliability and accuracy that hold the potential to be implemented in clinical practice. Therefore, for future prediction modelling research, it is important that data analysts work towards eliminating bias, regardless of the areas in which the models are employed. 3) Machine learning algorithms provide alternatives to traditional statistical models for clinical prediction purposes. Additionally, in the presence of clinical factors, sociodemographic characteristics contribute less to the improvement of models’ predictive performance or to providing cogent explanations of the variance in the models, regardless of the modelling approach. Therefore, it is timely to reconsider the use of sociodemographic characteristics in clinical prediction modelling research. It is suggested that this is a proportionate and evidence-based strategy aimed at reducing biases in healthcare risk prediction that may be derived from gender and racial characteristics inherent in sociodemographic data sets. Thesis (Ph.D.) -- University of Adelaide, School of Public Health, 202
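The thesis abstract above reports AUPRC alongside AUROC for binary outcomes. The sketch below shows, under stated assumptions, how the two summaries can be computed from labels and predicted scores in plain Python. The function names and toy data are hypothetical, average precision is used as one common step-wise estimator of AUPRC (it is assumed here, not confirmed by the source, and it presumes no tied scores), and the numbers are unrelated to the thesis results.

```python
def auroc(labels, scores):
    """Probability that a random positive case scores higher than a
    random negative case; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each positive case.
    Assumes scores are not tied."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data (hypothetical): 1 = outcome present, 0 = outcome absent.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(auroc(labels, scores), average_precision(labels, scores))
```

AUROC has a fixed chance level of 0.5, whereas the chance level of AUPRC equals the outcome prevalence, which is one reason to report both when outcomes are imbalanced.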
Use of radiomic data to improve imputation of HPV (p16) status in oropharyngeal cancer
The incidence of oropharyngeal cancer has been steadily increasing over the past decades. This increase is linked with human papillomavirus (HPV), one of the most common sexually transmitted infections in Canada and worldwide. Recent studies have shown the importance of using p16 testing to assess the HPV status of all oropharyngeal cancer patients at diagnosis. However, that practice was not common in the early 2000s, leaving historical data incomplete.
Many imputation models have been built to retroactively predict the HPV status of oropharyngeal cancer patients who were not tested. These models are based on clinical data, which is easy to store and analyze. However, recent advancements in the field of radiomics have enabled the use of CT scans obtained from patients to build models of cancer behavior. In this study, we take a novel approach to HPV status imputation by building machine learning models that utilize not only clinical data but also imaging features, aiming to show a significant improvement over classical models. The increase in performance between state-of-the-art clinical models and our models will be assessed through the use of the RADCURE dataset from the Princess Margaret