25 research outputs found

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines for translating data-driven findings into medical practice. This is accomplished largely by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of diseases. Continuous advances in high-throughput technologies, coupled with recently promoted data-sharing policies, have produced a massive wealth of data with remarkable potential to improve human health care. In step with this boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular data and conventional clinical data. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches on clinical decision making by improving patient risk stratification and the prediction of disease outcomes. This thesis comprises five studies describing method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate how including genomic data affects the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.

    Simultaneous Ozone and High Light Treatments Reveal an Important Role for the Chloroplast in Co-ordination of Defense Signaling

    Plants live in a world of changing environments, where they are continuously challenged by alternating biotic and abiotic stresses. To transfer information from the environment to appropriate protective responses, plants use many different signaling molecules and pathways. Reactive oxygen species (ROS) are critical signaling molecules in the regulation of plant stress responses, both inside and between cells. In natural environments, plants can experience multiple stresses simultaneously. Laboratory studies on stress interaction and crosstalk in the regulation of gene expression imply that plant responses to multiple stresses are distinctly different from responses to single treatments. We analyzed the expression of selected marker genes and reassessed publicly available datasets to find signaling pathways regulated by ozone, which produces apoplastic ROS, and high light treatment, which produces chloroplastic ROS. Genes related to cell death regulation were differentially regulated by ozone versus high light. In a combined ozone + high light treatment, the light treatment enhanced ozone-induced cell death in leaves. The distinct responses from ozone versus high light treatments show that plants can activate stress signaling pathways in a highly precise manner.

    Comparison of methods to detect differentially expressed genes between single-cell populations

    We compared five statistical methods to detect differentially expressed genes between two distinct single-cell populations. Currently, it remains unclear whether differential expression methods developed originally for conventional bulk RNA-seq data can also be applied to single-cell RNA-seq data analysis. Our results in three diverse comparison settings showed marked differences between the different methods in terms of the number of detections as well as their sensitivity and specificity. They, however, did not reveal systematic benefits of the currently available single-cell-specific methods. Instead, our previously introduced reproducibility-optimization method showed good performance in all comparison settings without any single-cell-specific modifications.

    Reproducibility-optimized detection of differential DNA methylation

    Compared with state-of-the-art methods, ROTS shows competitive sensitivity and specificity in detecting consistently differentially methylated regions

    ROTS: An R package for reproducibility-optimized statistical testing

    Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies involving proteomics and RNA-seq data from public repositories, including both bulk and single-cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
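
As a rough illustration of what a "modified t-statistic" can look like, the Python sketch below computes a statistic of the form d = |mean1 - mean2| / (a1 + a2 * s) for each feature, where s is the pooled standard error, and ranks features by it. This functional form, the names `modified_t` and `rank_features`, and the parameters `a1` and `a2` are assumptions made for illustration; they are not taken from the abstract above, and this is not the API of the ROTS package. The reproducibility optimization that would choose the parameters from the data is not shown.

```python
from math import sqrt
from statistics import mean, variance


def modified_t(group1, group2, a1=0.0, a2=1.0):
    """Return |mean1 - mean2| / (a1 + a2 * s) for one feature.

    s is the pooled standard error of the difference in group means.
    a1 and a2 are illustrative tuning parameters (an assumption of this sketch).
    """
    n1, n2 = len(group1), len(group2)
    # Pooled variance of the two groups (sample variances, n - 1 denominators).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    s = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return abs(mean(group1) - mean(group2)) / (a1 + a2 * s)


def rank_features(data1, data2, a1=0.0, a2=1.0):
    """Rank feature names by decreasing statistic.

    data1 and data2 map each feature name to its list of values in group 1 and group 2.
    """
    scores = {name: modified_t(data1[name], data2[name], a1, a2) for name in data1}
    return sorted(scores, key=scores.get, reverse=True)
```

With `a1 = 0` and `a2 = 1` the statistic reduces to the absolute value of the ordinary equal-variance two-sample t-statistic; with `a1 = 1` and `a2 = 0` it is simply the absolute difference in group means. A data-adaptive method would pick a point between such extremes rather than fixing it in advance.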

    A predictive model of overall survival in patients with metastatic castration-resistant prostate cancer

    Metastatic castration-resistant prostate cancer (mCRPC) is one of the most common cancers with a poor prognosis. To improve prognostic models of mCRPC, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) Consortium organized a crowdsourced competition known as the Prostate Cancer DREAM Challenge. In the competition, data from four phase III clinical trials were utilized. Clinical information from a total of 1600 patients across three of the trials was used to generate prognostic models, whereas one of the datasets (313 patients) was held out for blinded validation. As a performance baseline, a model presented in a recent study (the so-called Halabi model) was used to assess improvements of the new models. This paper presents the model developed by the team TYTDreamChallenge to predict survival risk scores for mCRPC patients at 12, 18, 24 and 30 months after trial enrollment based on the clinical features of each patient, as well as an improved version of the model developed after the challenge. The TYTDreamChallenge model performed similarly to the gold-standard Halabi model, whereas the post-challenge model showed markedly improved performance. Accordingly, a main observation in this challenge was that the definition of the clinical features used plays a major role: replacing our original larger set of features with a small subset for training increased the performance in terms of integrated area under the ROC curve from 0.748 to 0.779.

    Hypoxia-inducible factor (HIF)-prolyl hydroxylase 3 (PHD3) maintains high HIF2A mRNA levels in clear cell renal cell carcinoma

    Most clear cell renal cell carcinomas (ccRCCs) have inactivation of the von Hippel-Lindau tumor suppressor protein (pVHL), resulting in the accumulation of hypoxia-inducible factor α-subunits (HIF-α) and their downstream targets. HIF-2α expression is particularly high in ccRCC and is associated with increased ccRCC growth and aggressiveness. In the canonical HIF signaling pathway, HIF-prolyl hydroxylase 3 (PHD3) suppresses HIF-2α protein by post-translational hydroxylation under sufficient oxygen availability. Here, using immunoblotting and immunofluorescence staining, qRT-PCR, and siRNA-mediated gene silencing, we show that unlike in the canonical pathway, PHD3 silencing in ccRCC cells leads to down-regulation of HIF-2α protein and mRNA. Depletion of other PHD family members had no effect on HIF-2α expression, and PHD3 knockdown in non-RCC cells resulted in the expected increase in HIF-2α protein expression. Accordingly, PHD3 knockdown decreased HIF-2α target gene expression in ccRCC cells and expression was restored upon forced HIF-2α expression. The effect of PHD3 depletion was pinpointed to HIF2A mRNA stability. In line with these in vitro results, a strong positive correlation of PHD3 and HIF2A mRNA expression in ccRCC tumors was detected. Our results suggest that in contrast to the known negative regulation of HIF-2α in most cell types, high PHD3 expression in ccRCC cells maintains elevated HIF-2α expression and that of its target genes, which may enhance kidney cancer aggressiveness.

    Longitudinal modeling of ultrasensitive and traditional prostate-specific antigen and prediction of biochemical recurrence after radical prostatectomy

    Ultrasensitive prostate-specific antigen (u-PSA) remains controversial for follow-up after radical prostatectomy (RP). The aim of this study was to model PSA doubling times (PSADT) for predicting biochemical recurrence (BCR) and to capture possible discrepancies between u-PSA and traditional PSA (t-PSA) by utilizing advanced statistical modeling. A total of 555 RP patients without neoadjuvant/adjuvant androgen deprivation from the Turku University Hospital were included in the study. BCR was defined as two consecutive PSA values > 0.2 ng/mL, and the PSA measurements were log2-transformed. One third of the data was reserved for independent validation. Models were first fitted to the post-surgery PSA measurements using cross-validation. Major trends were then captured using linear mixed-effect models, and a predictive generalized linear model effectively identified early trends connected to BCR. The model generalized for BCR prediction to the validation set with ROC-AUC of 83.6% and 95.1% for the 1- and 3-year follow-up censoring, respectively. A web-based tool was developed to facilitate its use. Longitudinal trends of u-PSA did not display major discrepancies from those of t-PSA. The results support that u-PSA provides useful information for predicting BCR after RP. This can be beneficial to avoid unnecessary adjuvant treatments or to start them earlier for selected patients.
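
To make the doubling-time idea concrete: if log2(PSA) rises roughly linearly over time, the doubling time is the reciprocal of the fitted slope. The Python sketch below fits an ordinary least-squares line to a single patient's log2-transformed measurements. It is a simplified illustration only, not the linear mixed-effect or generalized linear models used in the study, and the function name and example values are hypothetical.

```python
from math import log2


def psa_doubling_time(times, psa_values):
    """Estimate PSA doubling time from a least-squares line through log2(PSA).

    times: measurement times in any unit (e.g. months).
    psa_values: PSA measurements in ng/mL, all greater than zero.
    Returns the doubling time in the same unit as `times`, or None when the
    fitted slope is not positive (PSA is not rising).
    """
    if len(times) != len(psa_values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    y = [log2(v) for v in psa_values]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    slope = sxy / sxx  # change in log2(PSA) per unit of time
    return 1.0 / slope if slope > 0 else None


# Hypothetical example: measurements at 0, 6 and 12 months.
print(psa_doubling_time([0, 6, 12], [0.05, 0.10, 0.20]))
```

Working on the log2 scale is what makes the slope directly interpretable: a slope of one unit per month means PSA doubles every month.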