937 research outputs found

    Graph based fusion of high-dimensional gene- and microRNA expression data

    One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer are reported to reflect clinical characteristics such as staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort, there is a need for statistical methods to integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction. Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs, correlations between miRNA and mRNA expression data as well as target prediction information were used to build a bipartite graph representing the relations between miRNAs and mRNAs. Feature selection is a critical part of fitting prediction models to high-dimensional data. Most methods treat features, in this case genes or miRNAs, as independent, an assumption that does not hold true when dealing with combined gene and miRNA expression data. To improve prediction accuracy, a description of the correlation structure in the data is needed. In this work the bipartite graph was used to guide the feature selection and thereby improve prediction results and find a stable prognostic signature of miRNAs and genes. The method is evaluated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. The biochemical relapse, an important event in prostate cancer treatment, was used as clinical endpoint.
Biochemical relapse denotes the renewed rise of the blood level of a prostate marker (PSA) after surgical removal of the prostate. The relapse is an indication of metastases and usually the point in clinical practice at which further treatment is decided. A boosting approach was used to predict the biochemical relapse. It could be shown that the bipartite graph in combination with miRNA and mRNA expression data could improve prediction performance. Furthermore, the approach improved the stability of the feature selection and thereby yielded more consistent marker sets. The marker sets produced by this new method contain mRNAs as well as miRNAs. The new approach was compared to two state-of-the-art methods suited for high-dimensional data and showed better prediction performance in both cases.
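
The graph construction described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact edge rule, so the rule used here is an assumption: an edge is drawn when a miRNA-mRNA pair appears in a target prediction list and the absolute Pearson correlation of their expression profiles reaches a threshold. The function names, the `min_abs_corr` parameter and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets, min_abs_corr=0.5):
    """Return {(miRNA, mRNA): correlation} for pairs that pass both filters.

    Assumed rule (not taken from the abstract): keep a predicted
    miRNA-target pair only if |correlation| >= min_abs_corr.
    """
    edges = {}
    for mirna, gene in predicted_targets:
        if mirna in mirna_expr and gene in mrna_expr:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if abs(r) >= min_abs_corr:
                edges[(mirna, gene)] = r
    return edges


# Toy expression profiles over four samples (made-up values).
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0], "miR-B": [2.0, 1.0, 2.0, 1.0]}
mrna_expr = {"GENE1": [4.0, 3.0, 2.0, 1.0], "GENE2": [1.0, 1.5, 1.0, 1.5]}
predicted_targets = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE2")]

graph = build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets)
for (mirna, gene), r in sorted(graph.items()):
    print(f"{mirna} -- {gene}  (r = {r:.2f})")
```

Because every edge joins one miRNA to one mRNA, the result is bipartite by construction. How such a graph then guides feature selection inside the boosting model is not described in the abstract and is not sketched here.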

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines that leverage massive genomic variation and integrate it with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by increased presence of tumor-associated microglia.
I then identified the intrinsic tumor bulk, but not the corresponding neurosphere cells, as the sole source of microglia through both transcriptional and protein-level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in intra-tumor heterogeneity and in the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant impact from a clinical perspective but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. I have then identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of the TCGA cancer types through transcriptomic profiling.
Then I have predicted fusion proteins with kinase activity and hub function in the pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts by integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.

    Crater lake cichlids individually specialize along the benthic-limnetic axis

    A common pattern of adaptive diversification in freshwater fishes is the repeated evolution of elongated open water (limnetic) species and high-bodied shore (benthic) species from generalist ancestors. Studies on phenotype-diet correlations have suggested that population-wide individual specialization occurs at an early evolutionary and ecological stage of divergence and niche partitioning. This variable restricted niche use across individuals can provide the raw material for earliest stages of sympatric divergence. We investigated variation in morphology and diet as well as their correlations along the benthic-limnetic axis in an extremely young Midas cichlid species, Amphilophus tolteca, endemic to the Nicaraguan crater lake Asososca Managua. We found that A. tolteca varied continuously in ecologically relevant traits such as body shape and lower pharyngeal jaw morphology. The correlation of these phenotypes with niche suggested that individuals are specialized along the benthic-limnetic axis. No genetic differentiation within the crater lake was detected based on genotypes from 13 microsatellite loci. Overall, we found that individual specialization in this young crater lake species encompasses the limnetic- as well as the benthic macro-habitat. Yet there is no evidence for any diversification within the species, making this a candidate system for studying what might be the early stages preceding sympatric divergence.

    INNOVATIVE BIOSTATISTICAL AND BIOINFORMATIC APPROACHES IN THE ANALYSIS OF BREAST CANCER: COMPETING RISK SURVIVAL ANALYSIS THROUGH PSEUDO-VALUES AND COMPREHENSIVE EVALUATION OF METHODS FOR THE TUMOR MICROENVIRONMENT DISSECTION AVAILABLE AT THE PRESENT DAY.

    Since my original background was quite distant from statistical and bioinformatic approaches to data analysis, having a master's degree in Sanitary Biotechnology and Molecular Medicine, my PhD fellowship was spent building my skills in this field while studying and trying to contribute to the development of biostatistical and bioinformatic approaches to be applied in the clinic, with a special focus on oncology and with a view to contributing to the field of personalized medicine. Personalized medicine is indeed the ultimate goal for the life sciences, particularly for oncology, and, in my opinion, a key aspect of the future wellness of humanity. Personalized medicine is the idea of developing the ability to identify the best therapeutic strategy for each unique person, and its efficacy relies on having accurate diagnostic tests that identify patients who can benefit from targeted therapies. A striking example is the determination of the overexpression of the human epidermal growth factor receptor type 2 (HER2) in the routine diagnosis of breast cancer (BC). HER2 is indeed associated with a worse prognosis but also predicts a better response to the medication trastuzumab; a test for HER2 was approved along with the drug (as a “companion diagnostic”) so that clinicians can better target patients' treatment. My thesis is composed of the description of the two projects that have mainly characterized my fellowship. Both projects concern breast cancer (BC) and the objective of understanding the effects of chronic low inflammation, which has been studied in my projects through leucocyte infiltration and the body mass index. The focus on BC derives from a practical aspect and an epidemiological aspect. The practical aspect consists in the fact that my group is part of a European research group, led by Christine Desmedt from Belgium, which allowed me to obtain unique data and to interact with experts in BC and bioinformatics from different countries.
The epidemiological aspect is represented by the fact that breast cancer is currently a hot topic, being the second most common cancer worldwide and the first among women, but still open to investigation, since the complexity and variability of BC, reflected at both the histopathological and the molecular level, have proven challenging to classify and therefore to treat effectively to the present day. The first project presented, the tumor microenvironment (TME) dissection project, occupied the first part of my fellowship and was focused on managing an enormous quantity of data in order to compare different tools and approaches used to analyze breast cancer. This project consisted of a large European collaboration which tried to establish the reliability of bioinformatic tools in retrieving the TME composition by analyzing bulk transcriptome and methylome and comparing the obtained results to standard approaches, such as the pathologist's evaluation, and emerging methods, such as digital image analysis. This project led to the preparation of a paper, which is currently under submission, under the supervision of Christine Desmedt, the leader of this breast cancer research group, and Elia Biganzoli, my supervisor and a member of the cited group. The second project presented, the competing risk analysis through pseudo-values project, which characterized the third year of my PhD, is more focused on the statistical aspects of clinical data analysis and represents the arrival point of my studies of statistical methodology. The project consisted of the exploration of a forefront approach to the analysis of survival data based on pseudo-values, which has the desirable feature of generating measures with a clear and direct interpretation at a clinical level, becoming an invaluable tool for clinical decision making. This project represents a first step in a longer-term project that will lead to the preparation of several papers in the future.
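
The abstract above does not spell out how its pseudo-values are computed. A common construction, assumed here, is the jackknife pseudo-observation: for subject i, the pseudo-value is n times the estimate on all n subjects minus (n - 1) times the estimate with subject i left out. For competing risks the same construction is usually applied to a cumulative incidence estimator; this sketch uses the Kaplan-Meier survival estimate only to stay short. Function names and data are hypothetical.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t).

    events[i] is 1 for an observed event and 0 for a censored time.
    """
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times, events, t):
    """Jackknife pseudo-values for S(t), one per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    values = []
    for i in range(n):
        # Re-estimate with subject i left out.
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        loo = kaplan_meier(loo_times, loo_events, t)
        values.append(n * full - (n - 1) * loo)
    return values


# Toy data without censoring (made-up values).
times = [2, 3, 5, 7, 11]
events = [1, 1, 1, 1, 1]
pv = pseudo_values(times, events, t=4)
print([round(v, 6) for v in pv])
```

Without censoring, each pseudo-value reduces to the indicator of whether that subject survived beyond t, which is why such values can be used as a response in an ordinary regression model. With censored observations the values are generally no longer exactly 0 or 1.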

    Characteristics of white blood cell count in acute lymphoblastic leukemia : A COST LEGEND phenotype-genotype study

    Background: White blood cell count (WBC) as a measure of extramedullary leukemic cell survival is a well-known prognostic factor in acute lymphoblastic leukemia (ALL), but its biology, including impact of host genome variants, is poorly understood. Methods: We included patients treated with the Nordic Society of Paediatric Haematology and Oncology (NOPHO) ALL-2008 protocol (N = 2347, 72% were genotyped by Illumina Omni2.5exome-8-Bead chip) aged 1-45 years, diagnosed with B-cell precursor (BCP-) or T-cell ALL (T-ALL) to investigate the variation in WBC. Spline functions of WBC were fitted correcting for association with age across ALL subgroups of immunophenotypes and karyotypes. The residuals between spline WBC and actual WBC were used to identify WBC-associated germline genetic variants in a genome-wide association study (GWAS) while adjusting for age and ALL subtype associations. Results: We observed an overall inverse correlation between age and WBC, which was stronger for the selected patient subgroups of immunophenotype and karyotypes (rho(BCP-ALL) = -0.17, rho(T-ALL) = -0.19; p < 3 × 10^-4). Spline functions fitted to age, immunophenotype, and karyotype explained WBC variation better than age alone (rho = 0.43, p << 2 × 10^-6). However, when the spline-adjusted WBC residuals were used as phenotype, no GWAS significant associations were found. Based on available annotation, the top 50 genetic variants suggested effects on signal transduction, translation initiation, cell development, and proliferation. Conclusion: These results indicate that host genome variants do not strongly influence WBC across ALL subsets, and future studies of why some patients are more prone to hyperleukocytosis should be performed within specific ALL subsets that apply more complex analyses to capture potential germline variant interactions and impact on WBC.

    Biological-based models of carcinogenesis in the lung from radiation and smoking

    Lung adenocarcinoma and squamous cell carcinoma are the deadliest cancers worldwide. Smoking and ionizing radiation are potent carcinogens strongly affecting both lung cancer subtypes. Several biological analyses have been performed to characterise the genetic mutations leading to adenocarcinoma and squamous cell carcinoma, and different genomic spectra have been observed. Biological markers of smoking-related damage could be found, leading to a deep knowledge of the cellular effects of smoking. Less is known about the biological effects of radiation in human carcinogenesis. Risks have been quantified with epidemiological studies of these carcinogens. Based on the biologically substantiated assumption that the number of mutations is linearly related to the dose, in radiation epidemiology it is standard to model effects linearly. These models, however, do not have a biological interpretation and are disconnected from general statistical methods. Here we fill both gaps. First, we apply statistical generalised additive models to examine the functional relation between risk and smoking and radiation effects. Secondly, with mechanistic multi-scale models we integrate molecular biology and epidemiology to describe the carcinogenesis of lung adenocarcinoma and squamous cell carcinoma. To investigate the incidence of lung adenocarcinoma and lung squamous cell carcinoma we analysed two cohorts: first the Life Span Study cohort of atomic bomb survivors of Hiroshima and Nagasaki, and second the Eldorado cohort of Canadian uranium miners. Exposures differed strongly between cohorts. Residents of Hiroshima and Nagasaki were exposed to a relatively high dose of gamma radiation for a short time, while the miners were exposed to a protracted and lower exposure to alpha and gamma radiation. Information about smoking habits is available only for the former cohort.
Three types of models were applied to analyse the effects of radiation and smoking: state-of-the-art statistical risk models of radiation protection, statistical generalised additive models and mechanistic risk models. Although there were quantitative differences in effect size and significance, each result is presented below only for a single model. For lung adenocarcinoma the best mechanistic model was a two-pathway model. Smoking and radiation effects showed markedly different patterns: both acted on the apoptosis rate of precancerous cells but on different pathways without any interaction. A linear radiation effect was found in one pathway and a linear-exponential smoking effect in the other pathway. Independently of these results we analysed genomic data of American patients. It is known that the genetic damage of people with adenocarcinoma can be grouped into three pathways: the receptor mutant (RMUT) pathway, the transducer mutant (TMUT) pathway, and other signatures (OWT). We could show that signatures of the TMUT and the OWT pathways differ much less from each other than both differ from the RMUT pathway. Therefore, there is also genetic evidence that adenocarcinomas fall into two main classes. The two pathways of the mechanistic model could be associated with the RMUT and RMUT+OWT pathways by their risk patterns in age and smoking. On the other hand, for squamous cell carcinoma one pathway was sufficient to describe the incidence data. Although effects of radiation appeared to be highly significant, they could be traced back to arise only from the first five years of follow-up (33 cases therein). When the first five years were excluded, no significant radiation effect could be found. Interestingly, for lung squamous cell carcinoma the mechanistic models could fit the effects of cigarette smoking in initiation and promotion. This was different for lung adenocarcinoma, where the main effect of smoking was a promotion of already existing pre-cancerous clones.
For both lung adenocarcinoma and squamous cell carcinoma, no interaction between radiation and smoking could be fitted for the Life Span Study cohort. Results from analysis of the Eldorado cohort were in line with the results presented above. For lung adenocarcinoma, both the state-of-the-art statistical risk models and the generalised additive models could find only a significant effect of radiation exposure. For lung squamous cell carcinoma, vice versa, both models could find only a significant effect of gamma radiation exposure. In conclusion, we showed that lung cancer cannot be investigated as a single endpoint; the different subtypes have to be analysed separately. Different radiation qualities act differently on the different subtypes, indicating different biological processes. Analogously, although smoking is an important risk factor for all subtypes, its effects were different and of different magnitudes.

    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.
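
The pairing of base-learners with a loss function described in this abstract can be illustrated with a minimal sketch of component-wise boosting. The choices here are assumptions made for brevity and are not taken from the review: one simple linear base-learner per feature, squared-error loss, a fixed step length `nu` and a fixed number of iterations. The function name and data are hypothetical, and this is not the interface of any particular boosting package.

```python
def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise boosting with linear base-learners and squared-error loss.

    Each iteration fits every single-feature least-squares learner to the
    current residuals and updates only the best-fitting one by a fraction nu.
    Returns (intercept, coefficients) for centered features.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    means = [sum(c) / n for c in cols]
    cols = [[v - m for v in c] for c, m in zip(cols, means)]  # center features

    intercept = sum(y) / n
    coef = [0.0] * p
    fitted = [intercept] * n

    for _ in range(n_iter):
        # For squared-error loss the negative gradient is the residual.
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_beta, best_sse = None, 0.0, float("inf")
        for j, c in enumerate(cols):
            denom = sum(v * v for v in c)
            if denom == 0.0:
                continue  # constant feature, nothing to fit
            beta = sum(v * r for v, r in zip(c, resid)) / denom
            sse = sum((r - beta * v) ** 2 for v, r in zip(c, resid))
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        if best_j is None:
            break
        coef[best_j] += nu * best_beta
        fitted = [f + nu * best_beta * v for f, v in zip(fitted, cols[best_j])]
    return intercept, coef


# Toy data: the response depends on the first feature only (made-up values).
X = [[1, 1, 0], [2, -1, 1], [3, 1, 0], [4, -1, -1], [5, 1, 0], [6, -1, 1]]
y = [2 * row[0] for row in X]
intercept, coef = componentwise_l2_boost(X, y)
print(intercept, [round(c, 4) for c in coef])
```

Features that are never selected keep a coefficient of exactly zero, which is the variable-selection behaviour the abstract refers to, and stopping after a limited number of small steps shrinks the selected coefficients towards zero. Swapping the residual computation for the negative gradient of another loss would change the regression setting without changing the loop.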