    Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information

    Single nucleotide polymorphism (SNP) microarray data. SNP data underlying the finding in this article. (Rdata 50688 kb

    The immune microenvironment in mantle cell lymphoma : Targeted liquid and spatial proteomic analyses

    The complex interplay of the tumour and immune cells affects tumour growth, progression, and response to treatment. Restoration of effective immune response forms the basis of onco-immunology, which further enabled the development of immunotherapy. In the era of precision medicine, pin-pointing patient biological heterogeneity, especially in relation to the patient-specific immune microenvironment, is a necessity for the discovery of novel biomarkers and for the development of patient stratification tools for targeted therapeutics. Mantle cell lymphoma (MCL) is a rare and aggressive subtype of B-cell lymphoma with poor survival and high relapse rates. Previous investigations of MCL have largely focused on the tumour itself, and explorations of the immune microenvironment have been limited. This thesis, with its five included papers, investigates multiple aspects of the immune microenvironment with respect to proteomic analysis performed on tissue and liquid biopsies of diagnostic and relapsed/refractory (R/R) MCL cohorts. Analyses based on liquid biopsies (serum) are particularly relevant for aggressive cases such as relapse, where invasive procedures for extracting tissue are not recommended. Thus, papers I-II probe the possibility of using serum for treatment- and outcome-associated biomarker discovery in R/R MCL, using a targeted affinity-based protein microarray platform quantifying immune-regulatory and tumor-secretory proteins in sera. The analysis in paper I, using pre-treatment samples, identifies an 11-plex biomarker signature (RIS – relapsed immune signature) associated with overall survival. Further integration of RIS with the mantle cell lymphoma international prognostic index (MIPI) led to the development of the MIPIris index for the stratification of R/R MCL into three risk groups. Moreover, longitudinal analysis can be important in understanding how patients respond to treatment, and this can further guide therapeutic interventions. 
Thus, paper II is a follow-up study wherein longitudinal analyses were performed on paired samples collected at pre-treatment (baseline) and after three months of chemo-immunotherapy (on-treatment). We show how genetic aberrations can influence systemic profiles, and thus integrating genetic information can be crucial for treatment selection. Furthermore, we observe that the inter-patient heterogeneity associated with absolute values can be circumvented by using velocity of change to capture general changes over time in groups of patients. Thus, using velocity of change in serum proteins between pre- and on-treatment samples, we identified response biomarkers associated with minimal residual disease and progression. While exploratory analysis using high-dimensional omics-based data can be important for accelerating discovery, translating such information for clinical utility is a necessity. Thus, in paper III, we show how serum quantification can be used as a complement to tissue-identified prognostic biomarkers, and this can enable faster clinical implementation. The presence of CD163+ M2-like macrophages has been shown to be associated with poor outcome in MCL tissues. We show that higher sCD163 levels in sera, quantified using ELISA, are also associated with poor outcome in diagnostic and relapsed MCL. Furthermore, we suggest a cut-off for sCD163 levels that can be used for clinical utility. Further exploration of the dynamic interplay of the tumour immune microenvironment is now possible using spatially resolved omics for tissue-based analysis. Thus, in papers IV and V, we analyse cell-type specific proteomic data collected from tumour and immune cells using the GeoMx™ digital spatial profiler. In paper IV, we show that the presence as well as the spatial localization of CD163+ macrophages with respect to tumour regions impacts macrophage phenotypic profiles. Further modulation in the profile of surrounding tumour and T-cells is observed when macrophages are present in the vicinity. 
Based on this analysis, we suggest the MAPK pathway as a potential therapeutic target in tumours with CD163+ macrophages. Immune composition can be defined not just by the type of cells but also with respect to frequency and spatial localization, and this is explored in paper V with respect to T-cell subtypes. Thus, in paper V, we optimized a workflow of multiplexed immunofluorescence image segmentation that allowed us to extract cell metrics for four subtypes of CD3+ T-cells. Using these data, we show that higher infiltration of T-cells is associated with a positive outcome in MCL. Moreover, by combining image-derived metrics with cell-specific spatial omics data, we were able to identify an immunosuppressive microenvironment associated with highly infiltrated tumours and suggest new potential targets of immunotherapy with respect to IDO1, GITR and STING. In conclusion, this thesis explores the systemic and tumor-associated immune microenvironment in MCL for defining patient heterogeneity, developing methods of patient stratification, and identifying novel and actionable biomarkers
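
    The "velocity of change" idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not define the quantity, so the definition used here (difference of log2 levels divided by elapsed months) is an assumption, as are the function name, the protein name and all values, which are hypothetical.

```python
import math

def log2_velocity(baseline, on_treatment, months=3.0):
    """Rate of change per month on a log2 scale for each shared protein.

    ASSUMPTION: the abstract does not define "velocity of change"; this
    sketch uses (log2(on) - log2(baseline)) / elapsed months.
    """
    return {
        protein: (math.log2(on_treatment[protein]) - math.log2(level)) / months
        for protein, level in baseline.items()
        if protein in on_treatment
    }

# Hypothetical serum levels (arbitrary units); "protein_X" is a made-up name.
patient_a = log2_velocity({"protein_X": 100.0}, {"protein_X": 70.0})
patient_b = log2_velocity({"protein_X": 10.0}, {"protein_X": 7.0})

# Absolute levels differ tenfold, but both patients show the same relative decline.
same_trend = math.isclose(patient_a["protein_X"], patient_b["protein_X"], rel_tol=1e-9)
print(patient_a, patient_b, same_trend)
```

    Under this assumed definition, two patients whose absolute levels differ by an order of magnitude yield the same velocity when their relative change is the same, which is one way such a measure could sidestep inter-patient differences in absolute values.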

    Integrative methods for analyzing big data in precision medicine

    We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the era of “Big Data” in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarker discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face

    Integrative methods for analysing big data in precision medicine

    De novo pathway-based biomarker identification

    Gene expression profiles have been extensively discussed as an aid to guide the therapy by predicting disease outcome for the patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good versus poor outcome prediction. Stratifying patients based on their molecular subtypes can provide a detailed view of the disease and lead to more personalized therapies. We propose and discuss a novel MG approach based on de novo pathways, which for the first time have been used as features in a multi-class setting to predict cancer subtypes. Comprehensive evaluation in a large cohort of breast cancer samples from The Cancer Genome Atlas (TCGA) revealed that MGs are considerably more stable than SG models, while also providing valuable insight into the cancer hallmarks that drive them. In addition, when tested on an independent benchmark non-TCGA dataset, MG features consistently outperformed SG models. We provide an easy-to-use web service at http://pathclass.compbio.sdu.dk where users can upload their own gene expression datasets from breast cancer studies and obtain the subtype predictions from all the classifiers

    A Path to Implement Precision Child Health Cardiovascular Medicine.

    Congenital heart defects (CHDs) affect approximately 1% of live births and are a major source of childhood morbidity and mortality even in countries with advanced healthcare systems. Along with phenotypic heterogeneity, the underlying etiology of CHDs is multifactorial, involving genetic, epigenetic, and/or environmental contributors. Clear dissection of the underlying mechanism is a powerful step to establish individualized therapies. However, the majority of CHDs are yet to be clearly diagnosed for the underlying genetic and environmental factors, and even less with effective therapies. Although the survival rate for CHDs is steadily improving, there is still a significant unmet need for refining diagnostic precision and establishing targeted therapies to optimize life quality and to minimize future complications. In particular, proper identification of disease associated genetic variants in humans has been challenging, and this greatly impedes our ability to delineate gene-environment interactions that contribute to the pathogenesis of CHDs. Implementing a systematic multileveled approach can establish a continuum from phenotypic characterization in the clinic to molecular dissection using combined next-generation sequencing platforms and validation studies in suitable models at the bench. Key elements necessary to advance the field are: first, proper delineation of the phenotypic spectrum of CHDs; second, defining the molecular genotype/phenotype by combining whole-exome sequencing and transcriptome analysis; third, integration of phenotypic, genotypic, and molecular datasets to identify molecular network contributing to CHDs; fourth, generation of relevant disease models and multileveled experimental investigations. In order to achieve all these goals, access to high-quality biological specimens from well-defined patient cohorts is a crucial step. 
Therefore, establishing a CHD BioCore is an essential infrastructure and a critical step on the path toward precision child health cardiovascular medicine

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines heavily involved in translating data-driven findings into medical practice. This task is accomplished in particular by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of diseases. Continuous advancements in high-throughput technologies, coupled with recently promoted data sharing policies, have contributed to a massive wealth of data with remarkable potential to improve human health care. In concordance with this massive boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data categories. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate the existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches in clinical decision making by improving patient risk stratification and prediction of disease outcomes. This thesis comprises five studies explaining method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is detection of differentially expressed genes as the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. In order to demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). 
In addition to previously known markers, novel genes with potential prognostic and therapeutic roles in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice

    Mapping the genetic architecture of gene expression in human liver

    Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process. © 2008 Schadt et al