
    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis and prediction of therapeutic response, toward personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists can build engines that leverage massive genomic variation data and integrate them with clinical data to identify "at risk" individuals for prevention, diagnosis and therapeutic intervention. In my Ph.D. thesis work, I mined genomic sequencing data to comprehensively characterize molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to improve the understanding of gene signatures and somatic molecular alterations that contribute to cancer progression and clinical outcomes. Following this motivation, my dissertation focuses on three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified an association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. I then showed that the microglial signal originates solely from the tumor bulk and not from the corresponding neurosphere cells, through both transcriptional and protein-level analysis of a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, through longitudinal analysis of paired primary and recurrent GBM samples, I demonstrated that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells but are instead intertwined with increased levels of microglia upon disease recurrence. Collectively, I elucidated the critical role of the tumor microenvironment (microglia and macrophages of the central nervous system) in intra-tumor heterogeneity and in the accurate classification of GBM patients based on transcriptomic profiling, with implications for both clinical practice and preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. I then identified inflammatory response and acetylation activity as associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its generality in the pathobiology of other glioma subsets. My integrative analysis of this highly curated data set using optimized statistical models reflects the pending update to the WHO classification system of tumors of the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts across cancer types (pan-cancer). I identified a panel of novel fusion transcripts across all TCGA cancer types through transcriptomic profiling. I then predicted fusion proteins with kinase activity and hub function in pathway networks based on the annotation of genetically mobile domains and functional domain architectures. I evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I also characterized the emerging complexity of genetic architecture in fusion transcripts by integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signaling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research conducted during my Ph.D. study sheds new light on precision medicine and contributes to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified address several debated issues in translational genomics, such as complex interactions between tumor bulk and the adjacent microenvironment, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us toward the development of precision medicine.

    SWIM: A computational tool to unveiling crucial nodes in complex biological networks

    SWItchMiner (SWIM) is a wizard-like software implementation of a previously described procedure for extracting information contained in complex networks. Specifically, SWIM reveals a new class of hubs, called "fight-club hubs", characterized by a marked negative correlation with their first nearest neighbors. Among them, a special subset of genes, called "switch genes", is characterized by an unusual pattern of intra- and inter-module connections that gives them a crucial topological role, which is mirrored by evidence of their clinical and biological relevance. Here, we applied SWIM to a large panel of cancer datasets from The Cancer Genome Atlas in order to highlight switch genes that could be critically associated with the drastic changes in the physiological state of cells or tissues induced by cancer development. We found switch genes in all cancers we studied. They encompass protein-coding genes and non-coding RNAs, recovering many known key cancer players as well as many new potential biomarkers not yet characterized in the cancer context. Furthermore, SWIM can detect switch genes in different organisms and cell conditions, with the potential to uncover important players in biologically relevant scenarios, including but not limited to human cancer.
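
The notion of a hub that is negatively correlated with its first nearest neighbors can be illustrated with a small sketch. This is not SWIM's implementation: the function names, the toy expression profiles, the degree cutoff `min_degree` and the correlation cutoff `threshold` are all assumptions chosen for illustration, and the additional intra- and inter-module criteria that define switch genes are not modeled here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_neighbor_correlation(node, neighbors, expr):
    """Mean correlation between a node's profile and those of its direct neighbors."""
    linked = neighbors[node]
    return sum(pearson(expr[node], expr[m]) for m in linked) / len(linked)


def negatively_correlated_hubs(neighbors, expr, min_degree=3, threshold=0.0):
    """Nodes with at least min_degree neighbors and a negative average correlation.

    min_degree and threshold are illustrative cutoffs, not values from the paper.
    """
    return sorted(
        node
        for node, linked in neighbors.items()
        if len(linked) >= min_degree
        and average_neighbor_correlation(node, neighbors, expr) < threshold
    )


# Hypothetical toy data: expression of five genes across four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [4.0, 3.0, 2.0, 1.0],
    "C": [8.0, 6.0, 4.0, 2.0],
    "D": [5.0, 4.0, 2.0, 1.0],
    "E": [1.0, 2.0, 3.0, 5.0],
}
# Hypothetical correlation network stored as an adjacency list.
neighbors = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A"],
    "D": ["A"],
    "E": ["B"],
}

hubs = negatively_correlated_hubs(neighbors, expr)
print(hubs)
```

In this toy network only gene `A` has enough neighbors to count as a hub, and its profile runs opposite to all three of them, so it is the only node flagged.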

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Secretory to multi-ciliated cell imbalance by altered cellular progeny in end-stage COPD facilitates resilience to environmental pollutants

    Background: Air pollution is a major risk factor for patients suffering from chronic respiratory conditions including chronic obstructive pulmonary disease (COPD), as it drives episodes of exacerbation and subsequently disease progression. Analysis of the differentiation process of primary human bronchial epithelial cells (pHBECs) from non-chronic lung disease (non-CLD) and CLD/COPD-diseased tissue samples is of critical importance for understanding the pathophysiological mechanisms that characterize the disease-specific response to air pollutant exposure at the first line of defense, i.e. the human bronchial epithelium. In addition, pHBEC culture contributes to the potential identification of preventive and therapeutic strategies in CLD. Materials and Methods: We established 3D air-liquid interface (ALI) cultures of pHBECs isolated from large airway resections of diseased (n=3 COPD-II and n=6 COPD-IV) and healthy (non-CLD, n=4) patients. To mimic air pollution, pHBECs were exposed to relevant aerosolized nanoparticles (NPs, i.e. carbon black soot surrogate NP (CNP) and zinc oxide (ZnO)) using the pre-clinical, highly standardized VITROCELL® CLOUD 12 nebulization system (Waldkirch, Germany). ALI cultures, validated for their disease-specific, biomimetic cellular composition using transmural bronchial punches (BP), were analyzed for functional consequences of NP exposure via transepithelial electrical resistance (TEER), WST-1, LDH, 3D confocal immunofluorescence (IF), transcriptome, secretome and ciliary beating frequency (CBF) of multi-ciliated cells (MCC). To highlight the cell differentiation trajectory that explains the outlined changes in cell composition and function after exposure, single cell RNA-seq (Drop-seq) analysis and IF staining of native bronchial tissue samples were performed. Results: The effects of ZnO exposure on MCC number and function exceeded those observed with CNP or LPS exposure. Exposure to moderate ZnO doses induced a decrease in the number of MCC in COPD-II (20.35±14.07%) and COPD-IV pHBECs (18.51±11.86%) when compared to non-CLD cells (47.01±2.80%), as well as an elevated number of MUC5AC+ cells in COPD-IV cultures (12.75±2.90%) when compared to non-CLD cultures (5.17±2.43%). These findings were accompanied by a concentration-dependent reduction in epithelial barrier integrity (TEER), metabolic cell viability (WST-1) and membrane integrity (LDH release) in non-CLD and COPD-II pHBECs when compared to COPD-IV pHBECs. Following ZnO and CNP exposure, COPD-IV cultures were characterized by transcriptional regulation of genes involved in the secretory cell (SC)-MCC differentiation axis (cilium assembly and organization), TLR-mediated innate immunity and regulation of extracellular matrix (ECM) remodeling. At baseline, transcriptome analysis also revealed an overrepresentation of ECM gene clusters in COPD-IV cultures. The cellular composition of pHBEC ALI cultures resembled the ex vivo picture obtained by culturing BPs at ALI. These findings highlight an oligo-ciliated hypersecretory phenotype in COPD-IV cultures, with a skewed basal cell (BC)-induced cell trajectory towards SC at the expense of the more vulnerable MCC. Terminal differentiation into MCC resulted from progenitor SC (MUC5AC+, CC10+). The outlined phenotype was in line with an aberrant expression of MCC genes together with a pathologic CBF spectrum in COPD-IV cultures. Drop-seq single cell RNA-seq analysis in both non-CLD and COPD-IV cultures on ALI day 0 and ALI day 28 revealed two distinct BC populations (basal_1, basal_2) as progenitor cells for the SC-MCC differentiation axis. BC show a strong shift between non-CLD and COPD-IV patients on both ALI day 0 and day 28. Specifically, basal_1 cells characterized the COPD-IV cultures, being predominantly detected on ALI day 0 and exclusively on ALI day 28 in COPD-IV derived pHBECs.
Conversely, basal_2 cells characterized the non-CLD derived pHBECs on both ALI day 0 and 28. These signatures were validated by IF staining of native bronchial tissue samples. Conclusion: In summary, our results identify the predominance of SC in the large airways of patients suffering from COPD-IV, resulting in a greater functional resilience of the pHBECs to environmental small particle exposure, underlined by the unsuccessful drive to induce trans-differentiation of the SC into MCC.

    Background: Air pollution is the most important risk factor for patients with chronic obstructive pulmonary disease (COPD). Chronic nanoparticle (NP) exposure leads to repeated exacerbations and to unavoidable progression of COPD. Analysis of the differentiation process of primary human bronchial epithelial cells (pHBECs) from patients without chronic lung disease (non-CLD) and with chronic lung disease (CLD, e.g. COPD) is crucial for characterizing the pathophysiological mechanisms of the bronchial epithelium after air pollution exposure and for establishing preventive and therapeutic strategies for CLD. Materials and Methods: We established a 3D cell culture at the air-liquid interface (ALI) with pHBECs from the proximal main bronchi of COPD (n = 3 COPD-II and n = 6 COPD-IV) and healthy (n = 4 non-CLD) patients. To simulate environmental pollution in vitro, pHBECs were exposed to relevant aerosolized NPs (e.g. carbon soot surrogate NP (CNP) and zinc oxide (ZnO)) using a standardized exposure system (VITROCELL® CLOUD 12, Waldkirch, Germany). The ALI cell cultures and their disease-specific, biomimetic cellular composition were validated using a new 3D culture model based on fresh native human bronchial wall preparations (bronchial punches, BPs), which preserve the intact structure of the entire bronchial wall. To validate the 3D ALI pHBEC cultures, we analyzed transepithelial electrical resistance (TEER), cell viability (WST-1) and membrane integrity (LDH), and performed 3D confocal immunofluorescence (IF), transcriptome and secretome analysis, and analysis of ciliary beating frequency (CBF). To analyze the changes in cellular composition and epithelial function after NP exposure in more detail, single cell RNA-seq (Drop-seq) analysis and IF of native bronchial tissue samples were additionally performed. Results: The ZnO-induced effects on multi-ciliated cells (MCC) exceeded those observed with CNP or LPS exposure. Exposure to moderate ZnO NP doses led to a slight decrease in the number of MCC in COPD-II (20.35 ± 14.07%) and COPD-IV pHBECs (18.51 ± 11.86%) compared with non-CLD cells (47.01 ± 2.80%), and to an increased number of MUC5AC+ cells in COPD-IV cultures (12.75 ± 2.90%) compared with non-CLD cells (5.17 ± 2.43%). These findings correlate with a concentration-dependent loss of transepithelial electrical resistance (TEER), cell viability (WST-1) and membrane integrity (LDH release) in non-CLD and COPD-II compared with COPD-IV pHBECs. ZnO and CNP exposure led to aberrant overexpression of cilia-specific genes in the COPD-IV cultures. Transcriptome analysis of the COPD-IV cultures after ZnO and CNP exposure showed activation of the mucus cell-MCC differentiation axis, TLR immunity and extracellular matrix biosynthesis (extracellular matrix remodeling, ECM). Furthermore, the untreated COPD-IV cultures confirm significant ECM gene expression. In addition, the cellular composition of the pHBEC ALI cultures resembled the ex vivo picture of the BPs. These results demonstrate an "oligo-ciliated" hypersecretory phenotype of the COPD-IV ALI cultures, in the context of a skewed, basal cell-induced differentiation toward secretory cells at the expense of the vulnerable MCC. The secretory cells (MUC5AC+, CC10+) are progenitor cells for MCC. This phenotype was consistent with aberrant MCC gene expression and a pathological CBF spectrum in COPD-IV cultures. Drop-seq single cell analysis in non-CLD and COPD-IV patients shows two distinct basal cell populations (basal_1, basal_2) on ALI day 0 and day 28. The basal_1 cells are predominantly identifiable on day 0 and exclusively on day 28 in COPD-IV cultures. The basal_2 cells characterize the non-CLD cultures on both day 0 and day 28. The single cell signatures were validated by IF staining. Conclusion: The hypersecretory phenotype of the COPD-IV ALI cultures leads to greater functional resilience of the bronchial epithelium after NP exposure and results from unsuccessful transdifferentiation of the secretory cells into MCC.

    Subtype prediction in pediatric acute myeloid leukemia: Classification using differential network rank conservation revisited

    Background: One of the most important applications of transcriptomic data is cancer phenotype classification. Many characteristics of transcriptomic data, such as redundant features and technical artifacts, make over-fitting commonplace. Promising classification results often fail to generalize across datasets with different sources, platforms, or preprocessing. Recently, a novel differential network rank conservation (DIRAC) algorithm was introduced to characterize cancer phenotypes using transcriptomic data. DIRAC is a member of a family of algorithms that have proven useful for disease classification based on the relative expression of genes. Combining the robustness of this family's simple decision rules with known biological relationships, this systems approach identifies interpretable yet highly discriminative networks. While DIRAC was briefly employed for several classification problems in the original paper, its potential in cancer phenotype classification, and especially its robustness against artifacts in transcriptomic data, has not yet been fully characterized. Results: In this study we thoroughly investigate the potential of DIRAC by applying it to multiple datasets, and examine the variation in classification performance when datasets are (i) treated or untreated for batch effects and (ii) preprocessed with different techniques. We also propose the first DIRAC-based classifier to integrate multiple networks. We show that the DIRAC-based classifier is very robust in the examined scenarios. To our surprise, the trained DIRAC-based classifier even translated well to a dataset with different biological characteristics in the presence of substantial batch effects that, as shown here, plagued the standard expression value based classifier. In addition, because of the integrated biological information, the DIRAC-based classifier also suggests pathways to target in specific subtypes, which may support the establishment of personalized therapy in diseases such as pediatric AML. To better understand the predictive power of the DIRAC-based classifier in general, we also performed classifications using publicly available datasets from breast and lung cancer. Furthermore, multiple well-known classification algorithms were used to create a test bed for comparing the DIRAC-based classifier with the standard gene expression value based classifier. We observed that the DIRAC-based classifier greatly outperforms its rival. Conclusions: Based on our experiments with multiple datasets, we propose that DIRAC is a promising solution to the lack of generalizability in classification efforts that use transcriptomic data. We believe that the superior performance presented in this study may motivate others to initiate a new line of research exploring the untapped power of DIRAC in a broad range of cancer types.
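
Classification based on the relative expression of genes within a network can be sketched as follows. This is a simplified illustration of the rank-conservation idea, not the published DIRAC implementation: for each phenotype, a template records the majority ordering of every gene pair in the network, and a new sample is assigned to the phenotype whose template it matches best. The gene names, the toy samples and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_orderings(sample, genes):
    """For every gene pair (a, b) in the network, record whether a < b in this sample."""
    return [sample[a] < sample[b] for a, b in combinations(genes, 2)]


def rank_template(samples, genes):
    """Majority ordering of each gene pair across the samples of one phenotype."""
    orderings = [pairwise_orderings(s, genes) for s in samples]
    n = len(orderings)
    # A pair is True in the template when strictly more than half the samples agree.
    return [2 * sum(column) > n for column in zip(*orderings)]


def matching_score(sample, genes, template):
    """Fraction of gene pairs whose ordering in the sample agrees with the template."""
    observed = pairwise_orderings(sample, genes)
    return sum(o == t for o, t in zip(observed, template)) / len(template)


def classify(sample, genes, templates):
    """Assign the phenotype whose template the sample matches best (ties: first label)."""
    return max(templates, key=lambda label: matching_score(sample, genes, templates[label]))


# Hypothetical three-gene network and toy training samples for two phenotypes.
genes = ["g1", "g2", "g3"]
subtype_a = [
    {"g1": 1.0, "g2": 2.0, "g3": 3.0},
    {"g1": 0.5, "g2": 1.5, "g3": 4.0},
    {"g1": 2.0, "g2": 2.5, "g3": 2.8},
]
subtype_b = [
    {"g1": 3.0, "g2": 2.0, "g3": 1.0},
    {"g1": 5.0, "g2": 1.0, "g3": 0.5},
    {"g1": 2.2, "g2": 2.1, "g3": 1.9},
]
templates = {
    "A": rank_template(subtype_a, genes),
    "B": rank_template(subtype_b, genes),
}

new_sample = {"g1": 0.5, "g2": 2.0, "g3": 1.5}
predicted = classify(new_sample, genes, templates)
print(predicted, matching_score(new_sample, genes, templates["A"]))
```

Because only the ordering of values within a sample is used, any monotone rescaling of that sample's expression values leaves the prediction unchanged. This property suggests why relative-expression rules can be less sensitive to preprocessing choices than rules based on absolute expression values.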

    Transcriptome Analyses of Tumor-Adjacent Somatic Tissues Reveal Genes Co-Expressed with Transposable Elements

    Background: Despite the long-held assumption that transposons are normally expressed only in the germ line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals is unknown, and the co-expression between TEs and host gene mRNAs has not been examined. Results: Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, individual TE loci exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented among the gene members of the TE modules, showing positive correlation across multiple tissues. However, we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies. Conclusions: We find unexpected variation in TE-derived transcripts within and across non-tumorous tissues. We describe a broad view of the RNA state of non-tumorous tissues exhibiting higher levels of TE transcripts. Tissues with higher levels of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs and lower RNA levels of immune genes.

    Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification

    Background: Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes constitute 15% of known human genes and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets. Results: Use of a model-based clustering algorithm accurately classified more than 400 microarray samples into 19 different tissue types on the basis of bimodal gene expression. Bimodal expression patterns were also highly effective in differentiating between infectious diseases in model-based clustering of microarray data. Supervised classification with feature selection restricted to switch-like genes also recognized tissue-specific and infectious disease-specific signatures in independent test datasets reserved for validation. Determination of "on" and "off" states of switch-like genes in various tissues and diseases allowed for the identification of activated/deactivated pathways. Activated switch-like genes in neural, skeletal muscle and cardiac muscle tissue tend to have tissue-specific roles. A majority of activated genes in infectious disease are involved in processes related to the immune response. Conclusion: Switch-like bimodal gene sets capture genome-wide signatures from microarray data in health and infectious disease. A subset of bimodal genes coding for extracellular and membrane proteins is associated with tissue specificity, indicating a potential role for them as biomarkers provided that expression is altered in the onset of disease. Furthermore, we provide evidence that bimodal genes are involved in temporally and spatially active mechanisms including tissue-specific functions and response of the immune system to invading pathogens.

    Recurrent patterns of DNA copy number alterations in tumors reflect metabolic selection pressures.

    Copy number alteration (CNA) profiling of human tumors has revealed recurrent patterns of DNA amplifications and deletions across diverse cancer types. These patterns are suggestive of conserved selection pressures during tumor evolution but cannot be fully explained by known oncogenes and tumor suppressor genes. Using a pan-cancer analysis of CNA data from patient tumors and experimental systems, here we show that principal component analysis-defined CNA signatures are predictive of glycolytic phenotypes, including 18F-fluorodeoxyglucose (FDG) avidity of patient tumors, and increased proliferation. The primary CNA signature is enriched for p53 mutations and is associated with glycolysis through coordinate amplification of glycolytic genes and other cancer-linked metabolic enzymes. A pan-cancer and cross-species comparison of CNAs highlighted 26 consistently altered DNA regions, containing 11 enzymes in the glycolysis pathway in addition to known cancer-driving genes. Furthermore, exogenous expression of hexokinase and enolase enzymes in an experimental immortalization system altered the subsequent copy number status of the corresponding endogenous loci, supporting the hypothesis that these metabolic genes act as drivers within the conserved CNA amplification regions. Taken together, these results demonstrate that metabolic stress acts as a selective pressure underlying the recurrent CNAs observed in human tumors, and further cast genomic instability as an enabling event in tumorigenesis and metabolic evolution.
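
As a rough illustration of a principal component analysis-defined CNA signature, the sketch below extracts the first principal component from a small tumors-by-regions copy number matrix using power iteration. It rests on several assumptions: the matrix values, the region layout and the function names are invented for illustration, and the study's actual preprocessing and PCA procedure are not described in the abstract.

```python
from math import sqrt


def center_columns(matrix):
    """Subtract each column's mean so every genomic region has mean zero."""
    n = len(matrix)
    means = [sum(column) / n for column in zip(*matrix)]
    return [[value - m for value, m in zip(row, means)] for row in matrix]


def first_principal_component(matrix, iterations=200):
    """Leading principal component via power iteration on X^T X.

    Returns (loadings, scores): one loading per region, one score per tumor.
    Assumes the starting vector is not orthogonal to the leading component.
    """
    x = center_columns(matrix)
    p = len(x[0])
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]  # X^T (X v)
        norm = sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # all-zero data: no direction of variation
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Hypothetical log-ratio copy number values: 4 tumors (rows) x 3 regions (columns).
# Regions 1 and 2 are gained or lost together; region 3 is essentially unchanged.
cna = [
    [1.0, 0.9, -0.1],
    [0.8, 1.1, 0.0],
    [-0.9, -1.0, 0.1],
    [-1.0, -0.9, 0.0],
]

loadings, scores = first_principal_component(cna)
print([round(c, 3) for c in loadings])
print([round(s, 3) for s in scores])
```

In this toy matrix the leading component loads mainly on the two co-varying regions, and the per-tumor scores separate tumors with coordinated gains from those with coordinated losses. A score of this kind could then be compared with a phenotype measurement.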