2,089 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review
A variety of genome-wide profiling techniques are available to probe
complementary aspects of genome structure and function. Integrative analysis of
heterogeneous data sources can reveal higher-level interactions that cannot be
detected based on individual observations. A standard integration task in
cancer studies is to identify altered genomic regions that induce changes in
the expression of the associated genes based on joint analysis of genome-wide
gene expression and copy number profiling measurements. In this review, we
provide a comparison among various modeling procedures for integrating
genome-wide profiling data of gene copy number and transcriptional alterations
and highlight common approaches to genomic data integration. A transparent
benchmarking procedure is introduced to quantitatively compare the cancer gene
prioritization performance of the alternative methods. The benchmarking
algorithms and data sets are available at http://intcomp.r-forge.r-project.orgComment: PDF file including supplementary material. 9 pages. Preprin
Unsupervised multiple kernel learning approaches for integrating molecular cancer patient data
Cancer is the second leading cause of death worldwide. A characteristic of this disease is its complexity leading to a wide variety of genetic and molecular aberrations in the tumors. This heterogeneity necessitates personalized therapies for the patients. However, currently defined cancer subtypes used in clinical practice for treatment decision-making are based on relatively few selected markers and thus provide only a coarse classifcation of tumors. The increased availability in multi-omics data measured for cancer patients now offers the possibility of defining more informed cancer subtypes. Such a more fine-grained characterization of cancer subtypes harbors the potential of substantially expanding treatment options in personalized cancer therapy. In this thesis, we identify comprehensive cancer subtypes using multidimensional data. For this purpose, we apply and extend unsupervised multiple kernel learning methods. Three challenges of unsupervised multiple kernel learning are addressed: robustness, applicability, and interpretability. First, we show that regularization of the multiple kernel graph embedding framework, which enables the implementation of dimensionality reduction techniques, can increase the stability of the resulting patient subgroups. This improvement is especially beneficial for data sets with a small number of samples. Second, we adapt the objective function of kernel principal component analysis to enable the application of multiple kernel learning in combination with this widely used dimensionality reduction technique. Third, we improve the interpretability of kernel learning procedures by performing feature clustering prior to integrating the data via multiple kernel learning. On the basis of these clusters, we derive a score indicating the impact of a feature cluster on a patient cluster, thereby facilitating further analysis of the cluster-specific biological properties. All three procedures are successfully tested on real-world cancer data. Comparing our newly derived methodologies to established methods provides evidence that our work offers novel and beneficial ways of identifying patient subgroups and gaining insights into medically relevant characteristics of cancer subtypes.Krebs ist eine der hĂ€ufigsten Todesursachen weltweit. Krebs ist gekennzeichnet durch seine KomplexitĂ€t, die zu vielen verschiedenen genetischen und molekularen Aberrationen im Tumor fĂŒhrt. Die Unterschiede zwischen Tumoren erfordern personalisierte Therapien fĂŒr die einzelnen Patienten. Die Krebssubtypen, die derzeit zur Behandlungsplanung in der klinischen Praxis verwendet werden, basieren auf relativ wenigen, genetischen oder molekularen Markern und können daher nur eine grobe Unterteilung der Tumoren liefern. Die zunehmende VerfĂŒgbarkeit von Multi-Omics-Daten fĂŒr Krebspatienten ermöglicht die Neudefinition von fundierteren Krebssubtypen, die wiederum zu spezifischeren Behandlungen fĂŒr Krebspatienten fĂŒhren könnten. In dieser Dissertation identifizieren wir neue, potentielle Krebssubtypen basierend auf Multi-Omics-Daten. HierfĂŒr verwenden wir unĂŒberwachtes Multiple Kernel Learning, welches in der Lage ist mehrere Datentypen miteinander zu kombinieren. Drei Herausforderungen des unĂŒberwachten Multiple Kernel Learnings werden adressiert: Robustheit, Anwendbarkeit und Interpretierbarkeit. ZunĂ€chst zeigen wir, dass die zusĂ€tzliche Regularisierung des Multiple Kernel Learning Frameworks zur Implementierung verschiedener Dimensionsreduktionstechniken die StabilitĂ€t der identifizierten Patientengruppen erhöht. Diese Robustheit ist besonders vorteilhaft fĂŒr DatensĂ€tze mit einer geringen Anzahl von Proben. Zweitens passen wir die Zielfunktion der kernbasierten Hauptkomponentenanalyse an, um eine integrative Version dieser weit verbreiteten Dimensionsreduktionstechnik zu ermöglichen. Drittens verbessern wir die Interpretierbarkeit von kernbasierten Lernprozeduren, indem wir verwendete Merkmale in homogene Gruppen unterteilen bevor wir die Daten integrieren. Mit Hilfe dieser Gruppen definieren wir eine Bewertungsfunktion, die die weitere Auswertung der biologischen Eigenschaften von Patientengruppen erleichtert. Alle drei Verfahren werden an realen Krebsdaten getestet. Den Vergleich unserer Methodik mit etablierten Methoden weist nach, dass unsere Arbeit neue und nĂŒtzliche Möglichkeiten bietet, um integrative Patientengruppen zu identifizieren und Einblicke in medizinisch relevante Eigenschaften von Krebssubtypen zu erhalten
Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.
Ph.D. Thesis. University of HawaiÊ»i at MÄnoa 2017
INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE
Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify âat riskâ individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics.
1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identify the association between the GBM mesenchymal subtype and reduced tumorpurity, accompanied with increased presence of tumor-associated microglia. Then I have tackled the sole source of microglial as intrinsic tumor bulk but not their corresponding neurosphere cells through both transcriptional and protein level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples.FurthermoreI have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to intrinsic proneural-to-mesenchymal transition in tumor cells, rather it is intertwined with increased level of microglia upon disease recurrence. Collectively I have elucidated the critical role of tumor microenvironment (Microglia and macrophages from central nervous system) contributing to the intra-tumor heterogeneity and accurate classification of GBM patients based on transcriptomic profiling, which will not only significantly impact on clinical perspective but also pave the way for preclinical cancer research.
2) Identification of prognostic gene signatures that stratify adult diffuse glioma patientsharboring1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other subset gliomas. My efforts on integrative data analysis of this highly curated data set usingoptimizedstatistical models will reflect the pending update to WHO classification system oftumorsin the central nervous system (CNS).
3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in -frame gene fusions as potential driver mutations based on network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions have provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes.
Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, distinct patterns of genomic structure alterations and oncogenic events in different cancer types, therefore facilitating our understanding of genomic alterations and moving us towards the development of precision medicine
Integrative methods for analyzing big data in precision medicine
We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the area of âBig Dataâ in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarkers discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face
Recommended from our members
Chromosomal instability in untreated primary prostate cancer as an indicator of metastatic potential.
BackgroundMetastatic prostate cancer (PC) is highly lethal. The ability to identify primary tumors capable of dissemination is an unmet need in the quest to understand lethal biology and improve patient outcomes. Previous studies have linked chromosomal instability (CIN), which generates aneuploidy following chromosomal missegregation during mitosis, to PC progression. Evidence of CIN includes broad copy number alterations (CNAs) spanning >â300 base pairs of DNA, which may also be measured via RNA expression signatures associated with CNA frequency. Signatures of CIN in metastatic PC, however, have not been interrogated or well defined. We examined a published 70-gene CIN signature (CIN70) in untreated and castration-resistant prostate cancer (CRPC) cohorts from The Cancer Genome Atlas (TCGA) and previously published reports. We also performed transcriptome and CNA analysis in a unique cohort of untreated primary tumors collected from diagnostic prostate needle biopsies (PNBX) of localized (M0) and metastatic (M1) cases to determine if CIN was linked to clinical stage and outcome.MethodsPNBX were collected from 99 patients treated in the VA Greater Los Angeles (GLA-VA) Healthcare System between 2000 and 2016. Total RNA was extracted from high-grade cancer areas in PNBX cores, followed by RNA sequencing and/or copy number analysis using OncoScan. Multivariate logistic regression analyses permitted calculation of odds ratios for CIN status (high versus low) in an expanded GLA-VA PNBX cohort (nâ=â121).ResultsThe CIN70 signature was significantly enriched in primary tumors and CRPC metastases from M1 PC cases. An intersection of gene signatures comprised of differentially expressed genes (DEGs) generated through comparison of M1 versus M0 PNBX and primary CRPC tumors versus metastases revealed a 157-gene "metastasis" signature that was further distilled to 7-genes (PC-CIN) regulating centrosomes, chromosomal segregation, and mitotic spindle assembly. High PC-CIN scores correlated with CRPC, PC-death and all-cause mortality in the expanded GLA-VA PNBX cohort. Interestingly, approximately 1/3 of M1 PNBX cases exhibited low CIN, illuminating differential pathways of lethal PC progression.ConclusionsMeasuring CIN in PNBX by transcriptome profiling is feasible, and the PC-CIN signature may identify patients with a high risk of lethal progression at the time of diagnosis
- âŠ