Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration both poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Increased entropy of signal transduction in the cancer metastasis phenotype
Studies into the statistical properties of biological networks have led to
important biological insights, such as the presence of hubs and hierarchical
modularity. There is also a growing interest in studying the statistical
properties of networks in the context of cancer genomics. However, relatively
little is known as to what network features differ between the cancer and
normal cell physiologies, or between different cancer cell phenotypes. Based on
the observation that frequent genomic alterations underlie a more aggressive
cancer phenotype, we asked if such an effect could be detectable as an increase
in the randomness of local gene expression patterns. Using a breast cancer gene
expression data set and a model network of protein interactions we derive
constrained weighted networks defined by a stochastic information flux matrix
reflecting expression correlations between interacting proteins. Based on this
stochastic matrix we propose and compute an entropy measure that quantifies the
degree of randomness in the local pattern of information flux around single
genes. By comparing the local entropies in the non-metastatic versus metastatic
breast cancer networks, we here show that breast cancers that metastasize are
characterised by a small yet significant increase in the degree of randomness
of local expression patterns. We validate this result in three additional
breast cancer expression data sets and demonstrate that local entropy better
characterises the metastatic phenotype than other non-entropy based measures.
We show that increases in entropy can be used to identify genes and signalling
pathways implicated in breast cancer metastasis. Further exploration of such
integrated cancer expression and protein interaction networks will therefore be
a fruitful endeavour.
Comment: 5 figures, 2 Supplementary Figures and Table
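The local entropy described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: edge weights are taken as (1 + Pearson correlation) / 2 on interacting pairs, each row is normalised to form the stochastic flux matrix, and a gene's entropy is the Shannon entropy of its row divided by the log of its degree. The function names `flux_matrix` and `local_entropy` are hypothetical and are not taken from the paper.

```python
import numpy as np


def flux_matrix(corr, adjacency):
    """Row-stochastic matrix restricted to network edges (assumed weighting)."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # Keep only interacting pairs and ignore self-interactions.
    edges = (np.asarray(adjacency) > 0) & ~np.eye(n, dtype=bool)
    # Assumption: map correlations from [-1, 1] to weights in [0, 1].
    weights = np.where(edges, (1.0 + corr) / 2.0, 0.0)
    row_sums = weights.sum(axis=1, keepdims=True)
    # Normalise each row; rows with zero total weight stay all-zero.
    flux = np.divide(weights, row_sums,
                     out=np.zeros_like(weights), where=row_sums > 0)
    return flux, edges.sum(axis=1)


def local_entropy(flux, degree):
    """Shannon entropy of each row, scaled by log(degree) to lie in [0, 1]."""
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(flux > 0, flux * np.log(flux), 0.0)
    raw = -plogp.sum(axis=1)
    # Genes with fewer than two neighbours have no defined normalised entropy.
    out = np.full(raw.shape, np.nan)
    ok = degree > 1
    out[ok] = raw[ok] / np.log(degree[ok])
    return out


# Toy data: 4 genes, 30 samples, and a small hypothetical interaction network.
rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 30))
adjacency = np.array([[0, 1, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [1, 0, 0, 0]])
flux, degree = flux_matrix(np.corrcoef(expr), adjacency)
entropy = local_entropy(flux, degree)
```

Under these assumptions, comparing phenotypes would mean computing `entropy` separately from the non-metastatic and the metastatic expression matrices over the same network, then testing per gene whether the values differ.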
Typing tumors using pathways selected by somatic evolution.
Many recent efforts to analyze cancer genomes involve aggregation of mutations within reference maps of molecular pathways and protein networks. Here, we find these pathway studies are impeded by molecular interactions that are functionally irrelevant to cancer or the patient's tumor type, as these interactions diminish the contrast of driver pathways relative to individual frequently mutated genes. This problem can be addressed by creating stringent tumor-specific networks of biophysical protein interactions, identified by signatures of epistatic selection during tumor evolution. Using such an evolutionarily selected pathway (ESP) map, we analyze the major cancer genome atlases to derive a hierarchical classification of tumor subtypes linked to characteristic mutated pathways. These pathways are clinically prognostic and predictive, including the TP53-AXIN-ARHGEF17 combination in liver and CYLC2-STK11-STK11IP in lung cancer, which we validate in independent cohorts. This ESP framework substantially improves the definition of cancer pathways and subtypes from tumor genome data.