
A multi-species functional embedding integrating sequence and network structure

Author: Cannistra Anthony
Crovella Mark
Fan Jason
Fried Inbar
Hescott Benjamin
Leiserson Mark D. M.
Lim Tim
Schaffner Thomas
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/04/2018

A key challenge to transferring knowledge between species is that different species have fundamentally different genetic architectures. Initial computational approaches to transfer knowledge across species have relied on measures of heredity such as genetic homology, but these approaches suffer from limitations. First, only a small subset of genes have homologs, limiting the amount of knowledge that can be transferred, and second, genes change or repurpose functions, complicating the transfer of knowledge. Many approaches address this problem by expanding the notion of homology by leveraging high-throughput genomic and proteomic measurements, such as through network alignment. In this work, we take a new approach to transferring knowledge across species by expanding the notion of homology through explicit measures of functional similarity between proteins in different species. Specifically, our kernel-based method, HANDL (Homology Assessment across Networks using Diffusion and Landmarks), integrates sequence and network structure to create a functional embedding in which proteins from different species are embedded in the same vector space. We show that inner products in this space and the vectors themselves capture functional similarity across species, and are useful for a variety of functional tasks. We present the first whole-genome method for predicting phenologs, generating many that were previously identified, but also predicting new phenologs supported by the biological literature. We also demonstrate that the HANDL embedding captures pairwise gene function, in that gene pairs with synthetic lethal interactions are significantly separated in HANDL space, and the direction of separation is conserved across species. Software for the HANDL algorithm is available at http://bit.ly/lrgr-handl.
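
The abstract above describes a kernel-based method built on network diffusion but does not say which kernel is used. As a rough illustration of the general idea only, the sketch below computes one common graph diffusion kernel, the regularized Laplacian K = (I + λL)^-1, on a toy four-node path graph. The function names, the toy graph, and the value of `lam` are assumptions made for this example; this is not the HANDL implementation, and it omits the sequence, landmark, and cross-species steps entirely.

```python
# Illustrative sketch only: a regularized Laplacian diffusion kernel on a toy
# graph. The kernel choice, names, and parameters are assumptions, not taken
# from the HANDL paper or software.

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def regularized_laplacian_kernel(adj, lam=0.5):
    """Return K = (I + lam * L)^-1, where L = D - A is the graph Laplacian."""
    n = len(adj)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adj[i])
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - adj[i][j]
            m[i][j] = (1.0 if i == j else 0.0) + lam * laplacian_ij
    return invert(m)


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data, not a real protein network).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
K = regularized_laplacian_kernel(adj, lam=0.5)

# Similarity of node 0 to each node; it falls off with distance along the path.
for j, value in enumerate(K[0]):
    print(f"K[0][{j}] = {value:.4f}")
```

In this toy setting each kernel entry acts as a similarity score between two nodes, so nodes that are closer in the network score higher. A symmetric positive-definite kernel of this kind can be factorized into vectors whose inner products reproduce the kernel values, which is the sense in which a kernel yields an embedding.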

Boston University Institutional Repository (OpenBU)

Recommended from our members

ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks.

Author: Blaby Ian K
Nguyen Nam D
Wang Daifeng
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

BACKGROUND: The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS: We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value < 2.2×10^-16). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS: ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster

eScholarship - University of California

Visualization of metabolic interaction networks in microbial communities using VisANT 5.0

Author: Chang Yi-Chien
DeLisi Charles
Granger Brian R.
Hu Zhenjun
Segre Daniel
Wang Yan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/04/2016

The complexity of metabolic networks in microbial communities poses an unresolved visualization and interpretation challenge. We address this challenge in the newly expanded version of a software tool for the analysis of biological networks, VisANT 5.0. We focus in particular on facilitating the visual exploration of metabolic interaction between microbes in a community, e.g. as predicted by COMETS (Computation of Microbial Ecosystems in Time and Space), a dynamic stoichiometric modeling framework. Using VisANT's unique metagraph implementation, we show how one can use VisANT 5.0 to explore different time-dependent ecosystem-level metabolic networks. In particular, we analyze the metabolic interaction network between two bacteria previously shown to display an obligate cross-feeding interdependency. In addition, we illustrate how a putative minimal gut microbiome community could be represented in our framework, making it possible to highlight interactions across multiple coexisting species. We envisage that the "symbiotic layout" of VisANT can be employed as a general tool for the analysis of metabolism in complex microbial communities as well as heterogeneous human tissues. This work was supported by the National Institutes of Health, R01GM103502-05 to CD, ZH and DS. Partial support was also provided by grants from the Office of Science (BER), U.S. Department of Energy (DE-SC0004962), the Joslin Diabetes Center (Pilot & Feasibility grant P30 DK036836), the Army Research Office under MURI award W911NF-12-1-0390, National Institutes of Health (1RC2GM092602-01, R01GM089978 and 5R01DE024468), NSF (1457695), and Defense Advanced Research Projects Agency Biological Technologies Office (BTO), Program: Biological Robustness In Complex Settings (BRICS), Purchase Request No. HR0011515303, Program Code: TRS-0 Issued by DARPA/CMO under Contract No. HR0011-15-C-0091. Funding for open access charge: National Institutes of Health.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. (R01GM103502-05 - National Institutes of Health; 1RC2GM092602-01 - National Institutes of Health; R01GM089978 - National Institutes of Health; 5R01DE024468 - National Institutes of Health; DE-SC0004962 - Office of Science (BER), U.S. Department of Energy; P30 DK036836 - Joslin Diabetes Center; W911NF-12-1-0390 - Army Research Office under MURI; 1457695 - NSF; HR0011515303 - Defense Advanced Research Projects Agency Biological Technologies Office (BTO), Program: Biological Robustness In Complex Settings (BRICS); HR0011-15-C-0091 - DARPA/CMO; National Institutes of Health)

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California