
    A multi-species functional embedding integrating sequence and network structure

    A key challenge in transferring knowledge between species is that different species have fundamentally different genetic architectures. Initial computational approaches to transferring knowledge across species have relied on measures of heredity such as genetic homology, but these approaches suffer from two limitations. First, only a small subset of genes have homologs, limiting the amount of knowledge that can be transferred, and second, genes change or repurpose functions, complicating the transfer of knowledge. Many approaches address this problem by expanding the notion of homology by leveraging high-throughput genomic and proteomic measurements, such as through network alignment. In this work, we take a new approach to transferring knowledge across species by expanding the notion of homology through explicit measures of functional similarity between proteins in different species. Specifically, our kernel-based method, HANDL (Homology Assessment across Networks using Diffusion and Landmarks), integrates sequence and network structure to create a functional embedding in which proteins from different species are embedded in the same vector space. We show that inner products in this space and the vectors themselves capture functional similarity across species, and are useful for a variety of functional tasks. We present the first whole-genome method for predicting phenologs, recovering many that were previously identified and predicting new phenologs supported by the biological literature. We also demonstrate that the HANDL embedding captures pairwise gene function, in that gene pairs with synthetic lethal interactions are significantly separated in HANDL space, and the direction of separation is conserved across species. Software for the HANDL algorithm is available at http://bit.ly/lrgr-handl.

    Visualization of metabolic interaction networks in microbial communities using VisANT 5.0

    The complexity of metabolic networks in microbial communities poses an unresolved visualization and interpretation challenge. We address this challenge in the newly expanded version of a software tool for the analysis of biological networks, VisANT 5.0. We focus in particular on facilitating the visual exploration of metabolic interaction between microbes in a community, e.g. as predicted by COMETS (Computation of Microbial Ecosystems in Time and Space), a dynamic stoichiometric modeling framework. Using VisANT's unique metagraph implementation, we show how one can use VisANT 5.0 to explore different time-dependent ecosystem-level metabolic networks. In particular, we analyze the metabolic interaction network between two bacteria previously shown to display an obligate cross-feeding interdependency. In addition, we illustrate how a putative minimal gut microbiome community could be represented in our framework, making it possible to highlight interactions across multiple coexisting species. We envisage that the "symbiotic layout" of VisANT can be employed as a general tool for the analysis of metabolism in complex microbial communities as well as heterogeneous human tissues. This work was supported by the National Institutes of Health, R01GM103502-05 to CD, ZH and DS. Partial support was also provided by grants from the Office of Science (BER), U.S. Department of Energy (DE-SC0004962), the Joslin Diabetes Center (Pilot & Feasibility grant P30 DK036836), the Army Research Office under MURI award W911NF-12-1-0390, National Institutes of Health (1RC2GM092602-01, R01GM089978 and 5R01DE024468), NSF (1457695), and Defense Advanced Research Projects Agency Biological Technologies Office (BTO), Program: Biological Robustness In Complex Settings (BRICS), Purchase Request No. HR0011515303, Program Code: TRS-0 Issued by DARPA/CMO under Contract No. HR0011-15-C-0091. Funding for open access charge: National Institutes of Health.
    The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Nine Quick Tips for Analyzing Network Data

    These tips provide a quick and concentrated guide for beginners in the analysis of network data.