    Multi-label multi-instance transfer learning for simultaneous reconstruction and cross-talk modeling of multiple human signaling pathways

    Text file contains the predicted cross-talk signaling components between human signaling pathways (homolog instance). (ZIP 36 KB)

    MorphDB: prioritizing genes for specialized metabolism pathways and gene ontology categories in plants

    Recent years have seen enormous growth in "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite major improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing complete and reliable functional information. Recently, combining comparative genomics with functional genomics approaches has attracted considerable interest for gene function analysis, leveraging both expression-based guilt-by-association methods and annotation efforts in closely related model organisms. Besides identifying missing genes in pathways, these methods typically also enable the discovery of biological regulators (e.g., transcription factors or signaling genes). MORPH, a previously developed guilt-by-association method, has been shown to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource that integrates MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) across multiple plant species. Besides a gene-centric query utility, we present a comparative network approach that lets researchers efficiently browse MORPH predictions across functional gene sets and species, facilitating gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
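    The guilt-by-association principle mentioned above can be illustrated with a minimal sketch: score each candidate gene by its average co-expression with the genes already known to belong to a pathway, then rank candidates by that score. This is a simplified, assumption-laden illustration and not the MORPH algorithm itself; the function name `gba_rank`, the single gene-by-sample expression matrix, and the use of mean Pearson correlation are all hypothetical choices.

```python
import numpy as np


def gba_rank(expr, genes, pathway):
    """Rank non-pathway genes by mean Pearson correlation with pathway genes.

    expr: 2-D array, one row per gene and one column per sample (hypothetical layout).
    genes: gene identifiers, in the same order as the rows of expr.
    pathway: identifiers of genes already known to belong to the pathway.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-by-gene correlation
    index = {g: i for i, g in enumerate(genes)}
    pathway = set(pathway)
    bait = [index[g] for g in pathway]
    scores = {
        g: float(corr[i, bait].mean())  # average co-expression with the pathway
        for g, i in index.items()
        if g not in pathway
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


genes = ["P1", "P2", "C1", "C2"]
expr = [
    [1, 2, 3, 4],  # pathway gene
    [2, 4, 6, 8],  # pathway gene
    [1, 2, 3, 5],  # candidate that tracks the pathway
    [4, 3, 2, 1],  # candidate anti-correlated with the pathway
]
ranked, scores = gba_rank(expr, genes, ["P1", "P2"])
print(ranked)  # ['C1', 'C2']
```

    Real guilt-by-association tools typically combine several expression datasets and more robust similarity measures; a single correlation matrix keeps the idea visible.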

    Computational translation of genomic responses from experimental model systems to humans

    The frequent failure of therapeutics that show promise in mouse models to translate to patients is a pressing challenge in biomedical science. Although retrospective studies have examined the fidelity of mouse models to their respective human conditions, approaches for prospectively translating insights from mouse models to patients remain relatively unexplored. Here, we develop a semi-supervised learning approach for inferring disease-associated human differentially expressed genes and pathways from mouse model experiments. We examined 36 transcriptomic case studies in which comparable phenotypes were available for mouse and human inflammatory diseases and assessed multiple computational approaches for inferring human biology from mouse datasets. We found that semi-supervised training of a neural network identified significantly more true human biological associations than interpreting mouse experiments directly. Examining the design of the mouse experiments in which our model was most successful revealed principles of experimental design that may improve translational performance. Our study shows that, when prospectively evaluating biological associations in mouse studies, semi-supervised learning approaches that combine mouse and human data for biological inference provide the most accurate assessment of human in vivo disease processes. Finally, we propose a delineation of four categories of model system-to-human "Translation Problems", defined by the resolution and coverage of the datasets available for molecular insight translation, and suggest that translating insights from model systems to human disease contexts may be better accomplished by combining translation-minded experimental design with computational approaches.
    Funding: Boehringer Ingelheim Pharmaceuticals; Institute for Collaborative Biotechnologies (Grant W911NF-09-0001).
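    To make the contrast between semi-supervised and purely supervised inference concrete, here is a hedged sketch of one generic semi-supervised technique, self-training (pseudo-labeling). It is not the neural network used in the study: a nearest-centroid classifier is fit on a few labeled examples, and unlabeled examples it classifies with a large enough margin are added as pseudo-labels for the next round. The feature layout, the label encoding (-1 for unlabeled), the `margin` threshold and the function name are all hypothetical.

```python
import numpy as np


def self_train_centroid(X, y, n_rounds=5, margin=0.5):
    """Self-training with a nearest-centroid classifier (generic sketch).

    X: (n_samples, n_features) feature matrix.
    y: labels, 1 or 0 for labeled samples and -1 for unlabeled ones.
    Returns the labels after pseudo-labeling (-1 where no confident call was made).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int).copy()
    for _ in range(n_rounds):
        c1 = X[y == 1].mean(axis=0)  # centroid of class 1 (labeled + pseudo-labeled)
        c0 = X[y == 0].mean(axis=0)  # centroid of class 0
        # Positive score: closer to the class-1 centroid than to the class-0 one.
        score = np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1)
        confident = (y == -1) & (np.abs(score) > margin)
        if not confident.any():
            break  # nothing left that can be labeled confidently
        y[confident] = (score[confident] > 0).astype(int)
    return y


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.1], [4.9, 5.1]]
y = [0, -1, 1, -1, -1, -1]
print(self_train_centroid(X, y))  # [0 0 1 1 0 1]
```

    Raising `margin` makes the procedure more conservative: fewer unlabeled points receive pseudo-labels, which limits error propagation at the cost of using less unlabeled data.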

    A multi-species functional embedding integrating sequence and network structure

    A key challenge in transferring knowledge between species is that different species have fundamentally different genetic architectures. Initial computational approaches to cross-species knowledge transfer relied on measures of heredity such as genetic homology, but these approaches suffer from two limitations. First, only a small subset of genes have homologs, limiting the amount of knowledge that can be transferred; second, genes change or repurpose their functions, complicating the transfer of knowledge. Many approaches address this problem by expanding the notion of homology using high-throughput genomic and proteomic measurements, for example through network alignment. In this work, we take a new approach to transferring knowledge across species by expanding the notion of homology through explicit measures of functional similarity between proteins in different species. Specifically, our kernel-based method, HANDL (Homology Assessment across Networks using Diffusion and Landmarks), integrates sequence and network structure to create a functional embedding in which proteins from different species are embedded in the same vector space. We show that inner products in this space, and the vectors themselves, capture functional similarity across species and are useful for a variety of functional tasks. We perform the first whole-genome prediction of phenologs, recovering many that were previously identified but also predicting new phenologs supported by the biological literature. We also demonstrate that the HANDL embedding captures pairwise gene function: gene pairs with synthetic lethal interactions are significantly separated in HANDL space, and the direction of separation is conserved across species. Software for the HANDL algorithm is available at http://bit.ly/lrgr-handl.
    Published version
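    A minimal sketch of the diffusion-and-landmarks idea, under stated assumptions: each species' interaction network is turned into a diffusion kernel (here the regularized Laplacian (I + beta*L)^-1, an assumed kernel choice), and every protein is represented by its kernel values at a shared set of landmarks, i.e., homologous protein pairs known in both species. Proteins from both species then share one landmark-indexed vector space, and inner products act as cross-species similarity scores. The function names, the `beta` value and the toy graphs are hypothetical; the published HANDL software is the reference for the actual method.

```python
import numpy as np


def regularized_laplacian(adj, beta=1.0):
    """Diffusion kernel (I + beta * L)^-1 for an undirected adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj  # graph Laplacian L = D - A
    return np.linalg.inv(np.eye(len(adj)) + beta * lap)


def landmark_embedding(adj_a, adj_b, landmarks, beta=1.0):
    """Embed proteins of species A and B in a shared landmark-indexed space.

    landmarks: list of (index_in_a, index_in_b) pairs of homologous proteins.
    Returns (emb_a, emb_b, sim) where sim[i, j] scores protein i of A vs j of B.
    """
    kernel_a = regularized_laplacian(adj_a, beta)
    kernel_b = regularized_laplacian(adj_b, beta)
    emb_a = kernel_a[:, [a for a, _ in landmarks]]  # column k = landmark k in A
    emb_b = kernel_b[:, [b for _, b in landmarks]]  # column k = landmark k in B
    return emb_a, emb_b, emb_a @ emb_b.T


# Two identical 4-node path graphs 0-1-2-3, with the end nodes as landmarks.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
emb_a, emb_b, sim = landmark_embedding(path, path, [(0, 0), (3, 3)])
print(sim.shape)  # (4, 4)
```

    Because L times the all-ones vector is zero, each row of the regularized Laplacian kernel sums to 1, which gives a quick sanity check on the kernel.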

    Selection of sequence motifs and generative Hopfield-Potts models for protein families

    Statistical models for families of evolutionarily related proteins have recently gained interest. In particular, pairwise Potts models, such as those inferred by Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of amino-acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models that achieve this. Although successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from a finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional subfamilies. However, the distributed, collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the need for models that go beyond the Hopfield-Potts models discussed here.
    Comment: 26 pages, 16 figures, to appear in PR
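    The one- and two-point statistics that pairwise Potts and Hopfield-Potts models are fitted to reproduce can be computed directly from a multiple sequence alignment. The sketch below assumes a toy alignment over a 21-letter alphabet (gap plus the 20 amino acids) and returns single-site frequencies f_i(a), pair frequencies f_ij(a, b), and the connected correlations f_ij(a, b) - f_i(a) f_j(b) that carry the covariation signal. It deliberately omits the sequence reweighting and pseudocounts normally used in practice; the function name and alphabet ordering are hypothetical.

```python
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"  # gap symbol followed by the 20 amino acids


def msa_statistics(msa, alphabet=ALPHABET):
    """Return f1[i, a], f2[i, a, j, b] and the connected correlation C[i, a, j, b]."""
    index = {c: k for k, c in enumerate(alphabet)}
    seqs = np.array([[index[c] for c in s] for s in msa])  # (M sequences, L sites)
    m, length = seqs.shape
    onehot = np.zeros((m, length, len(alphabet)))
    onehot[np.arange(m)[:, None], np.arange(length)[None, :], seqs] = 1.0
    f1 = onehot.mean(axis=0)  # single-site frequencies, shape (L, q)
    f2 = np.einsum("mia,mjb->iajb", onehot, onehot) / m  # pair frequencies
    corr = f2 - np.einsum("ia,jb->iajb", f1, f1)  # covariation signal
    return f1, f2, corr


f1, f2, corr = msa_statistics(["AC", "AD", "GC"])
a_idx, c_idx = ALPHABET.index("A"), ALPHABET.index("C")
print(f1[0, a_idx], f2[0, a_idx, 1, c_idx], corr[0, a_idx, 1, c_idx])
```

    In this toy alignment, "A" at site 1 and "C" at site 2 co-occur less often than their marginal frequencies would predict, so their connected correlation is negative.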

    Large-scale data-driven network analysis of human–Plasmodium falciparum interactome: extracting essential targets and processes for malaria drug discovery

    Background: Plasmodium falciparum malaria is an infectious disease with a major impact on public health owing to its high mortality rates, especially in sub-Saharan Africa. Drug-resistant P. falciparum strains in Africa, notably those resistant to chloroquine and sulfadoxine-pyrimethamine, are traced mainly to Southeast Asia, where the rate of artemisinin resistance is increasing. Although careful surveillance to monitor the emergence and spread of artemisinin-resistant parasite strains in Africa is ongoing, research into new drugs, particularly for African populations, is critical because there is as yet no replacement for artemisinin combination therapies (ACTs).
    Objective: The overall objective of this study is to identify potential protein targets through host–pathogen protein–protein functional interaction network analysis, in order to understand the underlying mechanisms of drug failure and to identify essential targets that can help predict potential drug candidates specific to African populations, using a protein-based approach to both host and Plasmodium falciparum genomic analysis.
    Methods: We leveraged malaria-specific genome-wide association study summary statistics from populations in Gambia, Kenya and Malawi, Plasmodium falciparum selective pressure variants, and functional datasets (protein sequences, interologs, and host-pathogen intra-organism and inter-organism protein-protein interactions (PPIs)) from various sources (STRING, Reactome, HPID, UniProt, IntAct and the literature) to construct overlapping functional networks for both host and pathogen. We developed algorithms and a large-scale data-driven computational framework to analyze these datasets and the constructed networks, identifying densely connected subnetworks or hubs essential for network stability and integrity.
    The host-pathogen network was analyzed to elucidate the influence of candidate key parasite proteins within the network and to predict possible resistance pathways arising from interactions between host and pathogen candidate key proteins. We performed biological process and pathway enrichment analysis on the critical proteins identified, to elucidate their functions. To leverage disease-target-drug relationships for identifying already approved drugs that could be repurposed to treat malaria, we explored pharmaceutical datasets from DrugBank using a semantic similarity approach based on target-associated biological processes.
    Results: About 600,000 significant SNPs (p-value < 0.05) from the summary statistics were mapped to their associated genes, and we identified 79 human malaria-associated genes. The assembled parasite network comprised 8 clusters containing 799 functional interactions between 155 reviewed proteins; 5 of these clusters contained 43 key proteins (selective variants), and 2 contained 2 candidate key proteins (key proteins characterized by high centrality measures), C6KTB7 and C6KTD2. The human network comprised 32 clusters containing 4,133,136 interactions between 20,329 unique reviewed proteins; 7 of these clusters contained 760 key proteins, and 2 contained 6 significant human malaria-associated candidate key proteins or genes: P22301 (IL10), P05362 (ICAM1), P01375 (TNF), P30480 (HLA-B), P16284 (PECAM1) and O00206 (TLR4). The generated host-pathogen network comprised 31,512 functional interactions between 8,023 host and pathogen proteins. We also explored the association of the pfk13 gene within the host-pathogen network and observed that pfk13 clusters with host kelch-like proteins and other regulatory genes but has no direct association with our identified host candidate key malaria targets.
    We implemented a semantic similarity-based approach, complemented by the kappa and Jaccard statistical measures, to identify 115 malaria-similar diseases and 26 potential repurposable drug hits that could be evaluated experimentally for malaria treatment.
    Conclusion: In this study, we reviewed existing antimalarial drugs and resistance-associated variants contributing to the diminished sensitivity of antimalarials, especially chloroquine, sulfadoxine-pyrimethamine and artemisinin combination therapy, within the African population. We also described various computational techniques used to predict drug targets and leads in drug research. In our data analysis, we showed that possible mechanisms of artemisinin resistance in Africa may arise from the combined effects of many genes associated with resistance to chloroquine and sulfadoxine-pyrimethamine. We investigated the role of pfk13 within the host-pathogen network. Through structural and functional analysis of host and pathogen functional networks, we predicted key targets that have been proposed as essential for malaria drug and vaccine development. Based on our analysis, we propose these targets as essential co-targets for combinatorial malaria drug discovery.
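    The Jaccard and kappa measures mentioned above compare two annotation sets, for example the biological-process terms associated with the targets of two diseases. A minimal sketch, assuming annotations are plain sets of term identifiers and that "kappa" means Cohen's kappa computed over a fixed universe of terms (as in annotation-clustering tools); the term identifiers are placeholders, and the way the study combines these scores with semantic similarity is not reproduced here.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cohen_kappa(terms_a, terms_b, universe):
    """Cohen's kappa for agreement of two binary annotation vectors over `universe`."""
    universe = set(universe)
    a, b = set(terms_a) & universe, set(terms_b) & universe
    n = len(universe)
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    p_a, p_b = len(a) / n, len(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


universe = ["T1", "T2", "T3", "T4", "T5", "T6"]  # placeholder term IDs
disease_x = {"T1", "T2", "T3"}
disease_y = {"T2", "T3", "T4"}
print(jaccard(disease_x, disease_y))  # 0.5
print(round(cohen_kappa(disease_x, disease_y, universe), 3))  # 0.333
```

    Unlike Jaccard, kappa also credits shared absences and corrects for the agreement expected by chance, so the two measures can rank disease pairs differently.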

    Structure-based Prediction of Protein-protein Interaction Networks across Proteomes

    Protein-protein interactions (PPIs) orchestrate virtually all cellular processes; therefore, their exhaustive exploration is essential for a comprehensive understanding of cellular networks. Significant effort has been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. A number of experimental techniques have been developed to discover PPIs; however, these approaches have limitations such as high cost, long experiment times, noisy data sets, and often high false-positive rates and inter-study discrepancies. Given these experimental limitations, computational methods are becoming increasingly important for detecting and structurally characterizing PPIs. To that end, we have developed a novel pipeline for high-throughput PPI prediction based on all-to-all rigid-body docking of protein structures. We focus on two questions: ‘how do proteins interact?’ and ‘which proteins interact?’. The method combines molecular modeling, structural bioinformatics, machine learning, and functional annotation data to answer these questions, and it can be used for genome-wide molecular reconstruction of protein-protein interaction networks. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the E. coli proteome. We further validated our method against a few human pathways. The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms, as well as to construct dimer structures and to estimate the confidence of protein interactions identified experimentally with high-throughput techniques.

    Analyzing Effects of Naturally Occurring Missense Mutations

    A single-point mutation in the genome, for example a single-nucleotide polymorphism (SNP) or a rare genetic mutation, is the substitution of one nucleotide for another in the genome sequence. Some of these changes alter an amino acid in the corresponding protein sequence (missense mutations); others do not. This paper focuses on genetic mutations that change the amino-acid sequence of the corresponding protein and on how to assess their effects on the characteristics of the wild-type protein. Existing methods and approaches for predicting the effects of mutations on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, whether stand-alone applications or web servers, are also pointed out. It is emphasized that understanding the molecular mechanisms behind the effects of missense mutations is critically important for detecting disease-causing mutations. The paper also provides several examples of applying 3D structure-based methods to model the effects of missense mutations on protein stability and protein-protein interactions.
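    The distinction drawn above, between single-nucleotide changes that do and do not alter the protein sequence, can be made concrete with the standard genetic code. The sketch below assumes an in-frame coding sequence given as DNA and a 0-based position; it translates the reference and alternate codons and labels the change as synonymous, missense, nonsense, or stop-loss. It is a toy classifier with a hypothetical function name and ignores splicing, alternative genetic codes and indels.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds, pos, alt):
    """Classify a substitution at 0-based `pos` of an in-frame coding sequence."""
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return kind, f"{ref_aa}{start // 3 + 1}{alt_aa}"  # e.g. E2V: residue 2, Glu -> Val


cds = "ATGGAAGTT"  # Met-Glu-Val
print(classify_snv(cds, 4, "T"))  # ('missense', 'E2V')
print(classify_snv(cds, 5, "G"))  # ('synonymous', 'E2E')
print(classify_snv(cds, 3, "T"))  # ('nonsense', 'E2*')
```

    Only the missense case feeds into the stability and interaction analyses discussed above; the structure-based predictors themselves are beyond this sketch.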

    Computational Prediction of Host-Parasite Protein Interactions between P. falciparum and H. sapiens

    To obtain candidate interactions between proteins of the malaria parasite Plasmodium falciparum and its human host, homologous and conserved interactions were inferred from various sources of interaction data. These candidate interactions were assessed with a machine learning approach and further filtered according to expression and molecular characteristics that would allow the involved proteins to actually interact. Analysis of the predicted interactions indicated that parasite proteins predominantly target central proteins to take control of the human host cell. Furthermore, the parasite utilized its protein repertoire in a combinatorial manner, providing broad connections to host cellular processes. In particular, proteins from several prominent signaling and regulatory pathways were predicted to interact with parasite chaperones. This result suggests an important role for remodeling proteins in the interaction interface between the human host and the parasite. Identifying such molecular strategies, which allow the parasite to take control of the host, has the potential to deepen our understanding of parasite-specific remodeling of the host cell and to illuminate new avenues for disease intervention.
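    Inferring candidate host-parasite interactions from homologous, conserved interactions can be sketched as interolog mapping: if proteins A and B interact in a reference organism, and A' and B' are their homologs in the parasite and the host, then A'-B' becomes a candidate interaction, optionally filtered (here only by a set of expressed proteins) before any further scoring. The homology map, the expression filter and all identifiers below are hypothetical, and the machine learning assessment described above is not reproduced.

```python
def transfer_interactions(source_ppis, homologs, expressed=None):
    """Map known interactions onto homologs (interolog transfer).

    source_ppis: iterable of (a, b) protein pairs known to interact.
    homologs: dict mapping a source protein to a set of homologous target proteins.
    expressed: optional set of target proteins allowed in predictions.
    Returns a set of unordered candidate pairs, each stored as a sorted tuple.
    """
    candidates = set()
    for a, b in source_ppis:
        for a2 in homologs.get(a, ()):
            for b2 in homologs.get(b, ()):
                if a2 == b2:
                    continue  # skip self-pairs
                if expressed is not None and not {a2, b2} <= expressed:
                    continue  # drop pairs involving non-expressed proteins
                candidates.add(tuple(sorted((a2, b2))))
    return candidates


source_ppis = [("refA", "refB"), ("refB", "refC")]
homologs = {"refA": {"Pf_1"}, "refB": {"Hs_1", "Hs_2"}, "refC": set()}
print(sorted(transfer_interactions(source_ppis, homologs)))
# [('Hs_1', 'Pf_1'), ('Hs_2', 'Pf_1')]
print(transfer_interactions(source_ppis, homologs, expressed={"Pf_1", "Hs_1"}))
# {('Hs_1', 'Pf_1')}
```

    Storing each pair as a sorted tuple makes the predictions order-independent, so the same interaction inferred from A-B and from B-A is counted once.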