19,047 research outputs found

    Combining multiple positive training sets to generate confidence scores for protein–protein interactions

    Get PDF
    Motivation: High-throughput experimental and computational methods are generating a wealth of protein–protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space

    PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network

    Get PDF
    A variety of methods are developed to integrating diverse biological data to predict novel interaction relationship between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. Therefore, we propose a modified version of Topological Overlap Measure to identify not only extant direct PPIs links, but also novel protein interactions that can be indirectly inferred from various relationships between proteins. Our method is more powerful than a naïve Bayesian-network-based integration in PPI prediction, and could generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of training and test datasets on prediction, and further demonstrated the effectiveness of PTOMSM in predicting PPI. More importantly, this method can be extended naturally to predict other types of biological networks, and may be combined with Bayesian method to further improve the prediction

    Automated data integration for developmental biological research

    Get PDF
    In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research

    DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Charting the interactions among genes and among their protein products is essential for understanding biological systems. A flood of interaction data is emerging from high throughput technologies, computational approaches, and literature mining methods. Quick and efficient access to this data has become a critical issue for biologists. Several excellent multi-organism databases for gene and protein interactions are available, yet most of these have understandable difficulty maintaining comprehensive information for any one organism. No single database, for example, includes all available interactions, integrated gene expression data, and comprehensive and searchable gene information for the important model organism, <it>Drosophila melanogaster</it>.</p> <p>Description</p> <p>DroID, the <it>Drosophila </it>Interactions Database, is a comprehensive interactions database designed specifically for <it>Drosophila</it>. DroID houses published physical protein interactions, genetic interactions, and computationally predicted interactions, including interologs based on data for other model organisms and humans. All interactions are annotated with original experimental data and source information. DroID can be searched and filtered based on interaction information or a comprehensive set of gene attributes from Flybase. DroID also contains gene expression and expression correlation data that can be searched and used to filter datasets, for example, to focus a study on sub-networks of co-expressed genes. To address the inherent noise in interaction data, DroID employs an updatable confidence scoring system that assigns a score to each physical interaction based on the likelihood that it represents a biologically significant link.</p> <p>Conclusion</p> <p>DroID is the most comprehensive interactions database available for <it>Drosophila</it>. To facilitate downstream analyses, interactions are annotated with original experimental information, gene expression data, and confidence scores. All data in DroID are freely available and can be searched, explored, and downloaded through three different interfaces, including a text based web site, a Java applet with dynamic graphing capabilities (IM Browser), and a Cytoscape plug-in. DroID is available at <url>http://www.droidb.org</url>.</p

    Classifying pairs with trees for supervised biological network inference

    Full text link
    Networks are ubiquitous in biology and computational approaches have been largely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the later for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments are carried out with these methods on various biological networks that clearly highlight that these methods are competitive with existing methods.Comment: 22 page
    corecore