135 research outputs found

    Deciphering Network Community Structure by Surprise

    The analysis of complex networks permeates all sciences, from biology to sociology. A fundamental, unsolved problem is how to characterize the community structure of a network. Here, using both standard and novel benchmarks, we show that maximization of a simple global parameter, which we call Surprise (S), leads to a very efficient characterization of the community structure of complex synthetic networks. In particular, S qualitatively outperforms the most commonly used criterion to define communities, Newman and Girvan's modularity (Q). Applying S maximization to real networks often provides natural, well-supported partitions, but also sometimes counterintuitive solutions that expose the limitations of our previous knowledge. These results indicate that it is possible to define an effective global criterion for community structure and open new routes for the understanding of complex networks. Comment: 7 pages, 5 figures
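    The abstract does not state how S is computed. As an illustration only, the sketch below assumes the cumulative hypergeometric formulation of Surprise: with F the number of possible links among the k nodes, n the observed links, M the possible intracommunity links of a partition and p the observed intracommunity links, S = -log10 of the sum over j from p to min(M, n) of C(M, j)·C(F - M, n - j)/C(F, n). The function name and the base-10 logarithm are choices made here. It scores one given partition; the maximization over partitions that the abstract describes is not shown.

```python
import math


def surprise(nodes, edges, community):
    """Surprise of a partition under the assumed cumulative hypergeometric form.

    nodes: iterable of node ids; edges: list of undirected (u, v) pairs without
    duplicates or self-loops; community: dict node -> community label.
    """
    nodes = list(nodes)
    k = len(nodes)
    F = k * (k - 1) // 2                      # maximum possible links
    n = len(edges)                            # observed links
    sizes = {}
    for v in nodes:
        sizes[community[v]] = sizes.get(community[v], 0) + 1
    M = sum(s * (s - 1) // 2 for s in sizes.values())  # possible intracommunity links
    p = sum(1 for u, v in edges if community[u] == community[v])  # observed ones
    # Exact integer tail of the hypergeometric distribution, P(X >= p).
    tail = sum(math.comb(M, j) * math.comb(F - M, n - j)
               for j in range(p, min(M, n) + 1))
    return math.log10(math.comb(F, n)) - math.log10(tail)
```

    For two triangles joined by a single bridge edge, splitting them into their natural communities gives S = log10(715), about 2.85, while putting all six nodes in one community gives S = 0, so under this formulation the natural split is preferred.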

    An ontology-based search engine for protein-protein interactions

    Background: Keyword matching or ID matching is the most common method for searching large databases of protein-protein interactions. These are purely syntactic methods that retrieve the database records containing a keyword or ID specified in a query. Such syntactic methods often retrieve too few results, or none at all, despite many potential matches in the database. Results: We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine to search protein-protein interactions efficiently and in a biologically meaningful way. Given a query protein with optional search conditions expressed as one or more GO terms, the search engine finds all interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Conclusion: Representing the biological relations of proteins and their GO annotations by modified Gödel numbers lets a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching often misses interactions involving a protein that has no explicit annotation matching the search condition, but our search engine also retrieves such interactions if they satisfy the search condition through a more specific term in the ontology.
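    The abstract does not define the "modified Gödel numbers", so the sketch below shows only the underlying prime-factorization idea under simplifying assumptions: each ontology term gets its own prime, a protein is encoded as the product of the primes of its annotated terms and all their ancestors, and a partner satisfies a condition term when that term's prime divides its code. The toy terms, proteins, and helper names are hypothetical. Because ancestors are folded into each code, a partner annotated only with a more specific term still matches a query on its parent term, which is the behavior the abstract contrasts with keyword matching.

```python
from math import prod


def first_primes(count):
    """First `count` primes by trial division (fine for toy ontologies)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def ancestors(term, parents):
    """The term itself plus every ancestor reachable through parent links."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(parents.get(t, ()))
    return found


def build_codes(parents, annotations):
    """Assign a prime to each term and a product-of-primes code to each protein."""
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    terms |= {t for ts in annotations.values() for t in ts}
    prime_of = dict(zip(sorted(terms), first_primes(len(terms))))
    codes = {}
    for protein, ts in annotations.items():
        closed = set().union(*(ancestors(t, parents) for t in ts))
        codes[protein] = prod(prime_of[t] for t in closed)
    return prime_of, codes


def search(query, interactions, codes, prime_of, conditions=()):
    """Partners of `query` whose code is divisible by every condition term's prime."""
    required = prod(prime_of[t] for t in conditions)
    partners = {b if a == query else a for a, b in interactions if query in (a, b)}
    return sorted(p for p in partners if codes.get(p, 1) % required == 0)
```

    With a toy ontology in which `kinase_activity` is a child of `catalytic_activity`, a query for partners of protein A under the condition `catalytic_activity` returns partner B even though B is annotated only with `kinase_activity`.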

    High-throughput, quantitative analyses of genetic interactions in E. coli.

    Large-scale genetic interaction studies provide the basis for defining gene function and pathway architecture. Recent advances in the ability to generate double mutants en masse in Saccharomyces cerevisiae have dramatically accelerated the acquisition of genetic interaction information and the biological inferences that follow. Here we describe a method based on F factor-driven conjugation, which allows for high-throughput generation of double mutants in Escherichia coli. This method, termed genetic interaction analysis technology for E. coli (GIANT-coli), permits us to systematically generate and array double-mutant cells on solid media in high-density arrays. We show that colony size provides a robust and quantitative output of cellular fitness and that GIANT-coli can recapitulate known synthetic interactions and identify previously unidentified negative (synthetic sickness or lethality) and positive (suppressive or epistatic) relationships. Finally, we describe a complementary strategy for genome-wide suppressor-mutant identification. Together, these methods permit rapid, large-scale genetic interaction studies in E. coli.
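    The abstract says colony size serves as a quantitative fitness readout but does not give a scoring rule. A common convention, assumed here and not necessarily the one GIANT-coli uses, is the multiplicative model: normalize each colony size by the wild-type colony size to get a fitness W, then score ε = W_ab - W_a·W_b. Under this model ε < 0 suggests a negative (sickness or lethality) interaction and ε > 0 a positive (suppressive or epistatic) one. The 0.1 cutoff is an arbitrary placeholder.

```python
def fitness(colony_size, wild_type_size):
    """Fitness as colony size relative to wild type (assumed normalization)."""
    return colony_size / wild_type_size


def interaction_score(size_a, size_b, size_ab, size_wt):
    """Multiplicative-model score: observed double-mutant fitness minus expected."""
    w_a, w_b, w_ab = (fitness(s, size_wt) for s in (size_a, size_b, size_ab))
    return w_ab - w_a * w_b


def classify(score, threshold=0.1):
    """Hypothetical cutoff separating negative, neutral, and positive interactions."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"
```

    For example, single mutants at 80% and 50% of wild-type colony size predict a double mutant at 40%; a double mutant at 10% scores about -0.3 and would be called negative.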

    PPLook: an automated data mining tool for protein-protein interaction

    Background: Extracting and visualizing protein-protein interactions (PPIs) from the text literature is a meaningful topic in protein science, as it assists the identification of interactions among proteins. There is a lack of tools to extract PPIs and to visualize and classify the results. Results: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPIs from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook, based on the OpenGL graphics interface, lets users view the interaction relationships among the proteins in three-dimensional space. PPLook can also select protein semantic classes, count the proteins of a semantic class that interact with the query protein, and count the number of articles reporting interactions of the query protein. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface. Conclusions: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at http://meta.usc.edu/softs/PPLook.
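    PPLook's dictionary and matching rules are not described in the abstract. The sketch below is a simplified, hypothetical stand-in: a sentence yields an interaction between every pair of known protein names it mentions, provided it also contains a word from a small interaction-keyword dictionary. It then computes the three topological parameters the abstract lists (nodes, edges, connected components). The keyword set, function names, and toy sentences are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword dictionary; PPLook's real dictionary is not given.
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "associates"}


def extract_interactions(sentences, protein_names):
    """Undirected PPI pairs from sentences naming two proteins plus a keyword."""
    pairs = set()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        if not tokens & INTERACTION_KEYWORDS:
            continue
        found = sorted(tokens & protein_names)
        for i, a in enumerate(found):
            for b in found[i + 1:]:
                pairs.add((a, b))
    return pairs


def partners(query, pairs):
    """Proteins paired with the query protein."""
    return {b if a == query else a for a, b in pairs if query in (a, b)}


def topology(pairs):
    """(number of nodes, number of edges, number of connected components)."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1  # depth-first search marks one whole component
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return len(adjacency), len(pairs), components
```

    On the toy sentences "RAD51 binds BRCA2 in vivo.", "TP53 interacts with MDM2." and "BRCA2 and PALB2 were purified.", the third sentence is skipped for lacking a keyword, leaving a network of 4 nodes, 2 edges, and 2 connected components.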

    Network-Free Inference of Knockout Effects in Yeast

    Perturbation experiments, in which a certain gene is knocked out and the expression levels of other genes are observed, constitute a fundamental step in uncovering the intricate wiring diagrams in the living cell and elucidating the causal roles of genes in signaling and regulation. Here we present a novel framework for analyzing large cohorts of gene knockout experiments and their genome-wide effects on expression levels. We devise clustering-like algorithms that identify groups of genes that behave similarly with respect to the knockout data, and use them to predict knockout effects and to annotate physical interactions between proteins as inhibiting or activating. Unlike previous approaches, our prediction approach does not depend on physical network information; the latter is used only for the annotation task. Consequently, it is both more efficient and more widely applicable than previous methods. We evaluate our approach using a large-scale collection of gene knockout experiments in yeast, comparing it to the state-of-the-art SPINE algorithm. In cross-validation tests, our algorithm exhibits superior prediction accuracy while increasing the coverage by over 25-fold. Significant coverage gains are also obtained in the annotation of the physical network.

    Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles

    Background: Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles by their relevance to PPIs, a task we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data are available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating them into the base training data may actually reduce classification performance, because the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely positive data from such supplementary databases. Extracting only likely positive data, however, will bias the classification model unless sufficient negative data are also added. Unfortunately, negative data are very hard to obtain because no resources compile such information. Therefore, our second aim is to select such negative data from unlabeled PubMed data. Third, we explore how to exploit these likely positive and negative data. Lastly, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles. Results: To evaluate the performance of our PPI text classifier, we conducted experiments on the BioCreAtIvE-II IAS dataset. Our results show that adding likely-labeled data generally increases AUC by 3-6%, indicating better ranking ability. Our experiments also show that our newly proposed term-weighting scheme has the highest AUC among all common weighting schemes. Our final model achieves an F-measure and AUC 2.9% and 5.0% higher, respectively, than those of the top-ranking system in the IAS challenge. Conclusion: Our experiments demonstrate the effectiveness of integrating unlabeled and likely labeled data to augment a PPI text classification system. Our mixed model is suitable for ranking, whereas our hierarchical model is better for filtering. In addition, our results indicate that supervised weighting schemes outperform unsupervised ones. Our newly proposed weighting scheme, TFBRF, which considers documents that do not contain the target word, avoids some of the biases found in traditional weighting schemes. Our experimental results show TFBRF to be the most effective of several top weighting schemes.
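    The results above are reported as AUC, read as "ranking ability". As a reminder of what that means (a standard identity, not code from the paper), AUC equals the probability that a randomly chosen positive article is scored above a randomly chosen negative one, with ties counting one half. The pairwise sketch below is quadratic in the number of articles and is meant only for illustration; the scores in the example are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

    With classifier scores [0.9, 0.8] for two PPI articles and [0.7, 0.85] for two non-PPI articles, three of the four pairs are ordered correctly, so the AUC is 0.75.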

    FORG3D: Force-directed 3D graph editor for visualization of integrated genome scale data

    Background: Genomics research produces vast amounts of experimental data that need to be integrated in order to understand, model, and interpret the underlying biological phenomena. Interpreting these large and complex data sets is challenging, and different visualization methods are needed to help extract knowledge from the data. Results: To help researchers visualize and interpret integrated genomics data, we present a novel visualization method and bioinformatics software tool called FORG3D that is based on real-time three-dimensional force-directed graphs. FORG3D can be used to visualize integrated networks of genome-scale data such as interactions between genes or gene products, signal transduction, metabolic pathways, functional interactions, and evolutionary relationships. Furthermore, we demonstrate its utility by exploring gene network relationships using integrated data sets from a Caenorhabditis elegans Parkinson's disease model. Conclusion: We have created an open-source software tool called FORG3D that can be used for visualizing and exploring integrated genome-scale data.
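    FORG3D's own layout algorithm is not described in the abstract. To illustrate what a three-dimensional force-directed graph is, the sketch below swaps in a Fruchterman-Reingold-style scheme: every pair of nodes repels with force k²/d, every edge attracts with force d²/k, and each node moves along its net force by at most a "temperature" that shrinks each iteration. The constants (cube size, starting temperature, cooling rate, iteration count) are arbitrary choices made here.

```python
import math
import random


def force_directed_3d(nodes, edges, iterations=200, seed=0):
    """Return {node: (x, y, z)} from a simple Fruchterman-Reingold-style layout."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed -> reproducible layout
    pos = {v: [rng.uniform(-1.0, 1.0) for _ in range(3)] for v in nodes}
    k = (8.0 / max(len(nodes), 1)) ** (1.0 / 3.0)  # ideal spacing in a 2x2x2 cube
    temperature = 0.2  # maximum displacement per iteration

    def offset(u, v):
        delta = [a - b for a, b in zip(pos[u], pos[v])]
        return delta, max(math.sqrt(sum(d * d for d in delta)), 1e-9)

    for _ in range(iterations):
        disp = {v: [0.0, 0.0, 0.0] for v in nodes}
        # Repulsion k^2 / d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                delta, dist = offset(u, v)
                force = k * k / dist
                for axis in range(3):
                    shift = delta[axis] / dist * force
                    disp[u][axis] += shift
                    disp[v][axis] -= shift
        # Attraction d^2 / k along each edge.
        for u, v in edges:
            delta, dist = offset(u, v)
            force = dist * dist / k
            for axis in range(3):
                shift = delta[axis] / dist * force
                disp[u][axis] -= shift
                disp[v][axis] += shift
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            length = max(math.sqrt(sum(d * d for d in disp[v])), 1e-9)
            step = min(length, temperature)
            for axis in range(3):
                pos[v][axis] += disp[v][axis] / length * step
        temperature *= 0.98  # cool down so the layout settles
    return {v: tuple(p) for v, p in pos.items()}
```

    Pairwise repulsion makes each iteration quadratic in the number of nodes; a tool aimed at genome-scale, real-time layouts would likely need an approximation for large graphs, which this sketch does not attempt.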

    Jerarca: Efficient Analysis of Complex Networks Using Hierarchical Clustering

    Background: Extracting useful information from complex biological networks is a major goal in many fields, especially genomics and proteomics. We have shown in several works that iterative hierarchical clustering, as implemented in the UVCluster program, is a powerful tool for analyzing many of those networks. However, the computation time required to perform UVCluster analyses imposed significant limitations on its use. Methodology/Principal Findings: We describe the suite Jerarca, designed to efficiently convert networks of interacting units into dendrograms by means of iterative hierarchical clustering. Jerarca is divided into three main sections. First, weighted distances among units are computed using up to three different approaches: a more efficient version of UVCluster and two new, related algorithms called RCluster and SCluster. Second, Jerarca builds dendrograms based on those distances, using well-known phylogenetic algorithms such as UPGMA or Neighbor-Joining. Finally, Jerarca provides optimal partitions of the trees using statistical criteria based on the distribution of intra- and intercluster connections. Outputs compatible with the phylogenetic software MEGA and the Cytoscape package are generated, allowing the results to be easily visualized. Conclusions/Significance: The four main advantages of Jerarca with respect to UVCluster are: 1) Improved speed of a novel UVCluster algorithm; 2) Additional, alternative strategies to perform iterative hierarchical clustering; 3) Automatic evaluatio
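    The abstract names UPGMA as one of the phylogenetic algorithms Jerarca uses to turn distances into dendrograms, but it does not specify how the UVCluster, RCluster, or SCluster distances are computed. The sketch below therefore starts from a precomputed distance matrix and shows only the standard UPGMA step: repeatedly merge the two closest clusters and set the new cluster's distance to every other cluster to the size-weighted average of its members' distances. The function name and the nested-tuple tree representation are choices made here.

```python
def upgma(labels, dist):
    """Build a UPGMA tree (nested tuples) from a symmetric distance matrix."""
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the rest.
        new_dists = {
            k: (size_i * d[(min(i, k), max(i, k))] + size_j * d[(min(j, k), max(j, k))])
            / (size_i + size_j)
            for k in clusters
        }
        d = {key: val for key, val in d.items() if i not in key and j not in key}
        for k, val in new_dists.items():
            d[(k, next_id)] = val  # next_id is always the largest id
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

    For three units where A and B are at distance 2 and both are at distance 6 from C, A and B are joined first and C is attached last, giving the tree ('C', ('A', 'B')).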

    Genetic Identification of a Network of Factors that Functionally Interact with the Nucleosome Remodeling ATPase ISWI

    Nucleosome remodeling and covalent modifications of histones play fundamental roles in chromatin structure and function. However, much remains to be learned about how the action of ATP-dependent chromatin remodeling factors and histone-modifying enzymes is coordinated to modulate chromatin organization and transcription. The evolutionarily conserved ATP-dependent chromatin-remodeling factor ISWI plays essential roles in chromosome organization, DNA replication, and transcription regulation. To gain insight into the regulation and mechanism of action of ISWI, we conducted an unbiased genetic screen to identify factors with which it interacts in vivo. We found that ISWI interacts with a network of factors that escaped detection in previous biochemical analyses, including the Sin3A gene. The Sin3A protein and the histone deacetylase Rpd3 are part of a conserved histone deacetylase complex involved in transcriptional repression. ISWI and the Sin3A/Rpd3 complex co-localize at specific chromosome domains. Loss of ISWI activity causes a reduction in the binding of the Sin3A/Rpd3 complex to chromatin. Biochemical analysis showed that ISWI physically interacts with the histone deacetylase activity of the Sin3A/Rpd3 complex. Consistent with these findings, the acetylation of histone H4 is altered when ISWI activity is perturbed in vivo. These findings suggest that ISWI associates with the Sin3A/Rpd3 complex to support its function in vivo.

    A systematic analysis of host factors reveals a Med23-interferon-λ regulatory axis against herpes simplex virus type 1 replication

    Herpes simplex virus type 1 (HSV-1) is a neurotropic virus causing vesicular oral or genital skin lesions, meningitis, and other diseases particularly harmful in immunocompromised individuals. To comprehensively investigate the complex interaction between HSV-1 and its host, we combined two genome-scale screens for host factors (HFs) involved in virus replication. A yeast two-hybrid screen for protein interactions and an RNA interference (RNAi) screen with a druggable-genome small interfering RNA (siRNA) library confirmed existing HFs and identified novel ones that functionally influence HSV-1 infection. Bioinformatic analyses found the 358 HFs were enriched for several pathways and multi-protein complexes. Of particular interest was the identification of Med23 as a strongly anti-viral component of the largely pro-viral Mediator complex, which links specific transcription factors to RNA polymerase II. The anti-viral effect of Med23 on HSV-1 replication was confirmed in gain-of-function gene overexpression experiments, and this inhibitory effect was specific to HSV-1, as a range of other viruses including Vaccinia virus and Semliki Forest virus were unaffected by Med23 depletion. We found that Med23 significantly upregulated expression of the type III interferon family (IFN-λ) at the mRNA and protein level by directly interacting with the transcription factor IRF7. The synergistic effect of Med23 and IRF7 on IFN-λ induction suggests that IRF7 is the major transcription factor for IFN-λ expression. Genotypic analysis of patients suffering recurrent orofacial HSV-1 outbreaks, previously shown to be deficient in IFN-λ secretion, found a significant correlation with a single nucleotide polymorphism in the IFN-λ3 (IL28b) promoter strongly linked to hepatitis C disease and treatment outcome. This paper describes a link between Med23 and IFN-λ, provides evidence for the crucial role of IFN-λ in HSV-1 immune control, and highlights the power of integrative genome-scale approaches to identify HFs critical for disease progression and outcome.