12,993 research outputs found

    Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins

    One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

    An integrated approach to the prediction of domain-domain interactions

    BACKGROUND: The development of high-throughput technologies has produced several large-scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain interactions, it is crucial to understand protein interactions at the level of the domains. The availability of many diverse biological data sets provides an opportunity to discover the underlying domain interactions within protein interactions through an integration of these biological data sets. RESULTS: We combine protein interaction data sets from multiple species, molecular sequences, and gene ontology to construct a set of high-confidence domain-domain interactions. First, we propose a new measure, the expected number of interactions for each pair of domains, to score domain interactions based on protein interaction data in one species and show that it performs similarly to the E-value defined by Riley et al. [1]. Our new measure is applied to the protein interaction data sets from yeast, worm, fruitfly and humans. Second, information on pairs of domains that coexist in known proteins and on pairs of domains with the same gene ontology function annotations is incorporated to construct a high-confidence set of domain-domain interactions using a Bayesian approach. Finally, we evaluate the set of domain-domain interactions by comparing predicted domain interactions with those defined in the iPfam database [2,3], which were derived from protein structures. The accuracy of the predicted domain interactions is also confirmed by comparison with experimentally obtained domain interactions from H. pylori [4]. As a result, a total of 2,391 high-confidence domain interactions are obtained, and these domain interactions are used to unravel detailed protein and domain interactions in several protein complexes. CONCLUSION: Our study shows that integration of multiple biological data sets based on the Bayesian approach provides a reliable framework to predict domain interactions. By integrating multiple data sources, the coverage and accuracy of predicted domain interactions can be significantly increased.
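The abstract above does not spell out its Bayesian model, so the following is only a minimal sketch of one common way to integrate independent evidence sources: a naive Bayes update in which each source contributes a likelihood ratio. The function name, the evidence labels, and all numbers are hypothetical illustrations, and the conditional-independence assumption is ours, not a claim about the paper's method.

```python
import math
from typing import Dict


def posterior_probability(prior: float, likelihood_ratios: Dict[str, float]) -> float:
    """Combine evidence sources under a naive Bayes (independence) assumption.

    prior             -- assumed prior probability that a domain pair interacts
    likelihood_ratios -- for each evidence source, the ratio
                         P(evidence | interacting) / P(evidence | not interacting)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Work in log-odds so that many small ratios do not underflow.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    # Convert the posterior log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one domain pair (illustrative values only).
evidence = {
    "ppi_score": 2.0,         # support from protein interaction data
    "domain_coexistence": 3.0,  # the two domains co-occur in known proteins
}
print(posterior_probability(0.5, evidence))
```

With a prior of 0.5 the prior odds are 1, so the posterior odds equal the product of the ratios. Thresholding such a posterior is one way a high-confidence set could be selected, though the threshold used in the paper is not stated in the abstract.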

    Similarity-based virtual screening using 2D fingerprints

    This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
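To make the terms in this abstract concrete, here is a minimal sketch of the Tanimoto coefficient on binary fingerprints, followed by a simple group fusion. Fingerprints are modelled as sets of "on" bit positions. The abstract does not say which fusion rule is used, so taking the maximum score over the reference structures is an assumption. The turbo variant assumes that the nearest neighbours from a first search stand in as extra references. All function names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / (bits in a + bits in b - shared bits)."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def group_fusion(references: List[Fingerprint],
                 database: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Rank database molecules by their best score against any reference.

    The MAX rule is an assumed fusion rule, not one named in the abstract.
    """
    scores = {
        name: max(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def turbo_search(reference: Fingerprint,
                 database: Dict[str, Fingerprint],
                 n_neighbours: int = 2) -> List[Tuple[str, float]]:
    """Approximate group fusion from one reference (assumed scheme):
    the top hits of a first search are reused as additional references."""
    first_pass = group_fusion([reference], database)
    neighbours = [database[name] for name, _ in first_pass[:n_neighbours]]
    return group_fusion([reference, *neighbours], database)


# Toy database of hypothetical fingerprints.
database = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({1, 2, 3}),
    "m3": frozenset({7, 8, 9}),
    "m4": frozenset({5, 6, 7}),
}
references = [frozenset({1, 2, 3, 4}), frozenset({7, 8, 9, 10})]
ranking = group_fusion(references, database)
print(ranking)
```

In this toy run `m1` ranks first because it matches the first reference exactly, and `m4` ranks last because it shares only one bit with either reference. Real fingerprints would come from a cheminformatics toolkit rather than hand-written sets.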

    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight challenges faced by these methods, in particular the detection of sparse and small complexes or sub-complexes and the discerning of overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area.

    HVint: a strategy for identifying novel protein-protein interactions in Herpes Simplex Virus Type 1

    Human herpesviruses are widespread human pathogens with a remarkable impact on worldwide public health. Despite decades of intense research, the molecular details of many aspects of their function remain to be fully characterized. To unravel the details of how these viruses operate, a thorough understanding of the relationships between the involved components is key. Here, we present HVint, a novel protein-protein intra-viral interaction resource for herpes simplex virus type 1 (HSV-1) integrating data from five external sources. To assess each interaction, we used a scoring scheme that takes into consideration aspects such as the type of detection method and the number of lines of evidence. The coverage of the initial interactome was further increased using evolutionary information, by importing interactions reported for other human herpesviruses. These latter interactions therefore constitute computational predictions of potential novel interactions in HSV-1. An independent experimental analysis was performed to confirm a subset of our predicted interactions. This subset covers proteins that contribute to nuclear egress and primary envelopment events, including VP26, pUL31, pUL40 and the recently characterized pUL32 and pUL21. Our findings support a coordinated crosstalk between VP26 and proteins such as pUL31, pUS9 and the CSVC complex, contributing to the development of a model describing the nuclear egress and primary envelopment pathways of newly synthesized HSV-1 capsids. The results are also consistent with recent findings on the involvement of pUL32 in capsid maturation and early tegumentation events. Further, they open the door to new hypotheses on virus-specific regulators of pUS9-dependent transport. To make this repository of interactions readily accessible to the scientific community, we also developed a user-friendly and interactive web interface. Our approach demonstrates the power of computational predictions to assist in the design of targeted experiments for the discovery of novel protein-protein interactions.

    Data integration for the analysis of uncharacterized proteins in Mycobacterium tuberculosis

    Mycobacterium tuberculosis is a bacterial pathogen that causes tuberculosis, a leading cause of human death worldwide from infectious diseases, especially in Africa. Despite enormous advances achieved in recent years in controlling the disease, tuberculosis remains a public health challenge. The contribution of existing drugs is of immense value, but the deadly synergy of the disease with Human Immunodeficiency Virus (HIV) or Acquired Immunodeficiency Syndrome (AIDS) and the emergence of drug-resistant strains are threatening to compromise gains in tuberculosis control. In fact, the development of active tuberculosis is the outcome of the delicate balance between bacterial virulence and host resistance, which constitute two distinct and independent components. Significant progress has been made in understanding the evolution of the bacterial pathogen and its interaction with the host. The end point of these efforts is the identification of virulence factors and drug targets within the bacterium in order to develop new drugs and vaccines for the eradication of the disease.

    Integrative modelling of cellular assemblies

    A wide variety of experimental techniques can be used for understanding the precise molecular mechanisms underlying the activities of cellular assemblies. The inherent limitations of a single experimental technique often require integration of data from complementary approaches to gain sufficient insights into the assembly structure and function. Here, we review popular computational approaches for integrative modelling of cellular assemblies, including protein complexes and genomic assemblies. We provide recent examples of integrative models generated for such assemblies by different experimental techniques, especially including data from 3D electron microscopy (3D-EM) and chromosome conformation capture experiments, respectively. We highlight general concepts in integrative modelling and discuss the need for careful formulation and merging of different types of information.