230 research outputs found

    A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

    BACKGROUND: A long-standing challenge in the post-genomic era of Bioinformatics is the prediction of protein-protein interactions, and ultimately the prediction of protein functions. The problem is intrinsically harder when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central problem is: How can protein complexes with solved three-dimensional structure be utilized to identify and classify protein binding sites, and how can knowledge be inferred from this classification such that protein interactions can be predicted for proteins without solved structure? The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which are additionally often well conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins when homology-based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two aspects: First, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations for binding sites in order to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work includes a novel approach to a comprehensive geometrical classification of domain interfaces. Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen of convergently evolved motifs in 102,000 protein-protein interactions with structural information is derived. With respect to this dataset, all cases related to viral mimicry of human interface bindings are identified. Finally, a library of 740 motif descriptors for binding site recognition, encoded as Hidden Markov Models, is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of "ATP-binding", where a precision of 89% is achieved, thus outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that have not been annotated before by existing motif databases.
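
The abstract compares its HMM descriptors against PROSITE motifs for ATP-binding. As a point of reference, the sketch below shows how a PROSITE-style pattern can be matched against a sequence. It uses the classical Walker A (P-loop) pattern `[AG]-x(4)-G-K-[ST]`, not the thesis's novel P-loop descriptor or its Hidden Markov Models, which are not given in the abstract. The function names and the example fragment are illustrative assumptions.

```python
import re

# Classical PROSITE-style Walker A / P-loop pattern (assumed here as the
# baseline; the thesis's own HMM descriptors are not reproduced).
P_LOOP = "[AG]-x(4)-G-K-[ST]"


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (no anchors) into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count,
        # e.g. "x(4)" -> ("x", "4", None) and "x(2,3)" -> ("x", "2", "3").
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # {P} means "anything but P"
            core = "[^" + core[1:-1] + "]"
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str = P_LOOP):
    """Return (start, matched_text) for every, possibly overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Illustrative fragment containing a P-loop-like stretch.
    print(find_motifs("MTEYKLVVVGAGGVGKSALTIQ"))
```

A fixed pattern like this either matches or does not; a profile HMM instead assigns a probability to each residue at each position, which is what allows the graded, cross-validated descriptors the abstract describes.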

    PCalign: a method to quantify physicochemical similarity of protein-protein interfaces

    Background: Structural comparison of protein-protein interfaces provides valuable insights into the functional relationship between proteins, which may not solely arise from shared evolutionary origin. The few methods that exist for such comparative studies have focused on structural models determined at atomic resolution, and may miss interesting patterns present in large macromolecular complexes that are typically solved by low-resolution techniques. Results: We developed a coarse-grained method, PCalign, to quantitatively evaluate physicochemical similarities between a given pair of protein-protein interfaces. This method uses an order-independent algorithm, geometric hashing, to superimpose the backbone atoms of a given pair of interfaces, and provides a normalized scoring function, PC-score, to account for the extent of overlap in terms of both geometric and chemical characteristics. We demonstrate that PCalign outperforms existing methods, and additionally facilitates comparative studies across models of different resolutions, which are not accommodated by existing methods. Furthermore, we illustrate potential application of our method to recognize interesting biological relationships masked by apparent lack of structural similarity. Conclusions: PCalign is a useful method for recognizing shared chemical and spatial patterns among protein-protein interfaces. It outperforms existing methods for high-quality data, and additionally facilitates comparison across structural models with different levels of detail, with proven robustness against noise. http://deepblue.lib.umich.edu/bitstream/2027.42/110905/1/12859_2015_Article_471.pd
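
To make the idea of a normalized score that combines geometric and chemical overlap concrete, here is a toy sketch. It is not the published PC-score and does not perform geometric hashing: it assumes the two interfaces are already superimposed, and the distance cutoff, chemical type labels, and function name are all invented for illustration.

```python
import math


def toy_overlap_score(a, b, cutoff=2.0):
    """Toy normalized overlap between two superimposed interfaces.

    a, b: lists of (x, y, z, chem_type) points, one per residue.
    A point in `a` is matched to the closest unused point in `b` that has
    the same chemical type and lies within `cutoff`. The result is
    2 * matches / (len(a) + len(b)), which lies in [0, 1].
    Sequence order is never used, only positions and types.
    """
    if not a or not b:
        return 0.0
    used = set()
    matched = 0
    for xa, ya, za, type_a in a:
        best, best_d = None, cutoff
        for j, (xb, yb, zb, type_b) in enumerate(b):
            if j in used or type_a != type_b:
                continue
            d = math.dist((xa, ya, za), (xb, yb, zb))
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matched += 1
    return 2.0 * matched / (len(a) + len(b))


if __name__ == "__main__":
    first = [(0, 0, 0, "hydrophobic"), (5, 0, 0, "polar")]
    second = [(5.5, 0, 0, "polar"), (0.5, 0, 0, "hydrophobic"),
              (20, 0, 0, "charged")]
    print(toy_overlap_score(first, second))
```

The matching here is greedy, so it is only a rough stand-in for an optimal assignment; it is meant to show the normalization, not to reproduce PCalign's behavior.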

    Accurate Prediction of Peptide Binding Sites on Protein Surfaces

    Many important protein–protein interactions are mediated by the binding of a short peptide stretch in one protein to a large globular segment in another. Recent efforts have provided hundreds of examples of new peptides binding to proteins for which a three-dimensional structure is available (either known experimentally or readily modeled) but where no structure of the protein–peptide complex is known. To address this gap, we present an approach that can accurately predict peptide binding sites on protein surfaces. For peptides known to bind a particular protein, the method predicts binding sites with great accuracy, and the specificity of the approach means that it can also be used to predict whether or not a putative or predicted peptide partner will bind. We used known protein–peptide complexes to derive preferences, in the form of spatial position specific scoring matrices, which describe the binding-site environment in globular proteins for each type of amino acid in bound peptides. We then scan the surface of a putative binding protein for sites for each of the amino acids present in a peptide partner and search for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence. The method performed well in a benchmark and largely agreed with experimental data mapping binding sites for several recently discovered interactions mediated by peptides, including RG-rich proteins with SMN domains, Epstein-Barr virus LMP1 with TRADD domains, DBC1 with Sir2, and the Ago hook with Argonaute PIWI domain. The method, and associated statistics, is an excellent tool for predicting and studying binding sites for newly discovered peptides mediating critical events in biology
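
The search step described above (score surface sites per peptide residue, then look for combinations that satisfy constraints from the peptide sequence) can be sketched as follows. This is a minimal brute-force illustration, not the authors' implementation: the per-residue site scores stand in for the spatial position-specific scoring matrices, the only constraint used is a maximum distance between sites of consecutive residues, and all names and numbers are assumptions.

```python
import itertools
import math


def best_placement(peptide, site_scores, site_xyz, max_step=4.0):
    """Find the highest-scoring assignment of peptide residues to sites.

    peptide:     peptide sequence, e.g. "PR"
    site_scores: dict amino_acid -> {site_id: score} (hypothetical scores)
    site_xyz:    dict site_id -> (x, y, z)
    max_step:    assumed maximum distance between sites assigned to
                 consecutive peptide residues
    Returns (total_score, [site_id, ...]) or None if nothing is feasible.
    """
    candidates = [sorted(site_scores[aa].items()) for aa in peptide]
    best = None
    for combo in itertools.product(*candidates):
        sites = [site for site, _ in combo]
        if len(set(sites)) != len(sites):
            continue  # one site cannot host two residues
        too_far = any(
            math.dist(site_xyz[p], site_xyz[q]) > max_step
            for p, q in zip(sites, sites[1:])
        )
        if too_far:
            continue  # violates the chain-connectivity constraint
        total = sum(score for _, score in combo)
        if best is None or total > best[0]:
            best = (total, sites)
    return best


if __name__ == "__main__":
    xyz = {"s1": (0, 0, 0), "s2": (3.8, 0, 0),
           "s3": (7.6, 0, 0), "s4": (30, 0, 0)}
    scores = {"P": {"s1": 2.0, "s4": 5.0}, "R": {"s2": 1.5, "s3": 0.5}}
    print(best_placement("PR", scores, xyz))
```

Note that the isolated high-scoring site `s4` is rejected because no compatible neighbour exists, which is the point of combining per-residue scores with sequence-derived constraints. Exhaustive enumeration grows exponentially with peptide length, so a real implementation would need pruning or dynamic programming.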

    Accurate Protein Structure Annotation through Competitive Diffusion of Enzymatic Functions over a Network of Local Evolutionary Similarities

    High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks
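
The competitive diffusion idea (labels spread from annotated proteins over a similarity network, each node receives a z-score per function, and the most significant function wins) can be illustrated with a small sketch. This is a simplified stand-in and not the published method: the edges here are given directly rather than derived from ETA matches, the update rule is a generic damped neighbour average, and the damping factor, iteration count, and function name are assumptions.

```python
from statistics import mean, pstdev


def competitive_diffusion(adj, seeds, alpha=0.5, iters=50):
    """Assign each node the label with the highest z-score after diffusion.

    adj:   dict node -> list of neighbouring nodes (undirected network)
    seeds: dict node -> known label, for the annotated nodes only
    alpha: assumed damping factor; (1 - alpha) pulls scores back to seeds
    """
    nodes = sorted(adj)
    labels = sorted(set(seeds.values()))
    z = {}
    for label in labels:
        # Start from an indicator vector of nodes carrying this label.
        y = {n: 1.0 if seeds.get(n) == label else 0.0 for n in nodes}
        f = dict(y)
        for _ in range(iters):
            f = {
                n: (1 - alpha) * y[n]
                + alpha * (sum(f[m] for m in adj[n]) / len(adj[n])
                           if adj[n] else 0.0)
                for n in nodes
            }
        # Standardize this label's scores across all nodes.
        mu, sd = mean(f.values()), pstdev(f.values())
        z[label] = {n: (f[n] - mu) / sd if sd > 0 else 0.0 for n in nodes}
    # Competition: the most significant label wins at each node.
    return {n: max(labels, key=lambda lab: z[lab][n]) for n in nodes}


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
            "d": ["c", "e"], "e": ["d"]}
    known = {"a": "esterase", "e": "kinase"}
    print(competitive_diffusion(path, known))
```

On this five-node path, nodes next to each seed inherit that seed's label, while the midpoint is equidistant from both and its assignment is essentially a tie.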

    Functional Sites in Structure and Sequence. Protein Active Sites and miRNA Target Recognition

    The number of protein three-dimensional structures is increasing steeply, and structural genomics projects aim to solve the structures for all proteins as a means to understanding function. In the first part of my thesis, I developed a method for the comparison of local structural patterns (e.g. enzyme active sites) that provides a reliable statistical measure to discern meaningful matches from noise. The method is complementary to structural alignment as it is able to confirm functional similarities suggested by an overall similar structure but also detects functional similarities between different folds. An easy-to-use interface is available on the Internet for functional annotation of protein structures (http://pints.embl.de). In the second part of my thesis, I present a computational screen for microRNA (miRNA) targets in Drosophila. miRNAs are short RNAs that inhibit translation of target messenger RNAs in animals by binding to complementary sites in their 3′ untranslated regions. Target predictions were urgently needed as targets were known for only three of the more than 700 miRNAs. Of my predictions, six were validated experimentally and others are likely to be functional, making the results a useful resource for miRNA research. The screen extended miRNA function to pathway control, nervous system development and regulation of metabolism, and revealed that one miRNA typically regulates several targets but also that one gene is likely to be targeted by several miRNAs.

    Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase

    Computational methods are increasingly gaining importance as an aid in identifying active sites. Most of these methods use structural information to supplement sequence-conservation-based analyses. Development of tools that compute electrostatic potentials has further improved our ability to characterize the active site residues in proteins. We have described a computational methodology for detecting active sites based on structural and electrostatic conformity - CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of any particular enzymatic function, as defined by its active sites, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, the electrostatic potential differences (PD) between analogous residue pairs in an active site taken from different proteins of the same family are similar. False positives in spatially congruent matches are further pruned by PD analysis, where cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities - β-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) are also presented in order to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database are made available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), one of the well-known promiscuous enzymes, for additional activities. Such a search has led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), where the protein acts as a protease. Finally, we present experimental evidence for the prediction by CLASP by showing that SAP indeed has protease activity in vitro.
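
The two-stage filter described above (spatial congruence first, then pruning by electrostatic potential difference between analogous residue pairs) can be sketched as follows. This is an illustrative reduction and not the CLASP code: residues are reduced to a single coordinate and a single potential value, the tolerances are invented, and the residues of the candidate are assumed to be listed in the same order as their analogues in the reference motif.

```python
import itertools
import math


def is_congruent(motif, candidate, dist_tol=1.0, pd_tol=50.0):
    """Check a candidate site against a reference active-site motif.

    motif, candidate: lists of ((x, y, z), potential), with analogous
                      residues at the same list index.
    dist_tol:         assumed tolerance on pairwise distances
    pd_tol:           assumed tolerance on pairwise potential differences
                      (same arbitrary units as the potentials)
    """
    if len(motif) != len(candidate):
        return False
    for i, j in itertools.combinations(range(len(motif)), 2):
        # Stage 1: spatial congruence of the residue pair.
        d_motif = math.dist(motif[i][0], motif[j][0])
        d_cand = math.dist(candidate[i][0], candidate[j][0])
        if abs(d_motif - d_cand) > dist_tol:
            return False
        # Stage 2: the potential difference of the pair must also agree.
        pd_motif = motif[i][1] - motif[j][1]
        pd_cand = candidate[i][1] - candidate[j][1]
        if abs(pd_motif - pd_cand) > pd_tol:
            return False
    return True


if __name__ == "__main__":
    # All coordinates and potentials below are made-up illustration values.
    reference = [((0, 0, 0), -100.0), ((3, 0, 0), 50.0), ((0, 4, 0), 10.0)]
    shifted = [((1, 1, 1), -90.0), ((4, 1, 1), 55.0), ((1, 5, 1), 20.0)]
    print(is_congruent(reference, shifted))
```

Comparing pairwise distances rather than raw coordinates makes the geometric check independent of how each structure is positioned, and comparing potential differences rather than absolute potentials follows the abstract's observation that it is the differences between analogous pairs that are conserved.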

    Motif Discovery in Protein Sequences

    Biology has become a data-intensive research field. Coping with the flood of data from the new genome sequencing technologies is a major area of research. The exponential increase in the size of the datasets produced by “next-generation sequencing” (NGS) poses unique computational challenges. In this context, motif discovery tools are widely used to identify important patterns in the sequences produced. Biological sequence motifs are defined as short, usually fixed-length, sequence patterns that may represent important structural or functional features in nucleic acid and protein sequences, such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces. They can occur in an exact or approximate form within a family or a subfamily of sequences. Motif discovery is therefore an important field in bioinformatics, and numerous methods have been developed for the identification of motifs shared by a set of functionally related sequences. This chapter will review the existing motif discovery methods for protein sequences and their ability to discover biologically important features, as well as their limitations for the discovery of new motifs. Finally, we will propose new horizons for motif discovery in order to address the shortcomings of existing methods.
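
One common way to represent a short, fixed-length motif that occurs in approximate form is a position-specific scoring matrix (PSSM). The sketch below builds one from a few aligned motif instances and scans a sequence for the best-scoring window. It is a generic textbook construction, not a method taken from the chapter; the uniform background, the pseudocount, the function names, and the example instances are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned motif instances.

    A uniform background over the 20 amino acids is assumed.
    Returns a list with one {amino_acid: log2_odds} dict per position.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total)
                          / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window in `sequence`.

    The sequence must be at least as long as the motif and contain only
    the 20 standard amino acids.
    """
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    pssm = build_pssm(["GKS", "GKT", "GKS", "AKS"])
    print(best_hit(pssm, "LLVGKSAAL"))
```

Because every residue at every position receives a score, a window that differs from all training instances at one position can still rank highly, which is how a PSSM captures the approximate occurrences mentioned above.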