38 research outputs found
Development of computational approaches for structural classification, analysis and prediction of molecular recognition regions in proteins
The vast and growing volume of 3D protein structural data stored in the PDB contains abundant information about macromolecular complexes, and hence, data about protein interfaces. Non-covalent contacts between amino acids are the basis of protein interactions, and they are responsible for binding afinity and specificity in biological processes. In addition, water networks in protein interfaces can also complement direct interactions contributing significantly to molecular recognition, although their exact role is still not well understood.
It is estimated that protein complexes in the PDB are substantially underrepresented due to their crystallization dificulties. Methods for automatic classifification and description of the protein complexes are essential to study protein interfaces, and to propose putative binding regions. Due to this strong need, several protein-protein interaction databases have been developed. However, most of them do not take into account either protein-peptide complexes, solvent information or a proper classification of the binding regions, which are fundamental components to provide an accurate description of protein interfaces.
In the firest stage of my thesis, I developed the SCOWLP platform, a database and web application that structurally classifies protein binding regions at family level and defines accurately protein interfaces at atomic detail. The analysis of the results showed that protein-peptide complexes are substantially represented in the PDB, and are the only source of interacting information for several families. By clustering the family binding regions, I could identify 9,334 binding regions and 79,803 protein interfaces in the PDB. Interestingly, I observed that 65% of protein families interact to other molecules through more than one region and in 22% of the cases the same region recognizes different protein families. The database and web application are open to the research community (www.scowlp.org) and can tremendously facilitate high-throughput comparative analysis of protein binding regions, as well as, individual analysis of protein interfaces.
SCOWLP and the other databases collect and classify the protein binding regions at family level, where sequence and structure homology exist. Interestingly, it has been observed that many protein families also present structural resemblances within each other, mostly across folds. Likewise, structurally similar interacting motifs (binding regions) have been identified among proteins with different folds and functions. For these reasons, I decided to explore the possibility to infer protein binding regions independently of their fold classification. Thus, I performed the firest systematic analysis of binding region conservation within all protein families that are structurally similar, calculated using non-sequential structural alignment methods. My results indicate there is a substantial molecular recognition information that could be potentially inferred among proteins beyond family level. I obtained a 6 to 8 fold enrichment of binding regions, and identified putative binding regions for 728 protein families that lack binding information. Within the results, I found out protein complexes from different folds that present similar interfaces, confirming the predictive usage of the methodology. The data obtained with my approach may complement the SCOWLP family binding regions suggesting alternative binding regions, and can be used to assist protein-protein docking experiments and facilitate rational ligand design.
In the last part of my thesis, I used the interacting information contained in the SCOWLP database to help understand the role that water plays in protein interactions in terms of affinity and specificity. I carried out one of the firest high-throughput analysis of solvent in protein interfaces for a curated dataset of transient and obligate protein complexes. Surprisingly, the results highlight the abundance of water-bridged residues in protein interfaces (40.1% of the interfacial residues) that reinforces the importance of including solvent in protein interaction studies (14.5% extra residues interacting only water- mediated). Interestingly, I also observed that obligate and transient interfaces present a comparable amount of solvent, which contrasts the old thoughts saying that obligate protein complexes are expected to exhibit similarities to protein cores having a dry and hydrophobic interfaces. I characterized novel features of water-bridged residues in terms of secondary structure, temperature factors, residue composition, and pairing preferences that differed from direct residue-residue interactions. The results also showed relevant aspects in the mobility and energetics of water-bridged interfacial residues.
Collectively, my doctoral thesis work can be summarized in the following points:
1. I developed SCOWLP, an improved framework that identiffies protein interfaces and classifies protein binding regions at family level.
2. I developed a novel methodology to predict alternative binding regions among structurally similar protein families independently of the fold they belong to.
3. I performed a high-throughput analysis of water-bridged interactions contained in SCOWLP to study the role of solvent in protein interfaces. These three components of my thesis represent novel methods for exploiting existing structural information to gain insights into protein- protein interactions, key mechanisms to understand biological processes
Docking glycosaminoglycans to proteins: analysis of solvent inclusion
Glycosaminoglycans (GAGs) are anionic polysaccharides, which participate in key processes in the extracellular matrix by interactions with protein targets. Due to their charged nature, accurate consideration of electrostatic and water-mediated interactions is indispensable for understanding GAGs binding properties. However, solvent is often overlooked in molecular recognition studies. Here we analyze the abundance of solvent in GAG-protein interfaces and investigate the challenges of adding explicit solvent in GAG-protein docking experiments. We observe PDB GAG-protein interfaces being significantly more hydrated than protein–protein interfaces. Furthermore, by applying molecular dynamics approaches we estimate that about half of GAG-protein interactions are water-mediated. With a dataset of eleven GAG-protein complexes we analyze how solvent inclusion affects Autodock 3, eHiTs, MOE and FlexX docking. We develop an approach to de novo place explicit solvent into the binding site prior to docking, which uses the GRID program to predict positions of waters and to locate possible areas of solvent displacement upon ligand binding. To investigate how solvent placement affects docking performance, we compare these results with those obtained by taking into account information about the solvent position in the crystal structure. In general, we observe that inclusion of solvent improves the results obtained with these methods. Our data show that Autodock 3 performs best, though it experiences difficulties to quantitatively reproduce experimental data on specificity of heparin/heparan sulfate disaccharides binding to IL-8. Our work highlights the current challenges of introducing solvent in protein-GAGs recognition studies, which is crucial for exploiting the full potential of these molecules for rational engineering
Analysis of the impact of solvent on contacts prediction in proteins
<p>Abstract</p> <p>Background</p> <p>The correlated mutations concept is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Approaches based on this concept have been widely used for protein contacts prediction since the 90s. Previously, we have shown that water-mediated interactions play an important role in protein interfaces. We have observed that current "dry" correlated mutations approaches might not properly predict certain interactions in protein interfaces due to the fact that they are water-mediated.</p> <p>Results</p> <p>The goal of this study has been to analyze the impact of including solvent into the concept of correlated mutations. For this purpose we use linear combinations of the predictions obtained by the application of two different similarity matrices: a standard "dry" similarity matrix (DRY) and a "wet" similarity matrix (WET) derived from all water-mediated protein interfacial interactions in the PDB. We analyze two datasets containing 50 domains and 10 domain pairs from PFAM and compare the results obtained by using a combination of both matrices. We find that for both intra- and interdomain contacts predictions the introduction of a combination of a "wet" and a "dry" similarity matrix improves the predictions in comparison to the "dry" one alone.</p> <p>Conclusion</p> <p>Our analysis, despite the complexity of its possible general applicability, opens up that the consideration of water may have an impact on the improvement of the contact predictions obtained by correlated mutations approaches.</p
SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces
BACKGROUND: Currently there is a strong need for methods that help to obtain an accurate description of protein interfaces in order to be able to understand the principles that govern molecular recognition and protein function. Many of the recent efforts to computationally identify and characterize protein networks extract protein interaction information at atomic resolution from the PDB. However, they pay none or little attention to small protein ligands and solvent. They are key components and mediators of protein interactions and fundamental for a complete description of protein interfaces. Interactome profiling requires the development of computational tools to extract and analyze protein-protein, protein-ligand and detailed solvent interaction information from the PDB in an automatic and comparative fashion. Adding this information to the existing one on protein-protein interactions will allow us to better understand protein interaction networks and protein function. DESCRIPTION: SCOWLP (Structural Characterization Of Water, Ligands and Proteins) is a user-friendly and publicly accessible web-based relational database for detailed characterization and visualization of the PDB protein interfaces. The SCOWLP database includes proteins, peptidic-ligands and interface water molecules as descriptors of protein interfaces. It contains currently 74,907 protein interfaces and 2,093,976 residue-residue interactions formed by 60,664 structural units (protein domains and peptidic-ligands) and their interacting solvent. The SCOWLP web-server allows detailed structural analysis and comparisons of protein interfaces at atomic level by text query of PDB codes and/or by navigating a SCOP-based tree. It includes a visualization tool to interactively display the interfaces and label interacting residues and interface solvent by atomic physicochemical properties. SCOWLP is automatically updated with every SCOP release. CONCLUSION: SCOWLP enriches substantially the description of protein interfaces by adding detailed interface information of peptidic-ligands and solvent to the existing protein-protein interaction databases. SCOWLP may be of interest to many structural bioinformaticians. It provides a platform for automatic global mapping of protein interfaces at atomic level, representing a useful tool for classification of protein interfaces, protein binding comparative studies, reconstruction of protein complexes and understanding protein networks. The web-server with the database and its additional summary tables used for our analysis are available at
SCOWLP classification: Structural comparison and analysis of protein binding regions
<p>Abstract</p> <p>Background</p> <p>Detailed information about protein interactions is critical for our understanding of the principles governing protein recognition mechanisms. The structures of many proteins have been experimentally determined in complex with different ligands bound either in the same or different binding regions. Thus, the structural interactome requires the development of tools to classify protein binding regions. A proper classification may provide a general view of the regions that a protein uses to bind others and also facilitate a detailed comparative analysis of the interacting information for specific protein binding regions at atomic level. Such classification might be of potential use for deciphering protein interaction networks, understanding protein function, rational engineering and design.</p> <p>Description</p> <p>Protein binding regions (PBRs) might be ideally described as well-defined separated regions that share no interacting residues one another. However, PBRs are often irregular, discontinuous and can share a wide range of interacting residues among them. The criteria to define an individual binding region can be often arbitrary and may differ from other binding regions within a protein family. Therefore, the rational behind protein interface classification should aim to fulfil the requirements of the analysis to be performed.</p> <p>We extract detailed interaction information of protein domains, peptides and interfacial solvent from the SCOWLP database and we classify the PBRs of each domain family. For this purpose, we define a similarity index based on the overlapping of interacting residues mapped in pair-wise structural alignments. We perform our classification with agglomerative hierarchical clustering using the complete-linkage method. Our classification is calculated at different similarity cut-offs to allow flexibility in the analysis of PBRs, feature especially interesting for those protein families with conflictive binding regions.</p> <p>The hierarchical classification of PBRs is implemented into the SCOWLP database and extends the SCOP classification with three additional family sub-levels: Binding Region, Interface and Contacting Domains. SCOWLP contains 9,334 binding regions distributed within 2,561 families. In 65% of the cases we observe families containing more than one binding region. Besides, 22% of the regions are forming complex with more than one different protein family.</p> <p>Conclusion</p> <p>The current SCOWLP classification and its web application represent a framework for the study of protein interfaces and comparative analysis of protein family binding regions. This comparison can be performed at atomic level and allows the user to study interactome conservation and variability. The new SCOWLP classification may be of great utility for reconstruction of protein complexes, understanding protein networks and ligand design. SCOWLP will be updated with every SCOP release. The web application is available at <url>http://www.scowlp.org</url>.</p
PAT:predictor for structured units and its application for the optimization of target molecules for the generation of synthetic antibodies
BACKGROUND: The identification of structured units in a protein sequence is an important first step for most biochemical studies. Importantly for this study, the identification of stable structured region is a crucial first step to generate novel synthetic antibodies. While many approaches to find domains or predict structured regions exist, important limitations remain, such as the optimization of domain boundaries and the lack of identification of non-domain structured units. Moreover, no integrated tool exists to find and optimize structural domains within protein sequences. RESULTS: Here, we describe a new tool, PAT (http://www.kimlab.org/software/pat) that can efficiently identify both domains (with optimized boundaries) and non-domain putative structured units. PAT automatically analyzes various structural properties, evaluates the folding stability, and reports possible structural domains in a given protein sequence. For reliability evaluation of PAT, we applied PAT to identify antibody target molecules based on the notion that soluble and well-defined protein secondary and tertiary structures are appropriate target molecules for synthetic antibodies. CONCLUSION: PAT is an efficient and sensitive tool to identify structured units. A performance analysis shows that PAT can characterize structurally well-defined regions in a given sequence and outperforms other efforts to define reliable boundaries of domains. Specially, PAT successfully identifies experimentally confirmed target molecules for antibody generation. PAT also offers the pre-calculated results of 20,210 human proteins to accelerate common queries. PAT can therefore help to investigate large-scale structured domains and improve the success rate for synthetic antibody generation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1001-1) contains supplementary material, which is available to authorized users
Large-scale interaction profiling of PDZ domains through proteomic peptide-phage display using human and viral phage peptidomes
The human proteome contains a plethora of short linear motifs (SLiMs) that serve as binding interfaces for modular protein domains. Such interactions are crucial for signaling and other cellular processes, but are difficult to detect because of their low to moderate affinities. Here we developed a dedicated approach, proteomic peptide-phage display (ProP-PD), to identify domain-SLiM interactions. Specifically, we generated phage libraries containing all human and viral C-terminal peptides using custom oligonucleotide microarrays. With these libraries we screened the nine PSD-95/ Dlg/ZO-1 (PDZ) domains of human Densin-180, Erbin, Scribble, and Disks large homolog 1 for peptide ligands. We identified several known and putative interactions potentially relevant to cellular signaling pathways and confirmed interactions between fulllength Scribble and the target proteins β-PIX, plakophilin-4, and guanylate cyclase soluble subunit a-2 using colocalization and coimmunoprecipitation experiments. The affinities of recombinant Scribble PDZ domains and the synthetic peptides representing the C termini of these proteins were in the 1- to 40-μM range. Furthermore, we identified several well-established host-virus protein- protein interactions, and confirmed that PDZ domains of Scribble interact with the C terminus of Tax-1 of human T-cell leukemia virus with micromolar affinity. Previously unknown putative viral protein ligands for the PDZ domains of Scribble and Erbin were also identified. Thus, we demonstrate that our ProP-PD libraries are useful tools for probing PDZ domain interactions. The method can be extended to interrogate all potential eukaryotic, bacterial, and viral SLiMs and we suggest it will be a highly valuable approach for studying cellular and pathogen-host protein-protein interactions
A Genome-Scale DNA Repair RNAi Screen Identifies SPG48 as a Novel Gene Associated with Hereditary Spastic Paraplegia
We have identified a novel gene in a genome-wide, double-strand break DNA repair RNAi screen and show that is involved in the neurological disease hereditary spastic paraplegia