Saturating representation of loop conformational fragments in structure databanks
BACKGROUND: Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment-based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks. RESULTS: We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross-referenced with available structural fragments in the Protein Data Bank (Structure Space). We found that the expansion of the PDB in the last few years resulted in a dense coverage of loop conformational fragments. For each loop of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. These percentages of coverage at the 50% sequence cutoff drop to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. There is not a single loop in the current Sequence Space at any length up to 14 residues that is not matched with a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for loops of 10 residues or shorter. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that, while the number of sequentially unique sequence segments increased about sixfold during the last five years, there are almost no unique conformational segments among these fragments of up to 12 residues.
CONCLUSION: The results suggest that fragment-based prediction approaches are no longer limited by the completeness of fragments in databanks but rather by the effective scoring and search algorithms to locate them. The current favorable coverage and trends observed will be further accentuated with the progress of the Protein Structure Initiative, which targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.
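The coverage figures quoted in this abstract can be pictured with a small sketch. It assumes ungapped, position-by-position identity between fragments of equal length, which the abstract does not specify; the function names and the toy fragments are hypothetical and are not taken from the paper.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length fragments.

    Assumption: ungapped, position-by-position comparison.
    """
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def coverage(queries, structures, cutoff=0.5):
    """Fraction of query fragments that have at least one same-length
    structural fragment at or above the identity cutoff."""
    if not queries:
        return 0.0
    covered = 0
    for q in queries:
        if any(len(s) == len(q) and sequence_identity(q, s) >= cutoff
               for s in structures):
            covered += 1
    return covered / len(queries)


# Toy 8-residue fragments, invented for illustration only.
sequence_space = ["GSGDKTPE", "AAGNNPLT", "WWWWWWWW"]
structure_space = ["GSGDATPQ", "AAGNQPLS"]
print(coverage(sequence_space, structure_space, cutoff=0.5))
```

In this toy set two of the three query fragments have a structural match at the 50% cutoff; the study applies the same idea to clustered fragments at database scale.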
PCRPi-DB: a database of computationally annotated hot spots in protein interfaces
Protein–protein interactions are central to almost any cellular process. Although typically protein interfaces are large, it is well established that only a relatively small region, the so-called ‘hot spot’, contributes the most to the total binding energy. There is a clear interest in identifying hot spots because of their application in drug discovery and protein design. Presaging Critical Residues in Protein Interfaces Database (PCRPi-DB) is a public repository that archives computationally annotated hot spots in protein complexes for which the 3D structure is known. Hot spots have been annotated using a new and highly accurate computational method developed in the lab. PCRPi-DB is freely available to the scientific community at http://www.bioinsilico.org/PCRPIDB. Besides browsing and querying the contents of the database, extensive documentation and links to relevant on-line resources and contents are available to users. PCRPi-DB is updated on a weekly basis.
A holistic in silico approach to predict functional sites in protein structures
Abstract
Motivation: Proteins execute and coordinate cellular functions by interacting with other biomolecules. Among these interactions, protein–protein (including peptide-mediated), protein–DNA and protein–RNA interactions cover a wide range of critical processes and cellular functions. The functional characterization of proteins requires the description and mapping of functional biomolecular interactions and the identification and characterization of functional sites is an important step towards this end.
Results: We have developed a novel computational method, Multi-VORFFIP (MV), a tool that predicts protein-, peptide-, DNA- and RNA-binding sites in proteins. MV utilizes a wide range of structural, evolutionary, experimental and energy-based information that is integrated into a common probabilistic framework by means of a Random Forest ensemble classifier. While remaining competitive when compared with current methods, MV is a centralized resource for the prediction of functional sites and is interfaced by a powerful web application tailored to facilitate the use of the method and the analysis of predictions by non-expert end-users.
Availability: http://www.bioinsilico.org/MVORFFIP
Supplementary information: Supplementary data are available at Bioinformatics online.
Contact: [email protected]; [email protected]
Structural characteristics of novel protein folds
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrence of an exhaustively classified library of supersecondary structural elements (Smotifs) in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the top 10% of most frequent ones have a higher fraction of internal contacts, while some of the rarest motifs are larger and contain a longer loop region.
Improving the prediction of protein binding sites by combining heterogeneous data and Voronoi diagrams
BACKGROUND: Protein binding site prediction by computational means can yield valuable information that complements and guides experimental approaches to determine the structure of protein complexes. Predictions become even more relevant and timely given the current resolution of protein interaction maps, where there is a very large and still expanding gap between the available information on: (i) which proteins interact and (ii) how proteins interact. Proteins interact through exposed residues that present differential physicochemical properties, and these can be exploited to identify protein interfaces. RESULTS: Here we present VORFFIP, a novel method for protein binding site prediction. The method makes use of a broad set of heterogeneous data and a definition of residue environment based on Voronoi Diagrams, integrated by a two-step Random Forest ensemble classifier. Four sets of residue features (structural, energy terms, sequence conservation, and crystallographic B-factors) used in different combinations together with three definitions of residue environment (Voronoi Diagrams, sequence sliding window, and Euclidean distance) have been analyzed in order to maximize the performance of the method. CONCLUSIONS: The integration of different forms of information, such as structural features, energy terms, evolutionary conservation and crystallographic B-factors, improves the performance of binding site prediction. Including the information of neighbouring residues also improves the prediction of protein interfaces. Among the different approaches that can be used to define the environment of exposed residues, Voronoi Diagrams provide the most accurate description. Finally, VORFFIP compares favourably to other methods reported in the recent literature.
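Two of the three residue-environment definitions named in this abstract, the sequence sliding window and the Euclidean distance cutoff, can be sketched in a few lines. This is only an illustration under stated assumptions: the window half-width, the distance radius, the helper names and the toy coordinates are all hypothetical, one point per residue is assumed, and the Voronoi Diagram definition that the authors found most accurate is left out because it needs a computational-geometry library.

```python
import math


def window_neighbours(index, n_residues, half_width=2):
    """Neighbours by sequence position: residues within +/- half_width."""
    lo = max(0, index - half_width)
    hi = min(n_residues - 1, index + half_width)
    return [i for i in range(lo, hi + 1) if i != index]


def distance_neighbours(index, coords, radius=6.0):
    """Neighbours by space: residues whose representative point lies
    within `radius` (same units as coords) of the query residue."""
    centre = coords[index]
    return [i for i, c in enumerate(coords)
            if i != index and math.dist(centre, c) <= radius]


def environment_average(index, values, neighbours):
    """Average a per-residue feature over a residue and its neighbours."""
    members = [index] + list(neighbours)
    return sum(values[i] for i in members) / len(members)


# Toy data, invented for illustration: one point and one score per residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (3.8, 3.8, 0.0)]
conservation = [0.2, 0.8, 0.4, 0.6, 1.0]

by_sequence = window_neighbours(1, len(coords), half_width=1)
by_distance = distance_neighbours(1, coords, radius=4.0)
print(by_sequence, environment_average(1, conservation, by_sequence))
print(by_distance, environment_average(1, conservation, by_distance))
```

The two definitions pick different neighbour sets for the same residue: the last residue is far away in sequence but close in space, which is the kind of difference the comparison in the abstract is concerned with.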
Computational Tools and Databases for the Study and Characterization of Protein Interactions.
Presaging critical residues in protein interfaces-web server (PCRPi-W): a web server to chart hot spots in protein interfaces
BACKGROUND: It is well established that only a portion of residues that mediate protein-protein interactions (PPIs), the so-called hot spot, contributes the most to the total binding energy, and thus its identification is an important and relevant question that has clear applications in drug discovery and protein design. The experimental identification of hot spots is however a lengthy and costly process, and thus there is an interest in computational tools that can complement and guide experimental efforts. PRINCIPAL FINDINGS: Here, we present Presaging Critical Residues in Protein interfaces-Web server (http://www.bioinsilico.org/PCRPi), a web server that implements a recently described and highly accurate computational tool designed to predict critical residues in protein interfaces: PCRPi. PCRPi depends on the integration of structural, energetic, and evolutionary-based measures by using Bayesian Networks (BNs). CONCLUSIONS: PCRPi-W has been designed to provide easy and convenient access to the broad scientific community. Predictions are readily available for download or presented in a web page that includes, among other information, links to relevant files, sequence information, and a Jmol applet to visualize and analyze the predictions in the context of the protein structure.
On the mechanisms of protein interactions: predicting their affinity from unbound tertiary structures
Motivation:
The characterization of the protein–protein association mechanisms is crucial to understanding how biological processes occur. It has been previously shown that the early formation of non-specific encounters enhances the realization of the stereospecific (i.e. native) complex by reducing the dimensionality of the search process. The association rate for the formation of such complex plays a crucial role in the cell biology and depends on how the partners diffuse to be close to each other. Predicting the binding free energy of proteins provides new opportunities to modulate and control protein–protein interactions. However, existing methods require the 3D structure of the complex to predict its affinity, severely limiting their application to interactions with known structures.
Results:
We present a new approach that relies on the unbound protein structures and protein docking to predict protein–protein binding affinities. Through the study of the docking space (i.e. decoys), the method predicts the binding affinity of the query proteins when the actual structure of the complex itself is unknown. We tested our approach on a set of globular and soluble proteins of the newest affinity benchmark, obtaining accuracy values comparable to other state-of-the-art methods: a 0.4 correlation coefficient between the experimental and predicted values of ΔG and an error < 3 kcal/mol.
Availability and implementation:
The binding affinity predictor is implemented and available at http://sbi.upf.edu/BADock and https://github.com/badocksbi/BADock
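The two accuracy figures reported in the Results of this entry, a correlation coefficient and an error in kcal/mol, can be computed as sketched here. The abstract does not say which correlation or which error measure was used, so Pearson correlation and root-mean-square error are assumptions; the ΔG values are invented for illustration and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


def rmse(xs, ys):
    """Root-mean-square error between paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Invented binding free energies in kcal/mol, for illustration only.
experimental = [-12.1, -9.4, -10.8, -7.9, -13.5]
predicted = [-10.5, -10.2, -9.1, -9.8, -11.0]
print(f"r = {pearson(experimental, predicted):.2f}")
print(f"RMSE = {rmse(experimental, predicted):.2f} kcal/mol")
```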
MetaPred2CS: A sequence-based meta-predictor for protein-protein interactions of prokaryotic two-component system proteins
Abstract
Motivation: Two-component systems (TCS) are the main signalling pathways of prokaryotes, and control a wide range of biological phenomena. Their functioning depends on interactions between TCS proteins, the specificity of which is poorly understood.
Results: The MetaPred2CS web-server interfaces a sequence-based meta-predictor specifically designed to predict pairing of the histidine kinase and response-regulator proteins forming TCSs. MetaPred2CS integrates six sequence-based methods using a support vector machine classifier and has been intensively tested under different benchmarking conditions: (i) species specific gene sets; (ii) neighbouring versus orphan pairs; and (iii) k-fold cross validation on experimentally validated datasets.
Availability and Implementation: Web server at: http://metapred2cs.ibers.aber.ac.uk/ , Source code: https://github.com/martinjvickers/MetaPred2CS or implemented as Virtual Machine at: http://metapred2cs.ibers.aber.ac.uk/download
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
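The k-fold cross validation mentioned among the benchmarking conditions of this entry can be sketched as follows. This is a generic illustration of splitting a labelled set into k folds, not the MetaPred2CS protocol; the function name, the seed and the pair labels are hypothetical.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists for k-fold cross validation.

    Items are shuffled once with a fixed seed, dealt into k folds,
    and each fold serves as the test set exactly once.
    """
    shuffled = list(items)
    if k < 2 or k > len(shuffled):
        raise ValueError("k must be between 2 and the number of items")
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical histidine kinase / response regulator pair labels.
pairs = [(f"HK{i}", f"RR{i}") for i in range(10)]
for train, test in k_fold_splits(pairs, k=5):
    print(len(train), "training pairs,", len(test), "test pairs")
```

Every pair is used for testing exactly once and for training in the remaining folds, so the classifier is never evaluated on data it was fitted to.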
Atomic Analysis of Protein-Protein Interfaces with Known Inhibitors: The 2P2I Database
BACKGROUND: In the last decade, the inhibition of protein-protein interactions (PPIs) has emerged from both academic and private research as a new way to modulate the activity of proteins. Inhibitors of these original interactions are certainly the next generation of highly innovative drugs that will reach the market in the next decade. However, in silico design of such compounds still remains challenging. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe this particular PPI chemical space through the presentation of 2P2I(DB), a hand-curated database dedicated to the structure of PPIs with known inhibitors. We have analyzed protein/protein and protein/inhibitor interfaces in terms of geometrical parameters, atom and residue properties, buried accessible surface area and other biophysical parameters. The interfaces found in 2P2I(DB) were then compared to those of representative datasets of heterodimeric complexes. We propose a new classification of PPIs with known inhibitors into two classes depending on the number of segments present at the interface and corresponding to either a single secondary structure element or to a more globular interacting domain. 2P2I(DB) complexes share global shape properties with standard transient heterodimer complexes, but their accessible surface areas are significantly smaller. No major conformational changes are seen between the different states of the proteins. The interfaces are more hydrophobic than general PPI interfaces, with fewer charged residues and more non-polar atoms. Finally, fifty percent of the complexes in the 2P2I(DB) dataset possess more hydrogen bonds than typical protein-protein complexes. Potential areas of study for the future are proposed, which include a new classification system consisting of specific families and the identification of PPI targets with high druggability potential based on key descriptors of the interaction.
CONCLUSIONS: The 2P2I database stores structural information about PPIs with known inhibitors and provides a useful tool for biologists to assess the potential druggability of their interfaces. The database can be accessed at http://2p2idb.cnrs-mrs.fr
