2,117 research outputs found
Exploring the potential of 3D Zernike descriptors and SVM for protein\u2013protein interface prediction
Abstract Background The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class
Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed till date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular detection of sparse and small
or sub- complexes and discerning of overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable to understand disease mechanisms and
to discover novel therapeutic targets. We hope this review aptly commemorates a
decade of research on computational prediction of complexes and constitutes a
valuable reference for further advancements in this exciting area.Comment: 1 Tabl
Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems
A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a
predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the
Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in
Computational Biology.Peer ReviewedPostprint (author's final draft
Computational prediction and analysis of macromolecular interactions
Protein interactions regulate gene expression, cell signaling, catalysis, and many other functions across all of molecular biology. We must understand them quantitatively, and experimental methods have provided the data that form the basis of our current understanding. They remain our most accurate tools. However, their low efficiency and high cost leave room for predictive, computational approaches that can provide faster and more detailed answers to biological problems. A rigid-body simulation can quickly and effectively calculate the predicted interaction energy between two molecular structures in proximity. The fast Fourier-transform-based mapping algorithm FTMap predicts small molecule binding 'hot spots' on a protein's surface and can provide likely orientations of specific ligands of interest that may occupy those hot spots. This process now allows unique ligands to be used by this algorithm while permitting additional small molecular cofactors to remain in their bound conformation. By keeping the cofactors bound, FTMap can reduce false positives where the algorithm identifies a true, but incorrect, ligand pocket where the known cofactor already binds. A related algorithm, ClusPro, can evaluate interaction energies for billions of docked conformations of macromolecular structures. The work reported in this thesis can predict protein-polysaccharide interactions and the software now contains a publicly available feature for predicting protein-heparin interactions. In addition, a new approach for determining regions of predicted activity on a protein's surface allows prediction of a protein-protein interface. This new tool can also identify the interface in encounter complexes formed by the process of protein association—more closely resembling the biological nature of the interaction than the former, calculated, binary, bound and unbound states
Predicting protein interface residues using easily accessible on-line resources
© The Author 2015. Published by Oxford University Press. It has beenmore than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we re- view 10methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experi- mental structures and high-quality homology models, structure-basedmethods outperformthose using only protein se- quences, with global template-based approaches providing the best performance. Formoderate-qualitymodels, sequence- basedmethods often performbetter than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in severalmethods quantitatively improve the results only for experimental struc- tures, suggesting that these procedures should be tuned up for computer-generatedmodels. Finally, we anticipate that advancedmeta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improve- ments, easily accessible web servers already provide the scientific community with convenient resources for the identifica- tion of protein-protein interaction sites
Improving protein docking with binding site prediction
Protein-protein and protein-ligand interactions are fundamental as many proteins mediate their biological function through these interactions. Many important applications follow directly from the identification of residues in the interfaces between protein-protein and protein-ligand interactions, such as drug design, protein mimetic engineering, elucidation of molecular pathways, and understanding of disease mechanisms. The identification of interface residues can also guide the docking process to build the structural model of protein-protein complexes. This dissertation focuses on developing computational approaches for protein-ligand and protein-protein binding site prediction and applying these predictions to improve protein-protein docking. First, we develop an automated approach LIGSITEcs to predict protein-ligand binding site, based on the notion of surface-solvent-surface events and the degree of conservation of the involved surface residues. We compare our algorithm to four other approaches, LIGSITE, CAST, PASS, and SURFNET, and evaluate all on a dataset of 48 unbound/bound structures and 210 bound-structures. LIGSITEcs performs slightly better than the other tools and achieves a success rate of 71% and 75%, respectively. Second, for protein-protein binding site, we develop metaPPI, a meta server for interface prediction. MetaPPI combines results from a number of tools, such as PPI_Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. After refinement, metaPPI significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. Third, for protein-protein docking, we develop a FFT-based docking algorithm and system BDOCK, which includes specific scoring functions for specific types of complexes. BDOCK uses family-based residue interface propensities as a scoring function and obtains improvement factors of 4-30 for enzyme-inhibitor and 4-11 for antibody-antigen complexes in two specific SCOP families. Furthermore, the degrees of buriedness of surface residues are integrated into BDOCK, which improves the shape discriminator for enzyme-inhibitor complexes. The predicted interfaces from metaPPI are integrated as well, either during docking or after docking. The evaluation results show that reliable interface predictions improve the discrimination between near-native solutions and false positive. Finally, we propose an implicit method to deal with the flexibility of proteins by softening the surface, to improve docking for non enzyme-inhibitor complexes
Optimizing Data Selection for Contact Prediction in Proteins
Proteins are essential to life across all organisms. They act as enzymes, antibodies, transporters
of molecules, structural elements, among other important roles. Their ability to
interact with specific molecules in a selective manner, is what makes them important.
Being able to understand their interaction can provide many advantages in fields such
as drug design and metabolic engineering. Current methods of predicting protein interaction
attempt to geometrically fit the structures of two proteins together by generating
a large amount of potential configurations and then discriminating the correct pose from
the remaining ones.
Given the large search space, approaches to reduce the complexity are often employed.
Identifying a contact point between the pairing proteins is a good constraining factor. If
at least one contact can be predicted among a small set of possibilities (e.g. 100), the
search space will be significantly reduced.
Using structural and evolutionary information of the interacting proteins, a machine
learning predictor can be developed for this task. Such evolutionary measures are computed
over a substantial amount of homologous sequences, which can be filtered and
ordered in many different ways. As a result, a machine learning solution was developed
that focused in measuring the effects that differing homolog arrangements can have over
the final prediction
Protein docking prediction using predicted protein-protein interface
<p>Abstract</p> <p>Background</p> <p>Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations.</p> <p>Results</p> <p>We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering.</p> <p>Conclusion</p> <p>We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.</p
Recommended from our members
Predicting multibody assembly of proteins
textThis thesis addresses the multi-body assembly (MBA) problem in the context of protein assemblies. [...] In this thesis, we chose the protein assembly domain because accurate and reliable computational modeling, simulation and prediction of such assemblies would clearly accelerate discoveries in understanding of the complexities of metabolic pathways, identifying the molecular basis for normal health and diseases, and in the designing of new drugs and other therapeutics. [...] [We developed] F²Dock (Fast Fourier Docking) which includes a multi-term function which includes both a statistical thermodynamic approximation of molecular free energy as well as several of knowledge-based terms. Parameters of the scoring model were learned based on a large set of positive/negative examples, and when tested on 176 protein complexes of various types, showed excellent accuracy in ranking correct configurations higher (F² Dock ranks the correcti solution as the top ranked one in 22/176 cases, which is better than other unsupervised prediction software on the same benchmark). Most of the protein-protein interaction scoring terms can be expressed as integrals over the occupied volume, boundary, or a set of discrete points (atom locations), of distance dependent decaying kernels. We developed a dynamic adaptive grid (DAG) data structure which computes smooth surface and volumetric representations of a protein complex in O(m log m) time, where m is the number of atoms assuming that the smallest feature size h is [theta](r[subscript max]) where r[subscript max] is the radius of the largest atom; updates in O(log m) time; and uses O(m)memory. We also developed the dynamic packing grids (DPG) data structure which supports quasi-constant time updates (O(log w)) and spherical neighborhood queries (O(log log w)), where w is the word-size in the RAM. DPG and DAG together results in O(k) time approximation of scoring terms where k << m is the size of the contact region between proteins. [...] [W]e consider the symmetric spherical shell assembly case, where multiple copies of identical proteins tile the surface of a sphere. Though this is a restricted subclass of MBA, it is an important one since it would accelerate development of drugs and antibodies to prevent viruses from forming capsids, which have such spherical symmetry in nature. We proved that it is possible to characterize the space of possible symmetric spherical layouts using a small number of representative local arrangements (called tiles), and their global configurations (tiling). We further show that the tilings, and the mapping of proteins to tilings on arbitrary sized shells is parameterized by 3 discrete parameters and 6 continuous degrees of freedom; and the 3 discrete DOF can be restricted to a constant number of cases if the size of the shell is known (in terms of the number of protein n). We also consider the case where a coarse model of the whole complex of proteins are available. We show that even when such coarse models do not show atomic positions, they can be sufficient to identify a general location for each protein and its neighbors, and thereby restricts the configurational space. We developed an iterative refinement search protocol that leverages such multi-resolution structural data to predict accurate high resolution model of protein complexes, and successfully applied the protocol to model gp120, a protein on the spike of HIV and currently the most feasible target for anti-HIV drug design.Computer Science
- …