167 research outputs found

    Prediction of protein-protein interaction types using machine learning approaches

    Prediction and analysis of protein-protein interactions (PPIs) is an important problem in life science research because of the fundamental roles of PPIs in many biological processes in living cells. One of the important problems surrounding PPIs is the identification and prediction of different types of complexes, which are characterized by properties such as the type and number of proteins that interact, the stability of the proteins, and the duration of the interactions. This thesis focuses on studying the temporal and stability aspects of PPIs, mostly using structural data. We have addressed the problem of predicting obligate and non-obligate protein complexes, as well as transient versus permanent complexes, because of the importance of non-obligate and transient complexes as therapeutic targets for drug discovery and development. We have presented a computational model to predict protein interaction types using our proposed physicochemical features of desolvation and electrostatic energies, together with structural and sequence domain-based features. To achieve a comprehensive comparison and demonstrate the strength of our proposed features for predicting PPI types, we have also computed a wide range of previously used properties, including physical features of interface area, chemical features of hydrophobicity and amino acid composition, and physicochemical features of solvent-accessible surface area (SASA) and atomic contact vectors (ACV). After extracting the main features of the complexes, a variety of machine learning approaches have been used to predict PPI types. The prediction is performed via several state-of-the-art classification techniques, including linear dimensionality reduction (LDR), support vector machine (SVM), naive Bayes (NB) and k-nearest neighbor (k-NN).
Moreover, several feature selection algorithms, including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR), are applied to the available datasets to obtain more discriminative and relevant properties for distinguishing between these two types of complexes. Our computational results on different datasets confirm that using our proposed physicochemical features of desolvation and electrostatic energies leads to significant improvements in prediction performance. Moreover, using the structural and sequence domains of CATH and Pfam and performing biological analysis helps us gain better insight into obligate and non-obligate complexes and their interactions.
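As an illustration of one of the feature selection criteria named in this abstract, the following is a minimal sketch of information gain for a discretized feature. It is not the thesis's implementation: the feature names, bin labels and the four toy complexes are hypothetical, and real features such as interface area would first need to be binned.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four complexes, two discretized features.
labels = ["obligate", "obligate", "non-obligate", "non-obligate"]
features = {
    "interface_area_bin": ["large", "large", "small", "small"],
    "hydrophobicity_bin": ["high", "low", "high", "low"],
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

In this toy set the first feature separates the two classes perfectly and the second carries no class information, so a filter based on information gain would keep the first and drop the second.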

    Prediction and analysis of protein-protein interaction types using short, linear motifs

    Protein-protein interactions (PPIs) play a key role in many biological processes and functions in living cells. Hence, identification, prediction, and analysis of PPIs are important problems in molecular biology. Traditional solutions to this problem (laboratory-based experiments) are labor intensive and time consuming. As a result, the demand for a computational model to solve this problem is steadily increasing. In this thesis, I propose a computational model to predict biological PPI types using short, linear motifs (SLiMs). The information contained in a protein sequence is retrieved using the profiles of SLiMs. I use sequence information as a distinguishing property between interaction types, mainly obligate and non-obligate. I also propose another model to predict PPIs using desolvation and electrostatic energies. These computational models use the information contained in the sequence, and the desolvation and electrostatic energies of the protein complex, as properties. After computing all the properties, the well-known classifiers k-nearest neighbor (k-NN), support vector machine (SVM) and linear dimensionality reduction (LDR) have been implemented. Results on two well-known datasets confirm the accuracy of the models, which is above 99%. Analysis and comparison of the results show that the information contained in the sequence is very important for prediction and analysis of protein-protein interactions.

    A model to predict and analyze protein-protein interaction types using electrostatic energies

    Prediction and analysis of types of protein-protein interactions (PPIs) is an important problem in molecular biology because of the key role of PPIs in many biological processes in living cells. In this thesis, I propose a model called PPIEE (Protein-protein interaction using electrostatic energies) to predict and analyze protein interaction types using electrostatic energies as properties to distinguish between these types of interactions. This prediction approach uses electrostatic energies for pairs of atoms and amino acids present in the interfaces where the interaction occurs. Using this approach, the results on well-known datasets confirm that electrostatic energy is an important property for predicting obligate and non-obligate protein interaction types. The classifiers used are support vector machines and linear dimensionality reduction. Since electrostatic interactions are long ranged, further experiments are performed by varying the threshold values, which are the distances calculated between atom pairs of interacting chains, from 7 Å to 13 Å. This information will help researchers understand how different physicochemical properties contribute to the stability of protein complexes and their function.
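The abstract does not state which energy function PPIEE uses, so the following is only a sketch of the general idea: summing pairwise Coulomb energies over inter-chain atom pairs that lie within a distance threshold, and repeating this for thresholds from 7 Å to 13 Å. The constant dielectric of 4.0, the partial charges and the coordinates are all assumptions made for illustration.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2), so charges are in units of e
# and distances in Å.
COULOMB_CONSTANT = 332.0636


def interface_electrostatic_energy(chain_a, chain_b, cutoff=10.0, dielectric=4.0):
    """Sum Coulomb energies over inter-chain atom pairs within `cutoff` Å.

    Each atom is a tuple (x, y, z, partial_charge). The constant
    dielectric is an assumption, not a value taken from the thesis.
    """
    energy = 0.0
    for xa, ya, za, qa in chain_a:
        for xb, yb, zb, qb in chain_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            if 0.0 < r <= cutoff:
                energy += COULOMB_CONSTANT * qa * qb / (dielectric * r)
    return energy


# Hypothetical toy interface: one atom on chain A, two on chain B.
chain_a = [(0.0, 0.0, 0.0, 0.5)]
chain_b = [(8.0, 0.0, 0.0, -0.5), (12.0, 0.0, 0.0, 0.5)]

# One energy value per distance threshold, 7 Å to 13 Å inclusive.
energy_by_cutoff = {
    cutoff: interface_electrostatic_energy(chain_a, chain_b, cutoff=float(cutoff))
    for cutoff in range(7, 14)
}
```

Each threshold yields a different feature value because more distant atom pairs enter the sum as the cutoff grows, which is one way to probe how far the relevant electrostatic contribution extends.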

    Efficient comprehensive scoring of docked protein complexes - a machine learning approach

    Biological systems and processes rely on a complex network of molecular interactions. The association of biological macromolecules is a fundamental biochemical phenomenon and an unsolved theoretical problem crucial for the understanding of complex living systems. The term protein-protein docking describes the computational prediction of the assembly of protein complexes from the individual subunits. Docking algorithms generally produce a large number of putative protein complexes. In most cases, some of these conformations resemble the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract the near-native structure(s) from this considerably large pool of solutions, the so-called scoring or ranking problem. The aim of this work has been to develop methods for the efficient and accurate detection of near-native conformations in the scoring or ranking process of docked protein-protein complexes. A series of structural, chemical, biological and physical properties are used in this work to score docked protein-protein complexes. These properties include specialised energy functions, evolutionary relationship, class-specific residue interface propensities, gap volume, buried surface area, empirical pair potentials at the residue and atom level, as well as measures for the tightness of fit. Efficient comprehensive scoring functions have been developed using probabilistic Support Vector Machines in combination with this array of properties on the largest currently available protein-protein docking benchmark. The established scoring functions are shown to be specific for certain types of protein-protein complexes and are able to detect near-native complex conformations from large sets of decoys with high sensitivity.
The specific complex classes are Enzyme-Inhibitor/Substrate complexes, Antibody-Antigen complexes and a third class denoted as "Other" complexes, which holds all test cases not belonging to either of the two previous classes. The three complex-class-specific scoring functions were tested on the docking results of 99 complexes in their unbound form for the above-mentioned categories. Defining success as scoring a 'true' result with a p-value better than 0.1, the scoring schemes were found to be successful in 93%, 78% and 63% of the examined cases, respectively. The ranking of near-native structures can be drastically improved, leading to a significant enrichment of near-native complex conformations in the top ranks. The developed scoring schemes were shown to outperform five other previously published scoring functions.

    Stability of domain structures in multi-domain proteins

    Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. The ability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to the inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability, leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis, which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.

    How structural adaptability exists alongside HLA-A2 bias in the human alphabeta TCR repertoire

    How T-cell receptors (TCRs) can be intrinsically biased toward MHC proteins while simultaneously displaying the structural adaptability required to engage diverse ligands remains a controversial puzzle. We addressed this by examining alphabeta TCR sequences and structures for evidence of physicochemical compatibility with MHC proteins. We found that human TCRs are enriched in the capacity to engage a polymorphic, positively charged hot-spot region that is almost exclusive to the alpha1-helix of the common human class I MHC protein, HLA-A*0201 (HLA-A2). TCR binding necessitates hot-spot burial, yielding high energetic penalties that must be offset via complementary electrostatic interactions. Enrichment of negative charges in TCR binding loops, particularly the germ-line loops encoded by the TCR Valpha and Vbeta genes, provides this capacity and is correlated with restricted positioning of TCRs over HLA-A2. Notably, this enrichment is absent from antibody genes. The data suggest a built-in TCR compatibility with HLA-A2 that biases receptors toward, but does not compel, particular binding modes. Our findings provide an instructional example of how structurally pliant MHC biases can be encoded within TCRs.

    PIER: protein interface recognition for structural proteomics

    Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments (MSA) projected onto the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequence homologs, as well as their inability to detect interfaces in evolutionarily variable regions. In this study, we developed an improved method for predicting interfaces from a single protein structure that is based on local statistical properties of the protein surface derived at the level of atomic groups. It was also demonstrated that the evolutionary conservation signal only marginally influenced the overall prediction performance on a diverse benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. The proposed Protein IntErface Recognition method (PIER) yielded improved performance compared with several alignment-free or alignment-dependent predictions. PIER achieved an overall precision of 60% at the recall threshold of 50% at the residue level on a benchmark of 490 homodimeric, 62 heterodimeric and 196 transient interfaces. For 696 of 748 proteins (93%), the binding patch residues were successfully detected with precision exceeding 25% at 50% recall; for 524 proteins (70%), the corresponding precision was above 50%. The calculation took only seconds for an average 300-residue protein. The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
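The "precision at 50% recall" figures quoted in this abstract can be read with a generic definition of that metric, sketched below. This is not PIER's evaluation code: the function name, the per-residue scores and the interface labels are hypothetical, and ties between scores are not treated specially.

```python
def precision_at_recall(scores, is_interface, target_recall=0.5):
    """Precision at the first score cutoff whose recall reaches target_recall.

    `scores` are per-residue interface predictions (higher means more
    likely interface) and `is_interface` are 0/1 reference labels.
    """
    total_positive = sum(is_interface)
    if total_positive == 0:
        raise ValueError("no interface residues in the reference set")
    # Walk down the residues from highest to lowest predicted score.
    ranked = sorted(zip(scores, is_interface), key=lambda pair: pair[0], reverse=True)
    true_positive = 0
    for predicted, (_, label) in enumerate(ranked, start=1):
        true_positive += label
        if true_positive / total_positive >= target_recall:
            return true_positive / predicted
    return 0.0


# Hypothetical protein with six surface residues, four of them in the interface.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_interface = [1, 0, 1, 0, 1, 1]
precision = precision_at_recall(scores, is_interface, target_recall=0.5)
```

Here half of the four interface residues are recovered once the top three residues are predicted, two of which are correct, so the precision at 50% recall is 2/3.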