
Prediction of protein-protein interaction types using machine learning approaches

Author: Maleki Mina
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2014

Prediction and analysis of protein-protein interactions (PPIs) is an important problem in life science research because of the fundamental roles of PPIs in many biological processes in living cells. One of the important problems surrounding PPIs is the identification and prediction of different types of complexes, which are characterized by properties such as the type and number of interacting proteins, the stability of the proteins, and the duration of the interactions. This thesis focuses on studying the temporal and stability aspects of PPIs, mostly using structural data. We have addressed the problem of predicting obligate and non-obligate protein complexes, as well as the related distinction between transient and permanent complexes, because non-obligate and transient complexes are important therapeutic targets for drug discovery and development. We have presented a computational model to predict protein interaction types using our proposed physicochemical features of desolvation and electrostatic energies, together with structural and sequence domain-based features. To achieve a comprehensive comparison and demonstrate the strength of our proposed features for predicting PPI types, we have also computed a wide range of properties used in previous work, including physical features of interface area, chemical features of hydrophobicity and amino acid composition, and physicochemical features of solvent-accessible surface area (SASA) and atomic contact vectors (ACV). After extracting the main features of the complexes, a variety of machine learning approaches have been used to predict PPI types. The prediction is performed via several state-of-the-art classification techniques, including linear dimensionality reduction (LDR), support vector machine (SVM), naive Bayes (NB) and k-nearest neighbor (k-NN).
Moreover, several feature selection algorithms, including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR), are applied to the available datasets to obtain more discriminative and relevant properties for distinguishing between these two types of complexes. Our computational results on different datasets confirm that using our proposed physicochemical features of desolvation and electrostatic energies leads to significant improvements in prediction performance. Moreover, using the structural and sequence domains of CATH and Pfam, together with biological analysis, helps us gain better insight into obligate and non-obligate complexes and their interactions.
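Information gain (IG), one of the feature selection criteria named above, can be illustrated with a minimal sketch. It assumes each feature has already been discretized into bins; the `"high"`/`"low"` bins and the helper names are hypothetical and not taken from the thesis. Gain ratio (GR) is computed as IG divided by the entropy of the feature itself.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalized by the split information H(X)."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0
    return information_gain(feature_values, labels) / split_info


# Hypothetical example: a binned desolvation-energy feature for 8 complexes.
labels = ["obligate"] * 4 + ["non-obligate"] * 4
perfect = ["high"] * 4 + ["low"] * 4   # separates the two classes completely
useless = ["high", "low"] * 4          # independent of the class labels
print(information_gain(perfect, labels), information_gain(useless, labels))
```

A perfectly separating feature scores 1 bit here and an uninformative one scores 0; ranking features by such a score is one simple way to keep a discriminative subset.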

Scholarship at UWindsor

Prediction and analysis of protein-protein interaction types using short, linear motifs

Author: Pandit Manish Kumer
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2012

Protein-protein interactions (PPIs) play a key role in many biological processes and functions in living cells. Hence, identification, prediction, and analysis of PPIs are important problems in molecular biology. Traditional solutions (laboratory-based experiments) to this problem are labor-intensive and time-consuming. As a result, the demand for computational models to solve this problem is steadily increasing. In this thesis, I propose a computational model to predict biological PPI types using short, linear motifs (SLiMs). The information contained in a protein sequence is retrieved using the profiles of SLiMs. I use sequence information as a distinguishing property between interaction types, mainly obligate and non-obligate. I also propose another model to predict PPIs using desolvation and electrostatic energies. These computational models use the information contained in the sequence, and the desolvation and electrostatic energies of the protein complex, as properties. After computing all the properties, the well-known classifiers k-nearest neighbor (k-NN), support vector machine (SVM) and linear dimensionality reduction (LDR) were implemented. Results on two well-known datasets confirm the accuracy of the models, which is above 99%. Analysis and comparison of the results show that the information contained in the sequence is very important for prediction and analysis of protein-protein interactions.
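A SLiM profile can be sketched by treating each motif as a regular expression and counting its occurrences in each chain. The three patterns below are placeholders chosen for illustration, not the motif set used in the thesis, and concatenating the two chains' counts into one vector per complex is an assumption about how the profile feeds a classifier.

```python
import re

# Placeholder motif definitions (name -> regex); a real study would load
# curated SLiM patterns instead of these illustrative ones.
MOTIFS = {
    "PxxP": "P..P",
    "RxxS": "R..S",
    "KEN": "KEN",
}


def slim_profile(sequence, motifs=MOTIFS):
    """Count (possibly overlapping) matches of each motif regex in a sequence."""
    profile = {}
    for name, pattern in motifs.items():
        # Wrapping the pattern in a zero-width lookahead counts overlapping hits.
        profile[name] = sum(1 for _ in re.finditer(f"(?={pattern})", sequence))
    return profile


def complex_features(seq_a, seq_b, motifs=MOTIFS):
    """Concatenate the motif profiles of two chains into one feature vector."""
    pa = slim_profile(seq_a, motifs)
    pb = slim_profile(seq_b, motifs)
    return [pa[m] for m in motifs] + [pb[m] for m in motifs]


print(complex_features("PAAPAAP", "MKENRAASKEN"))
```

The resulting fixed-length vectors can be passed to any of the classifiers named above (k-NN, SVM, LDR).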

Scholarship at UWindsor

A model to predict and analyze protein-protein interaction types using electrostatic energies

Author: Vasudev Gokul
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2012

Prediction and analysis of types of protein-protein interactions (PPIs) is an important problem in molecular biology because of their key role in many biological processes in living cells. In this thesis, I propose a model called PPIEE (Protein-protein interaction using electrostatic energies) to predict and analyze protein interaction types using electrostatic energies as properties to distinguish between these types of interactions. This prediction approach uses electrostatic energies for pairs of atoms and amino acids present in the interfaces where the interaction occurs. Using this approach, the results on well-known datasets confirm that electrostatic energy is an important property for predicting obligate and non-obligate protein interaction types. The classifiers used are support vector machines and linear dimensionality reduction. Since electrostatic interactions are long-ranged, further experiments are performed by varying the distance threshold between atom pairs of interacting chains from 7 Å to 13 Å. This information will help researchers understand how different physicochemical properties contribute to the stability and function of protein complexes.
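The interface electrostatic energy with a distance threshold can be sketched with Coulomb's law, summed over inter-chain atom pairs within the cutoff: $E = \sum_{r_{ij} \le d} k\, q_i q_j / (\varepsilon\, r_{ij})$. The abstract does not give PPIEE's exact energy function, so the constant (about 332.06 kcal·Å/(mol·e²)), the distance-independent dielectric of 4 and the partial charges in the example are all assumptions.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), as commonly used in molecular modeling.
COULOMB_KCAL = 332.0636


def interface_electrostatic_energy(chain_a, chain_b, cutoff=7.0, dielectric=4.0):
    """Sum Coulomb energies (kcal/mol) over inter-chain atom pairs within `cutoff` Angstroms.

    Each chain is a list of (x, y, z, partial_charge) tuples. The constant
    dielectric is an assumed simplification, not the thesis's model.
    """
    energy = 0.0
    for xa, ya, za, qa in chain_a:
        for xb, yb, zb, qb in chain_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            if 0.0 < r <= cutoff:
                energy += COULOMB_KCAL * qa * qb / (dielectric * r)
    return energy


# Hypothetical atoms: one charged atom on chain A, two on chain B.
chain_a = [(0.0, 0.0, 0.0, 1.0)]
chain_b = [(5.0, 0.0, 0.0, -1.0), (10.0, 0.0, 0.0, 1.0)]
for cutoff in range(7, 14):  # sweep the 7-13 Angstrom thresholds described above
    print(cutoff, round(interface_electrostatic_energy(chain_a, chain_b, cutoff), 3))
```

Widening the cutoff admits longer-range pairs (here the repulsive pair at 10 Å), which is why the energy feature changes across the threshold sweep.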

Scholarship at UWindsor

The role of electrostatic energy in prediction of obligate protein-protein interactions

Author: Gokul Vasudev
Luis Rueda
Mina Maleki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

Springer - Publisher Connector

Efficient comprehensive scoring of docked protein complexes - a machine learning approach

Author: Martin Oliver Sven
Publication date: 01/01/2006

Biological systems and processes rely on a complex network of molecular interactions. The association of biological macromolecules is a fundamental biochemical phenomenon and an unsolved theoretical problem crucial for the understanding of complex living systems. The term protein-protein docking describes the computational prediction of the assembly of protein complexes from the individual subunits. Docking algorithms generally produce a large number of putative protein complexes. In most cases, some of these conformations resemble the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract the near-native structure(s) out of this considerably large pool of solutions, the so-called scoring or ranking problem. The aim of this work has been to develop methods for the efficient and accurate detection of near-native conformations in the scoring or ranking process of docked protein-protein complexes. A series of structural, chemical, biological and physical properties are used in this work to score docked protein-protein complexes. These properties include specialised energy functions, evolutionary relationship, class-specific residue interface propensities, gap volume, buried surface area, empirical pair potentials at the residue and atom level, as well as measures for the tightness of fit. Efficient comprehensive scoring functions have been developed using probabilistic Support Vector Machines in combination with this array of properties on the largest currently available protein-protein docking benchmark. The established scoring functions are shown to be specific for certain types of protein-protein complexes and are able to detect near-native complex conformations from large sets of decoys with high sensitivity.
The specific complex classes are Enzyme-Inhibitor/Substrate complexes, Antibody-Antigen complexes and a third class denoted as "Other" complexes, which holds all test cases belonging to neither of the two previous classes. The three class-specific scoring functions were tested on the docking results of 99 complexes in their unbound form for the above-mentioned categories. Defining success as scoring a 'true' result with a p-value below 0.1, the scoring schemes were found to be successful in 93%, 78% and 63% of the examined cases, respectively. The ranking of near-native structures can be drastically improved, leading to a significant enrichment of near-native complex conformations in the top ranks. The developed scoring schemes were shown to outperform five other previously published scoring functions.
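The abstract does not say how the p-values behind the success criterion were obtained. One common choice, assumed here, is an empirical p-value: the fraction of decoys that score at least as well as the near-native pose, with a +1 correction so it is never zero. The sketch then applies the 0.1 threshold quoted above to compute a success rate; higher scores are assumed to mean more native-like, and the function names are hypothetical.

```python
def empirical_p_value(native_score, decoy_scores):
    """(number of decoys scoring >= native, plus 1) / (number of decoys, plus 1).

    Assumes higher scores indicate more native-like conformations.
    """
    at_least_as_good = sum(1 for s in decoy_scores if s >= native_score)
    return (at_least_as_good + 1) / (len(decoy_scores) + 1)


def success_rate(cases, threshold=0.1):
    """Fraction of (native_score, decoy_scores) cases with p-value below threshold."""
    if not cases:
        raise ValueError("no test cases given")
    hits = sum(
        1 for native, decoys in cases if empirical_p_value(native, decoys) < threshold
    )
    return hits / len(cases)


# Hypothetical scores: 100 decoys scored 0..99 and two test complexes.
decoys = [float(s) for s in range(100)]
print(empirical_p_value(95.0, decoys), success_rate([(95.0, decoys), (50.0, decoys)]))
```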

Kölner UniversitätsPublikationsServer

Stability of domain structures in multi-domain proteins

Author: A Bauer-Mehren
A Fernandez
A Hentati
A Pang
A Schlicker
AG Murzin
AS Kondrashov
AW Munro
BB Kragelund
BM Broome
C Chothia
CA Gough
CJ Camacho
D Ekman
DA Di Giusto
DF Burke
DP Grandgenett
DR Caffrey
E Capriotti
E Krissinel
ED Levy
F Ali-Osman
F Dong
G Apic
GC Conant
H Wohlrab
HP Shanahan
HW He
J Karanicolas
J Rodriguez-Lopez
J Schymkowitz
J Weiner 3rd
JH Fong
JH Han
K Vlahovicek
L Riechmann
M Ryan
MA DePristo
MM Gromiha
N Tokuriki
NO Stitziel
O Keskin
PA Ory
Q Wang
R Guerois
R Rajasekaran
R Zhou
RJ Dobson
S Gong
S Teng
SJ Hamill
SO Yesylevskyy
T Tanaka
Y Xia
Z Liu
Publication venue: Nature Publishing Group
Publication date: 18/07/2011

Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between domain-domain interactions and the stability of domains in isolation. We provide a quantitative treatment of, and proof for, prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. The ability of such folds to exist stably on their own is optimized by evolution. Specific residue mutations at sites equivalent to the inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability, leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis, which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.

Crossref

PubMed Central

Open Access Repository of IISc Research Publications


Molecular characterization and evolutionary plasticity of protein-protein interfaces

Author: Bickerton George Richard James
Publication venue: University of Cambridge
Publication date: 01/01/2010

The sequencing of the human genome provides the parts list for understanding cellular processes. However, as 70% of eukaryotic genes work through multi-protein systems, it is only through detailed study of the interactions of these components that a more complete, systems-level understanding can be gained. This thesis is centred on the establishment of PICCOLO, a comprehensive database of structurally characterized protein interactions. In generating the resource, issues of interface definition, quaternary structure, data redundancy, structural environment and interaction type are addressed. The resource enables a variety of analyses to be performed concerning interface properties, including residue propensity, hydropathy, polarity, interface size, sequence entropy and residue contact preference. PICCOLO has been applied to probing the patterns of substitutions that are accepted in protein interfaces across evolution, and whether these patterns are distinguishable from those seen in other structural environments. The derivation of a high-quality set of multiple structural alignments in the form of the database TOCCATA, a prerequisite for such analysis, is described, as well as procedures to derive environment-specific substitution tables. The Blundell group has contributed a series of methods to predict the likely effect of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on protein stability, function and interactions in order to triage the large volumes of data created by high-throughput genetic screening studies, enabling prioritization of those nsSNPs most likely to be phenotypically detrimental. PICCOLO's contribution to these predictions is described. Historically, there has been little focus on protein-protein interactions as drug targets for small-molecule therapeutics. However, alanine-scanning mutagenesis studies have revealed that only a subset of residues contributes the greater part of the binding free energy: the so-called "hot-spots".
Molecular characterization of hot-spots performed using PICCOLO probes the molecular basis underlying this important phenomenon, opening the possibility of predictive methods to identify hot-spots 'in silico'.
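The hot-spot idea from alanine scanning can be sketched as a threshold on the change in binding free energy (ΔΔG) when a residue is mutated to alanine. The 2.0 kcal/mol cutoff is a widely used convention assumed here, not a definition taken from the thesis, and the ΔΔG values in the example are invented for illustration.

```python
# Assumed hot-spot cutoff in kcal/mol (a common convention, not from the thesis).
HOT_SPOT_THRESHOLD = 2.0


def hot_spots(ddg_by_residue, threshold=HOT_SPOT_THRESHOLD):
    """Return residue names whose alanine-scan ddG meets the threshold, sorted alphabetically."""
    return sorted(res for res, ddg in ddg_by_residue.items() if ddg >= threshold)


def hot_spot_energy_share(ddg_by_residue, threshold=HOT_SPOT_THRESHOLD):
    """Fraction of the total non-negative ddG contributed by hot-spot residues."""
    total = sum(max(ddg, 0.0) for ddg in ddg_by_residue.values())
    if total == 0:
        return 0.0
    hot = sum(ddg for ddg in ddg_by_residue.values() if ddg >= threshold)
    return hot / total


# Hypothetical alanine-scan results (kcal/mol) for five interface residues.
ddg = {"W102": 4.5, "Y33": 2.1, "S45": 0.3, "N60": 0.8, "D12": 0.1}
print(hot_spots(ddg), round(hot_spot_energy_share(ddg), 3))
```

In this made-up example two of five residues carry most of the total ΔΔG, which is the pattern the abstract describes.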

Apollo (Cambridge)

OpenGrey Repository

Molecular recognition and partner prediction for transient protein complexes: CDK-cyclin homologue interactions

Author: Quan Xueping
Publication venue: The University of Edinburgh
Publication date: 01/01/2006

Edinburgh Research Archive

How structural adaptability exists alongside HLA-A2 bias in the human alphabeta TCR repertoire

Author: Baker Brian M.
Blevins Sydney J.
Nishimura Michael I.
Pierce Brian G.
Riley Timothy P.
Singh Nishant K.
Spear Timothy T.
Wang Yuan
Weng Zhiping
Publication venue: eScholarship@UMassChan
Publication date: 01/03/2016

How T-cell receptors (TCRs) can be intrinsically biased toward MHC proteins while simultaneously displaying the structural adaptability required to engage diverse ligands remains a controversial puzzle. We addressed this by examining alphabeta TCR sequences and structures for evidence of physicochemical compatibility with MHC proteins. We found that human TCRs are enriched in the capacity to engage a polymorphic, positively charged hot-spot region that is almost exclusive to the alpha1-helix of the common human class I MHC protein, HLA-A*0201 (HLA-A2). TCR binding necessitates hot-spot burial, yielding high energetic penalties that must be offset via complementary electrostatic interactions. Enrichment of negative charges in TCR binding loops, particularly the germ-line loops encoded by the TCR Valpha and Vbeta genes, provides this capacity and is correlated with restricted positioning of TCRs over HLA-A2. Notably, this enrichment is absent from antibody genes. The data suggest a built-in TCR compatibility with HLA-A2 that biases receptors toward, but does not compel, particular binding modes. Our findings provide an instructional example of how structurally pliant MHC biases can be encoded within TCRs.

eScholarship@UMMS

PIER: protein interface recognition for structural proteomics

Author: Eugene Raush
Irina Kufareva
Levon Budagyan
Maxim Totrov
Ruben Abagyan
Publication date: 01/01/2007

Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments (MSA) projected onto the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequence homologs, as well as their inability to detect interfaces in evolutionarily variable regions. In this study, we developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. It was also demonstrated that the evolutionary conservation signal only marginally influenced the overall prediction performance on a diverse benchmark; moreover, for certain classes of proteins, using this signal actually resulted in deteriorated prediction. The proposed Protein IntErface Recognition method (PIER) yielded improved performance compared to several alignment-free or alignment-dependent predictions. PIER achieved an overall precision of 60% at the recall threshold of 50% at the residue level on a benchmark of 490 homodimeric, 62 heterodimeric and 196 transient interfaces. For 696 of 748 proteins (93%), the binding patch residues were successfully detected with precision exceeding 25% at 50% recall; for 524 proteins (70%), the corresponding precision was above 50%. The calculation took only seconds for an average 300-residue protein. The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
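PIER's headline figure is residue-level precision at a fixed recall. The sketch below shows one way such a number can be computed, under stated assumptions: residues are ranked by a predicted score (higher means more interface-like), and precision is read at the smallest top-k cutoff whose recall reaches the target. The function name, scores and labels are hypothetical.

```python
def precision_at_recall(scores, is_interface, target_recall=0.5):
    """Precision at the smallest top-k cutoff whose recall reaches target_recall."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(1 for label in is_interface if label)
    if total_pos == 0:
        raise ValueError("need at least one true interface residue")
    # Rank residues from most to least interface-like by predicted score.
    ranked = sorted(zip(scores, is_interface), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    # Not reachable when target_recall <= 1: recall is 1 once every residue is included.
    raise RuntimeError("target recall not reached")


# Hypothetical per-residue scores and true interface labels for one protein.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [True, False, True, False, True, True]
print(precision_at_recall(scores, labels))
```

Computing this per protein and counting how many proteins clear a precision bar (25% or 50% in the abstract) yields per-protein success counts of the kind reported above.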

CiteSeerX