Using structural motif descriptors for sequence-based binding site prediction

Author: Andreas Henschel
Christof Winter
Wan Kyu Kim
Michael Schroeder
Publication venue: BioMed Central
Publication date: 01/01/2007

All authors are with the Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany; Wan Kyu Kim is with the Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA.

Background: Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by the identification of its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) at the sequence level using 3D information. To this end, we use machine learning to compile sequential segments that constitute structural features of an interaction site into one profile Hidden Markov Model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites. Results: We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross-validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant enough to be used for binding site recognition. We further validate 230 PPIs that were extracted from the literature, where we additionally identify the interface residues. Finally, we test ligand-binding descriptors for the case of ATP. From sequences with the Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal number of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite pattern. Conclusion: The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions along with their binding site residues for proteins where merely sequence information is available.
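
The idea of screening sequences with a position-specific descriptor can be illustrated with a much simpler model than a profile Hidden Markov Model. The sketch below is a deliberate simplification and not the authors' method: it builds a log-odds position weight matrix from a few aligned segments (no insert or delete states, a uniform background, and a pseudocount of 1 are all assumptions) and slides it along a query sequence. The segments, the query, and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_log_odds(segments, pseudocount=1.0):
    """Build column-wise log-odds scores from equal-length aligned segments."""
    length = len(segments[0])
    assert all(len(seg) == length for seg in segments)
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    matrix = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seg in segments:
            counts[seg[col]] += 1.0
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return matrix

def best_hit(matrix, sequence):
    """Return (score, start) of the best-scoring ungapped window."""
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][aa] for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best

# Hypothetical aligned segments and query, for illustration only.
segments = ["GKS", "GKT", "GKS", "GKT"]
matrix = build_log_odds(segments)
score, start = best_hit(matrix, "MAAGAGKTLLV")
```

A real profile HMM additionally models insertions and deletions and is usually scored against an empirical background distribution, so this sketch only conveys the position-specific scoring step.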

Crossref

Springer - Publisher Connector

PubMed Central

Texas ScholarWorks

Kernel-based machine learning protocol for predicting DNA-binding proteins

Author: Bhardwaj Nitin
Langlois Robert E.
Lu Hui
Zhao Guijun
Publication venue: Oxford University Press
Publication date: 01/01/2005

DNA-binding proteins (DNA-BPs) play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Attempts have been made to identify DNA-BPs based on their sequence and structural information with moderate accuracy. Here we develop a machine learning protocol for the prediction of DNA-BPs in which the classifier is a Support Vector Machine (SVM). Information used for classification is derived from characteristics that include surface and overall composition, overall charge and positive potential patches on the protein surface. In total, 121 DNA-BPs and 238 non-binding proteins are used to build and evaluate the protocol. In the self-consistency test, an accuracy of 100% is achieved. For cross-validation (CV) optimization over the entire dataset, we report an accuracy of 90%. Using leave-1-pair holdout evaluation, an accuracy of 86.3% is achieved. When we restrict the dataset to less than 20% sequence identity among the proteins, the holdout accuracy is 85.8%. Furthermore, seven DNA-BPs with unbounded structures are all correctly predicted. The current performance is better than previously published results. The higher accuracy achieved here originates from two factors: the ability of the SVM to handle features that demonstrate a wide range of discriminatory power, and a different definition of the positive patch. Since our protocol does not rely on sequence or structural homology, it can be used to identify or predict proteins with DNA-binding function(s) regardless of their homology to the known ones.
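
Two of the feature types named in this abstract, overall composition and overall charge, can be approximated from sequence alone. The sketch below is a simplification under stated assumptions: net charge counts Lys/Arg as +1 and Asp/Glu as -1 and ignores histidine, the termini, and pH. The surface composition and positive-patch features mentioned in the abstract require a 3D structure and are not attempted here. The function names and the example sequence are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return {aa: counts.get(aa, 0) / n for aa in AMINO_ACIDS}

def net_charge(sequence):
    """Crude net charge: (K + R) minus (D + E); an assumption, not a pKa model."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative

def feature_vector(sequence):
    """20 composition fractions followed by the crude net charge."""
    comp = composition(sequence)
    return [comp[aa] for aa in AMINO_ACIDS] + [float(net_charge(sequence))]

example = "MKRKDELAKR"  # hypothetical sequence
features = feature_vector(example)
```

A vector like this could be passed to any SVM implementation; the kernel and parameters used by the authors are not stated in the abstract.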

CiteSeerX

Crossref

PubMed Central

Statistical deconvolution of enthalpic energetic contributions to MHC-peptide binding affinity

Author: Davies M.N.
Drew M.G.B.
Flower D.R.
Hattotuwagama C.K.
Moss David S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Background: MHC Class I molecules present antigenic peptides to cytotoxic T cells, which forms an integral part of the adaptive immune response. Peptides are bound within a groove formed by the MHC heavy chain. Previous approaches to MHC Class I-peptide binding prediction have largely concentrated on the peptide anchor residues located at the P2 and C-terminus positions. Results: A large dataset comprising MHC-peptide structural complexes was created by re-modelling pre-determined X-ray crystallographic structures. Static energetic analysis, following energy minimisation, was performed on the dataset in order to characterise interactions between bound peptides and the MHC Class I molecule, partitioning the interactions within the groove into van der Waals, electrostatic and total non-bonded energy contributions. Conclusion: The QSAR techniques of Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS) were used to identify key interactions between the two molecules by comparing the calculated energy values with experimentally determined BL50 data. Although the peptide termini binding interactions help ensure the stability of the MHC Class I-peptide complex, the central region of the peptide is also important in defining the specificity of the interaction. As thermodynamic studies indicate that peptide association and dissociation may be driven entropically, it may be necessary to incorporate entropic contributions into future calculations.

Central Archive at the University of Reading

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Aston Publications Explorer

Birkbeck Institutional Research Online

A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

Author: Henschel Andreas
Publication venue: Technische Universität Dresden
Publication date: 17/10/2008

BACKGROUND: A long-standing challenge in the post-genomic era of bioinformatics is the prediction of protein-protein interactions, and ultimately the prediction of protein functions. The problem is intrinsically harder when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central problem is: How can protein complexes with solved three-dimensional structure be utilized to identify and classify protein binding sites, and how can knowledge be inferred from this classification such that protein interactions can be predicted for proteins without solved structure? The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which additionally are often well conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins when homology-based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two aspects: First, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations for binding sites in order to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work includes a novel approach to a comprehensive geometrical classification of domain interfaces. Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen of convergently evolved motifs in 102,000 protein-protein interactions with structural information is derived. With respect to this dataset, all cases related to viral mimicry of human interface bindings are identified. Finally, a library of 740 motif descriptors for binding site recognition, encoded as Hidden Markov Models, is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of "ATP-binding", where a precision of 89% is achieved, thus outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that have not been annotated before by existing motif databases.
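
The precision and recall figures quoted for the classification algorithm follow the usual definitions over true positives, false positives, and false negatives. The sketch below shows those definitions and the harmonic mean (F1) that is often reported alongside them. The F1 value is not given in the abstract, and the counts used in the example are hypothetical, chosen only so that they round to the quoted 83% and 95%.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts that round to the quoted 83% precision and 95% recall.
p, r = precision_recall(tp=19, fp=4, fn=1)

# F1 computed directly from the percentages quoted in the abstract.
f1_quoted = f1_score(0.83, 0.95)
```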

Technische Universität Dresden: Qucosa

Pockets as structural descriptors of EGFR kinase conformations

Author: Barletta Roldan Patricio German
Fernández Alberti Sebastián
Fornasari Maria Silvina
Hasenahuer Marcia Anahí
Parisi Gustavo Daniel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/12/2017

Epidermal Growth Factor Receptor (EGFR), a tyrosine kinase receptor, is one of the main tumor markers in different types of cancers. The kinase native state is mainly composed of two populations of conformers: active and inactive. Several sequence variations in the EGFR kinase region promote the differential enrichment of conformers with higher activity. Some structural characteristics have been proposed to differentiate kinase conformations, but these considerations could lead to ambiguous classifications. We present a structural characterisation of EGFR kinase conformers, focused on active site pocket comparisons, and the mapping of known pathological sequence variations. A structure-based clustering of this pocket accurately discriminates active from inactive, well-characterised conformations. Furthermore, this main pocket contains, or is in close contact with, ≈65% of cancer-related variation positions. Although the relevance of protein dynamics to explain biological function has been extensively recognised, the usage of the ensemble of conformations in dynamic equilibrium to represent the functional state of proteins and the importance of pockets, cavities and/or tunnels was often neglected in previous studies. These functional structures and the equilibrium between them could be structurally analysed in wild type as well as in sequence variants. Our results indicate that biologically important pockets, as well as their shape and dynamics, are central to understanding protein function in wild-type, polymorphic or disease-related variations.

Affiliation (all five authors): Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología; Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach

Author: Cao ZW
Chen YZ
Han LY
Lin HH
Xie B
Zhang HL
Zheng CJ
Publication venue: BioMed Central
Publication date: 01/01/2006

Metal-binding proteins play important roles in structural stability, signaling, regulation, transport, immune response, metabolism control, and metal homeostasis. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting metal-binding proteins irrespective of sequence similarity. This work explores support vector machines (SVM) as such a method. SVM prediction systems were developed by using 53,333 metal-binding and 147,347 non-metal-binding proteins, and evaluated on an independent set of 31,448 metal-binding and 79,051 non-metal-binding proteins. The computed prediction accuracy is 86.3%, 81.6%, 83.5%, 94.0%, 81.2%, 85.4%, 77.6%, 90.4%, 90.9%, 74.9% and 78.1% for calcium-binding, cobalt-binding, copper-binding, iron-binding, magnesium-binding, manganese-binding, nickel-binding, potassium-binding, sodium-binding, zinc-binding, and all metal-binding proteins, respectively. The accuracy for the non-member proteins of each class is 88.2%, 99.9%, 98.1%, 91.4%, 87.9%, 94.5%, 99.2%, 99.9%, 99.9%, 98.0%, and 88.0%, respectively. Comparable accuracies were obtained by using a different SVM kernel function. Our method predicts 67% of the 87 metal-binding proteins non-homologous to any protein in the Swiss-Prot database and 85.3% of the 333 proteins of known metal-binding domains as metal-binding. These results suggest the usefulness of SVM for facilitating the prediction of metal-binding proteins. Our software can be accessed at the SVMProt server.
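
Because the abstract reports accuracy separately for class members and non-members, a single combined figure depends on how the two are weighted. The sketch below shows the arithmetic for an overall (size-weighted) accuracy and a balanced (unweighted) accuracy. It assumes that the 78.1% and 88.0% figures for "all metal-binding" proteins and their non-members both refer to the independent evaluation set of 31,448 and 79,051 proteins; the abstract does not state this explicitly, and the combined values are not reported by the authors.

```python
def overall_accuracy(acc_pos, n_pos, acc_neg, n_neg):
    """Accuracy over both classes, weighted by the number of proteins in each."""
    correct = acc_pos * n_pos + acc_neg * n_neg
    return correct / (n_pos + n_neg)

def balanced_accuracy(acc_pos, acc_neg):
    """Unweighted mean of the per-class accuracies."""
    return (acc_pos + acc_neg) / 2

# Figures quoted in the abstract; pairing them with these set sizes is an assumption.
overall = overall_accuracy(0.781, 31_448, 0.880, 79_051)
balanced = balanced_accuracy(0.781, 0.880)
```

The two values differ because non-metal-binding proteins outnumber metal-binding ones in the evaluation set, so the size-weighted figure is pulled toward the non-member accuracy.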

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Molecular Modeling of the Enzyme 3β-Hydroxysteroid Dehydrogenase Type 2: Combined Homology Modeling, Docking and QSAR Approach

Author: NOEGROHATI SRI
SUDARMANTO BAMBANG SULISTYO ARI
SUSIDARTI RATNA ASMAH
YUSWANTO AGUSTINUS
Publication venue: Faculty of Pharmacy, Universitas Pancasila
Publication date: 30/04/2017

A homology model of human 3β-HSD type 2 was built with the Phyre2 server and refined with ModRefiner. The online tools PROCHECK, QMEAN and ProSA-web were used to evaluate the stereochemical quality of the model. The Ramachandran plot from PROCHECK showed that 84.5% of residues are in the most favored region, 13.7% in the additional allowed region, 1.5% in the generously allowed region and 0.3% in the disallowed region. The QMEAN score (Z-score) is 0.509 (-3.006), and the ProSA-web Z-score is -7.10. Negative protein folding energies were also found for almost the entire sequence. Furthermore, molecular docking with MOE was applied to validate the model. Hydrogen-bonding interactions with Tyr154, Ser124, and Ser218 are found for all docked substrates as well as for known inhibitors (trilostane and epostane). A dataset of azasteroid inhibitors was also docked into the substrate active site of human 3β-HSD2. These docked structures were used to construct a docking-based QSAR equation by genetic algorithm (GA) statistical analysis. The best QSAR equation constructed has robust predictive power according to its statistical parameters and hence may be applied to supersede the default scoring function provided by the docking software. These results indicate that the human 3β-HSD2 model was successfully evaluated as a good model.

JURNAL ILMU KEFARMASIAN INDONESIA