19 research outputs found

Protein binding hot spots and the residue-residue pairing preference: a water exclusion perspective

Author: A Fernández
AA Bogan
CJ Tsai
E Guney
F Glaser
G Moont
H Ponstingl
H Zhu
I Halperin
IM Nooren
ISS Moreira
J Li
J Li
J Martin
J Mintseris
Jinyan Li
JL Morrison
KS Thorn
L Lo Conte
N Tuncbag
O Keskin
P Chakrabarti
P Privalov
Q Liu
Qian Liu
RP Bahadur
RP Bahadur
RP Saha
S De
S Jones
S Lukman
S Miyazawa
SJ Hubbard
T Clackson
T Pupko
WL DeLano
WSJ Valdar
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: A protein binding hot spot is a small cluster of residues tightly packed at the center of the interface between two interacting proteins. Though a hot spot constitutes a small fraction of the interface, it is vital to the stability of protein complexes. Recently, a series of hypotheses has been proposed to characterize binding hot spots, including the pioneering O-ring theory, the insightful 'coupling' and 'hot region' principle, and our 'double water exclusion' (DWE) hypothesis. As the perspective changes from the O-ring theory to the DWE hypothesis, we examine the physicochemical properties of the binding hot spots under the new hypothesis and compare them with those under the O-ring theory. Results: The requirements for a cluster of residues to form a hot spot under the DWE hypothesis can be mathematically satisfied by a biclique subgraph if a vertex is used to represent a residue, an edge to indicate a close distance between two residues, and a bipartite graph to represent a pair of interacting proteins. We term these hot spots DWE bicliques. We identified DWE bicliques from crystal packing contacts and from obligate and non-obligate interactions. Our comparative study revealed abundant bicliques unique to the biological interactions, indicating specific biological binding behaviors in contrast to crystal packing. The two sub-types of biological interactions also have their own signature bicliques. In our analysis of residue compositions and residue pairing preferences in DWE bicliques, the focus was on interaction-preferred residues (ipRs) and interaction-preferred residue pairs (ipRPs). Hydrophobic residues are heavily involved in the ipRs and ipRPs of the obligate interactions, and aromatic residues are favored in the ipRs and ipRPs of the biological interactions, especially in those of the non-obligate interactions. In contrast, the ipRs and ipRPs in crystal packing are dominated by hydrophilic residues, and most of the anti-ipRs of crystal packing are the ipRs of the obligate or non-obligate interactions. Conclusions: These ipRs and ipRPs in our DWE bicliques describe diverse binding features among the three types of interactions. They also highlight the specific binding behaviors of the biological interactions, which differ sharply from the artifact interfaces in crystal packing. DWE bicliques, especially the unique bicliques, can capture deep insights into the binding characteristics of protein interfaces.
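The graph formulation in this abstract (residues as vertices, an edge for each close residue pair across the interface, and a hot-spot candidate as a biclique) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 4.5 Å cutoff, the use of one representative coordinate per residue, the helper names `contact_edges` and `maximal_bicliques`, and the toy coordinates. The enumeration is brute force and only suitable for very small interfaces.

```python
from itertools import combinations


def contact_edges(chain_a, chain_b, cutoff=4.5):
    """Return (a, b) residue pairs whose representative points lie within `cutoff`.

    The cutoff and the one-point-per-residue model are illustrative assumptions.
    """
    edges = set()
    for a, (xa, ya, za) in chain_a.items():
        for b, (xb, yb, zb) in chain_b.items():
            dist_sq = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            if dist_sq <= cutoff ** 2:
                edges.add((a, b))
    return edges


def maximal_bicliques(edges, min_size=2):
    """Enumerate maximal bicliques with at least `min_size` residues per side."""
    left_nodes = sorted({a for a, _ in edges})
    neighbours = {a: {b for (x, b) in edges if x == a} for a in left_nodes}
    found = set()
    for r in range(min_size, len(left_nodes) + 1):
        for subset in combinations(left_nodes, r):
            # Residues on the other chain that contact every residue in `subset`.
            common = set.intersection(*(neighbours[a] for a in subset))
            if len(common) < min_size:
                continue
            # Grow the left side to every residue that contacts all of `common`,
            # which makes the pair maximal on both sides.
            closed_left = frozenset(a for a in left_nodes if common <= neighbours[a])
            found.add((closed_left, frozenset(common)))
    return found


# Hypothetical toy coordinates (in Å) for two chains.
chain_a = {"A1": (0.0, 0.0, 0.0), "A2": (1.0, 0.0, 0.0), "A3": (10.0, 0.0, 0.0)}
chain_b = {"B1": (0.0, 3.0, 0.0), "B2": (1.0, 3.0, 0.0), "B3": (10.0, 20.0, 0.0)}

edges = contact_edges(chain_a, chain_b)
bicliques = maximal_bicliques(edges)
for left, right in bicliques:
    print(sorted(left), sorted(right))
```

In this toy case A1 and A2 each contact both B1 and B2, so the four residues form a single 2x2 biclique, while A3 and B3 contact nothing and are left out.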

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central

Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling

Author: A Armon
A Prlic
Alessandra Carbone
BW Matthews
CA Innis
CA Innis
CJ Tsai
CT Porter
DR Caffrey
E Kanamori
ELL Sonnhammer
G Cheng
GH Gonnet
H Chen
I Mihalek
JA Studier
JR Bradford
Ladislas A. Trojan
Michael Levitt
O Lichtarge
O Lichtarge
P Chakrabarti
Richard Lavery
RP Bahadur
S Henikoff
S Jones
S Madabushi
S Miller
SF Altschul
SJ Hubbard
Sophie Sacquin-Mora
SS Negi
Stefan Engelen
T Pupko
W Humphrey
WSJ Valdar
Y Ofran
Y Ofran
ZJ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2009

The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. 
JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

Author: A Andreeva
A Gutteridge
AH Elcock
AR Panchenko
B Lee
B Rost
BW Mathews
CA Innis
Cathy H Wu
CH Wu
DK Smith
GJ Bartlett
H Yao
HM Berman
IH Witten
JC Platt
JD Thompson
JS Milton
K Kinoshita
K Sjolander
M Ota
MA Hearst
MJ Ondrechen
Natalia V Petrova
O Lichtarge
P Aloy
PP Wangikar
R Kohavi
R Koradi
R Landgraf
RL Tatusov
S Chakravarty
S Jones
S Parthasarathy
S Zhu
SF Altschul
SJ Campbell
SJ Hubbard
TA Binkowski
W Kabsch
W Tian
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. 
Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.
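The two catalytic-residue figures quoted in this abstract (228 of 254 recovered, 10.2% missed) can be checked against each other with a short calculation. The variable names are illustrative. The separately quoted overall accuracy of more than 86% presumably also counts non-catalytic residues and cannot be rederived from these two numbers alone.

```python
# Figures taken from the abstract.
total_catalytic = 254
correctly_predicted = 228

# Derived quantities.
missed = total_catalytic - correctly_predicted
recovered_pct = 100 * correctly_predicted / total_catalytic
missed_pct = 100 * missed / total_catalytic

print(f"missed {missed} residues ({missed_pct:.1f}%), recovered {recovered_pct:.1f}%")
```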

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Combining specificity determining and conserved residues improves functional site prediction

Author: A Carro
A del Sol Mesa
A Shulman-Peleg
A Stark
A Stark
A Teplyakov
AE Todd
ATR Laurie
B Ma
B Mirkin
B Reva
B Zambelli
BJ Polacco
C Romier
C Yeats
CT Porter
DA Rodionov
EA Gaucher
G Dodson
G Koczyk
G Wu
GJ Kleywegt
H Yao
IM Wallace
IN Shindyalov
J Capra
J Dundas
J Pei
J-M Chandonia
JA Capra
JE Donald
JR Manning
K Ye
K Ye
KA Feenstra
KM Mayer
L Aravind
L Holm
LA Mirny
M Hendlich
M Landau
MA Willis
Mikhail S Gelfand
O Lichtarge
Olga V Kalinina
OV Kalinina
OV Kalinina
P Aloy
PP Khil
PP Khil
R Landgraf
RD Finn
RJ Edwards
Robert B Russell
S Ahmad
S Chakrabarti
S Sankararaman
S Whelan
SS Hannenhalli
T Maier
T Pupko
WR Taylor
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results: Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those occupied by residues that are conserved within sub-groups of proteins in a family having a common specificity but that differ between groups, and that are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discuss several illustrative examples. Conclusion: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
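The SDP definition in this abstract (conserved within each specificity group, different between groups) can be made concrete with a toy sketch. This is not the SDPsite algorithm, which is not described here and presumably relies on a statistical score rather than exact identity. The function name `classify_columns`, the group labels, and the aligned sequences are all hypothetical.

```python
def classify_columns(groups):
    """Split alignment columns into family-wide conserved positions and toy SDPs.

    `groups` maps a specificity-group label to a list of aligned sequences.
    A column counts as conserved when every group shows the same single residue,
    and as an SDP when each group is internally invariant but groups differ.
    """
    sequences = [seq for seqs in groups.values() for seq in seqs]
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must be aligned to the same length")

    conserved, sdps = [], []
    for col in range(length):
        per_group = [{seq[col] for seq in seqs} for seqs in groups.values()]
        if any(len(residues) != 1 for residues in per_group):
            continue  # variable inside at least one group: neither category
        consensus = {next(iter(residues)) for residues in per_group}
        if len(consensus) == 1:
            conserved.append(col)
        else:
            sdps.append(col)
    return conserved, sdps


# Hypothetical aligned fragments for two specificity groups.
groups = {
    "substrate_X": ["ACDK", "ACDK", "ACEK"],
    "substrate_Y": ["ACNR", "ACNR", "ACQR"],
}
conserved, sdps = classify_columns(groups)
print("conserved columns:", conserved)
print("SDP columns:", sdps)
```

In this example columns 0 and 1 are conserved across the whole family, column 3 separates the two groups (K versus R) and is reported as an SDP, and column 2 varies within each group and falls into neither class.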

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Assessment of protein-protein interfaces in cryo-EM derived assemblies

Author: A Kryshtafovych
A Kryshtafovych
A Patwardhan
AA Bogan
AJ McCoy
AP Joseph
AP Joseph
AP Joseph
AP Pandurangan
BA Barad
C Yan
D Guzenko
D Russel
DR Caffrey
E Cukuroglu
E Krissinel
EF Pettersen
F Glaser
FB Sheinerman
G Kuzu
G Pintilie
HM Berman
HR Saibil
I Farabella
IMA Nooren
J Zhang
JM de la Rosa-Trevín
LL Conte
M Gao
M Guharoy
MC Lawrence
MD Winn
MG Prisant
P Chakrabarti
P Emsley
R Chen
R Dintyala
R Norel
RC Edgar
RC Edgar
S Jones
S Malhotra
S Malhotra
S Viswanath
S Xia
SF Altschul
T Burnley
T Pupko
VB Chen
WSJ Valdar
X Bai
Y Ofran
Y Tsuchiya
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Structures of macromolecular assemblies derived from cryo-EM maps often contain errors that become more abundant with decreasing resolution. Despite efforts in the cryo-EM community to develop metrics for map and atomistic model validation, thus far, no specific scoring metrics have been applied systematically to assess the interface between the assembly subunits. Here, we comprehensively assessed protein–protein interfaces in macromolecular assemblies derived by cryo-EM. To this end, we developed Protein Interface-score (PI-score), a density-independent machine learning-based metric, trained using the features of protein–protein interfaces in crystal structures. We evaluated 5873 interfaces in 1053 PDB-deposited cryo-EM models (including SARS-CoV-2 complexes), as well as the models submitted to CASP13 cryo-EM targets and the EM model challenge. We further inspected the interfaces associated with low scores and found that some of those, especially in intermediate-to-low resolution (worse than 4 Å) structures, were not captured by density-based assessment scores. A combined score incorporating PI-score and fit-to-density score showed discriminatory power, allowing our method to provide a powerful complementary assessment tool for the ever-increasing number of complexes solved by cryo-EM.

Crossref

Birkbeck Institutional Research Online

ePubs: the open archive for STFC research publications

ReproPhylo: An environment for reproducible Phylogenomics

Author: A Dereeper
A Stamatakis
AF Magee
Amir Szitenberg
AR Lemmon
AY Kawahara
B Chisham
B Giardine
C Boettiger
CE Shannon
CG Begley
CW Dunn
D Blankenberg
D Penny
David H. Lunt
DE Knuth
DF Robinson
F Pérez
G Talavera
GSC Slater
J Goecks
J Huerta-Cepas
J Huerta-Cepas
J Leebens-Mack
J Sukumaran
JA Ågren
JD Hunter
JM Eales
JR Grant
K Cranston
K Katoh
KD Whitney
M McNutt
M Pagel
M Pagel
M Suyama
Mark L. Blaxter
Max John
MK Kuhner
MV Han
N Lartillot
Paul P Gardner
PG Higgs
PJA Cock
R Sánchez
RC Edgar
RC Edgar
S Capella-Gutiérrez
S Schulze-Kremer
TH Oakley
TH Struck
TH Vines
WD Pearse
WSJ Valdar
Publication venue: Public Library of Science (PLoS)
Publication date: 18/05/2015

The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built into the ReproPhylo system and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This 'single file' approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. This file and a Git repository are the primary reproducibility outputs of the program. In addition, ReproPhylo produces an extensive human-readable report and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform-independent CC0 Python module and is easily installed as a Docker image or a WinPython self-sufficient package, with a Jupyter Notebook GUI, or as a slimmer version in a Galaxy distribution.

Repository@Hull - Worktribe

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

FigShare