Capturing coevolutionary signals in repeat proteins

Author: Espada Rocío
Ferreiro Diego
Mora Thierry
Parra R. Gonzalo
Walczak Aleksandra M.
Publication date: 25/07/2014

The analysis of correlations of amino acid occurrences in globular proteins has led to the development of statistical tools that can identify native contacts, that is, portions of the chains that come into close proximity in folded structural ensembles. Here we introduce a statistical coupling analysis for repeat proteins, natural systems for which the identification of domains remains challenging. We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat unit. Equalizing for this bias reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. The overall procedure can be used to reconstruct the interactions at long distances, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.

arXiv.org e-Print Archive

CONICET Digital

Springer - Publisher Connector

PubMed Central

Hal-Diderot

From principal component to direct coupling analysis of coevolution in proteins: Low-eigenvalue modes are needed for structure prediction

Author: Cocco Simona
Monasson Remi
Weigt Martin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 22/08/2013

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.

Comment: Supporting information can be downloaded from: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.100317
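The point about localized low-eigenvalue modes can be illustrated with a small spectral sketch. This is not the Hopfield-Potts inference itself; it only diagonalizes a toy correlation matrix and measures how many sites each eigenvector is spread over. The 5-site matrix, the function name `participation_numbers`, and the use of the inverse participation ratio as a localization measure are all assumptions made for illustration.

```python
import numpy as np

def participation_numbers(corr):
    """Return eigenvalues (ascending) and the participation number of each mode.

    The participation number is 1 / sum_i v_i^4 for a unit-norm eigenvector v.
    It is close to 1 for a mode concentrated on one site and close to the
    number of sites for a mode spread evenly over all of them.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # columns are eigenvectors
    inverse_participation = np.sum(eigenvectors ** 4, axis=0)
    return eigenvalues, 1.0 / inverse_participation

# Toy correlation matrix (hypothetical): a uniform background correlation of 0.3
# between all sites, plus one strongly coupled pair (sites 0 and 1).
n_sites = 5
corr = np.full((n_sites, n_sites), 0.3)
np.fill_diagonal(corr, 1.0)
corr[0, 1] = corr[1, 0] = 0.95

eigenvalues, participation = participation_numbers(corr)
for value, spread in zip(eigenvalues, participation):
    print(f"eigenvalue {value:.3f}  spread over ~{spread:.2f} sites")
```

In this toy case the smallest eigenvalue belongs to a mode supported only on the strongly coupled pair, while the largest belongs to a mode spread over most sites, which mirrors the qualitative statement in the abstract.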

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Hal-Diderot

FigShare

Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics

Author: Caporaso J Gregory
Easton Brett C
Hunter Lawrence
Huttley Gavin A
Knight Rob
Smit Sandra
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance. Results: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding, however, indicating that the choice of alphabet is critical. Conclusion: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.
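As a rough illustration of the tree-ignorant Mutual Information metric and of recoding with a reduced alphabet, the sketch below computes mutual information between two alignment columns before and after mapping residues to charge classes. The three-class mapping `CHARGE_CLASS`, the helper names, and the four-sequence alignment are hypothetical and are not the alphabets or data used in the study; the t-test transformation is not shown.

```python
import math
from collections import Counter

# Hypothetical charge-based reduced alphabet: acidic, basic, everything else.
CHARGE_CLASS = {"D": "-", "E": "-", "K": "+", "R": "+", "H": "+"}

def recode(column):
    """Map each residue in a column to '-', '+', or '0' (uncharged)."""
    return [CHARGE_CLASS.get(residue, "0") for residue in column]

def mutual_information(col_a, col_b):
    """Mutual information in bits between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) / (p(a) p(b)) written with raw counts to limit rounding error
        ratio = n_ab * n / (count_a[a] * count_b[b])
        mi += (n_ab / n) * math.log2(ratio)
    return mi

# Toy alignment: columns 0 and 1 always carry opposite charges,
# column 2 is uncharged and unrelated to charge.
alignment = ["DKA", "EKG", "KDA", "REG"]
col0, col1, col2 = ([seq[i] for seq in alignment] for i in range(3))

mi_raw = mutual_information(col0, col1)
mi_charge = mutual_information(recode(col0), recode(col1))
mi_unrelated = mutual_information(recode(col0), recode(col2))
print(mi_raw, mi_charge, mi_unrelated)
```

Recoding collapses residues that differ only in identity, not in charge, so the charge complementarity between columns 0 and 1 shows up as a clean signal even with very few sequences.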

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Australian National University

Using evolutionary covariance to infer protein sequence-structure relationships

Author: Jia Kejue
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018

During the last half century, a deep knowledge of the actions of proteins has emerged from a broad range of experimental and computational methods. This means that there are now many opportunities for understanding how the varieties of proteins affect larger scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins. Learning about the relationships among protein sequence, structure and dynamics is therefore one of the most important steps for understanding the mechanisms of proteins. Together with the rapid growth in the efficiency of computers, there has been a commensurate growth in the sizes of the public databases for proteins. The field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database by using co-evolution in sequences to achieve the following goals: (a) collection of co-evolution information on a large scale by using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations. The results presented here show that important gains can come from improvements to the sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself.
The gains shown here are encouraging. We would like to develop a sequence matching method that directly retains the pair and higher-order correlation information; this should be possible by developing the sequence matching separately for different domain structures. The many-body correlations in particular have the potential to shift the common perception in biology from pairs, which are not actually very informative, to higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information such as has been initiated here for single proteins.

Digital Repository @ Iowa State University (ISU)

Identification of direct residue contacts in protein-protein interaction by message passing

Author: Altschuh
Atchley
Bent
Burger
Eddy
Fiedler
Galperin
Göbel
H. Szurmant
J. A. Hoch
Kass
Kortemme
Laub
Lockless
M. Weigt
Mascher
Mukhopadhyay
Ninfa
Pruitt
R. A. White
Schmeisser
Szurmant
Süel
T. Hwa
Thattai
Toro-Roman
Wells
White
Zapf
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 09/01/2009

Understanding the molecular determinants of specificity in protein-protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context, covariance-based methods have been used to identify correlation between amino acid positions in interacting proteins. However, these methods have an important shortcoming: they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, adopted from statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of the global inference approach in identifying direct interaction based on sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only 1 copy per genome as the number of sequenced genomes continues to expand. Use of this method could significantly increase the potential targets for therapeutic intervention, shed light on the mechanism of protein-protein interaction, and establish the foundation for the accurate prediction of interacting protein partners.

Comment: Supplementary information available on http://www.pnas.org/content/106/1/67.abstrac

arXiv.org e-Print Archive

Crossref

PubMed Central

Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D structures

Author: A Doron-Faigenboim
A Gulyás-Kovács
AA Fodor
AFY Poon
CH Yeang
D Altschuh
DD Pollock
DS Marks
F Morcos
FM Richards
G Bazykin
IN Shindyalov
J Dutheil
J Felsenstein
J Romiguier
J Tsai
JD O'Brien
JM Duarte
JM Skerker
JS Yang
K Lie
KT Simons
L Burger
LC Martin
M Fares
M Go
M Punta
M Vassura
M Weigt
Marc Robinson-Rechavi
MN Price
N Halabi
O Penn
P Bradley
P Fariselli
P Tataru
P Tufféry
PY Chou
R Grantham
R Nielsen
R Sathyapriya
S Guindon
S Maisnier-Patin
S Miyazawa
S Wu
Sanzo Miyazawa
SD Dunn
SJ Fleishman
SQ Le
SW Lockless
U Göbel
VN Minin
WM Fitch
WP Russ
WR Atchley
WR Taylor
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/08/2012

Residue-residue interactions that fold a protein into a unique three-dimensional structure and enable it to perform a specific function impose structural and functional constraints on each residue site. Selective constraints on residue sites are recorded in the amino acid orders of homologous sequences and also in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing indirect dependences through other residues within a protein or even through other molecules. Recent attempts to disentangle direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that the strength of inferred residue pair couplings is an excellent predictor of residue-residue proximity in folded structures. Here, we report an alternative attempt to infer co-evolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. First, branch lengths of a phylogenetic tree inferred by the neighbor-joining method are optimized, together with other parameters, by maximizing the likelihood of the tree in a mechanistic codon substitution model. Mean changes of quantities that are characteristic of concurrent and compensatory substitutions, accompanying substitutions at each site in each branch of the tree, are estimated with the likelihood of each substitution. Partial correlation coefficients of the characteristic changes along branches between sites are calculated and used to rank co-evolving site pairs. The accuracy of contact prediction based on the present co-evolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families taken from the Pfam release 26.0. Moreover, this excellent accuracy indicates that compensatory substitutions are significant in protein evolution.

Comment: 17 pages, 4 figures, and 4 tables with supplementary information of 5 figure
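The role of partial correlation coefficients in separating direct from indirect dependences can be sketched with the standard precision-matrix formula. In the paper the correlated variables are characteristic changes along tree branches; here, as an assumption for illustration, a hand-built 3-site correlation matrix stands in for them, and the function name `partial_correlations` is hypothetical.

```python
import numpy as np

def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables.

    Uses the identity rho_ij|rest = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the correlation matrix.
    """
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Toy chain A - B - C: A and C are correlated only through B,
# so corr(A, C) = corr(A, B) * corr(B, C) = 0.8 * 0.8.
corr = np.array([
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
])

partial = partial_correlations(corr)
print(np.round(partial, 3))
```

The plain correlation between A and C is sizable, but their partial correlation vanishes once B is accounted for, whereas the directly coupled pairs keep a clearly nonzero value. Ranking pairs by partial rather than plain correlation is what suppresses such indirect links.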

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Identification of Coevolving Residues and Coevolution Potentials Emphasizing Structure, Bond Formation and Catalytic Coordination in Protein Evolution

Author: AA Fodor
BG Giraud
BT Korber
CH Yeang
CS Miller
CT Porter
D Juan
Daniel Y. Little
EF Pettersen
EN Baker
ER Tillier
F Pazos
G Shackelford
GB Gloor
H Berman
HJ Ahn
HM Berman
I Kass
JL King
KA Buss
KK Kim
KR Wollenberg
KY Yip
L Burger
LC Martin
Lu Chen
M Crisma
M Kimura
NJ Skelton
O Olmea
P Fariselli
R Gouveia-Oliveira
RD Finn
S Miyazawa
SA Travers
SD Dunn
Shin-Han Shiu
U Gobel
WM Fitch
Z Wang
ZO Wang
Publication venue: Public Library of Science
Publication date: 10/03/2009

The structure and function of a protein depend on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
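The abstract does not spell out how the non-coevolutionary bias is removed, so the sketch below substitutes a different, widely used correction, the average product correction (APC), purely to show what subtracting a background term from a mutual information matrix looks like. It should not be read as this paper's refinement. The function name and the 4-column toy matrix are hypothetical.

```python
import numpy as np

def average_product_correction(mi):
    """Subtract the APC background term from a symmetric MI matrix.

    corrected_ij = MI_ij - mean_i * mean_j / overall_mean, where mean_i is the
    average MI of column i with every other column and overall_mean is the
    average over all off-diagonal pairs. Assumes overall_mean is nonzero.
    """
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    column_mean = np.where(off_diagonal, mi, 0.0).sum(axis=1) / (n - 1)
    overall_mean = mi[off_diagonal].mean()
    corrected = mi - np.outer(column_mean, column_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected

# Toy MI matrix: a uniform background of 0.2 bits between all columns,
# plus one pair (columns 0 and 1) with extra signal.
mi = np.full((4, 4), 0.2)
np.fill_diagonal(mi, 0.0)
mi[0, 1] = mi[1, 0] = 0.8

corrected = average_product_correction(mi)
print(np.round(corrected, 3))
```

With a purely uniform background the correction returns zero everywhere, and in the toy matrix above the pair with extra signal remains the top-scoring entry after correction.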

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central