
Springer - Publisher Connector

Springer

Fraunhofer-ePrints

Linear predictive coding representation of correlated mutation for protein sequence alignment

Author: A Elofsson
AG Murzin
AS Yang
BC Lee
Chan-seok Jeong
CM Buslje
D Cozzetto
Dongsup Kim
DT Jones
E Neher
ER Tillier
G Shackelford
GJ Bartlett
GM Süel
J Kleinjung
J Kopp
J Söding
JM Chandonia
JP Dekker
LR Rabiner
M Lee
N Siew
O Olmea
S Wu
SD Dunn
SF Altschul
SW Lockless
T Ohlson
T Pham
U Göbel
WR Atchley
Y Qi
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

Public Library of Science (PLOS)

Networks of High Mutual Information Define the Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification

Author: A Rausell
B Sterner
Burkhard Rost
CA Innis
CE Shannon
CM Buslje
Cristina Marino Buslje
CT Porter
D Kristensen
D Leys
E Cilia
Elin Teppa
GB Gloor
GJ Bartlett
I Mihalek
J Bernardes
J Manning
J Swets
JE Donald
José María Delfino
L Byung-Chul
M Nielsen
Morten Nielsen
N Petrova
O Lichtarge
R Alterovitz
R Gouveia-Oliveira
R Matthew Ward
RD Finn
RK Kuipers
S Chakrabarti
S Erdin
S Sankararaman
S Sankararaman
SD Dunn
SF Altschul
SW Lockless
T Zhang
T-Y Chien
TM Cover
Tomas Di Doménico
W Tong
Y-R Tang
Z Shi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
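To make the quantities named in this abstract concrete, the following is a minimal Python sketch of mutual information between two alignment columns, Kullback-Leibler conservation of one column, and a simple neighbourhood average. It is an illustration only, not the authors' pipeline: the function names, the representation of columns as plain strings, the `background` frequency dictionary, and the plain averaging in `proximity_average` are all assumptions, and the corrections the paper may apply (sequence weighting, pseudocounts, gap handling) are omitted.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """MI in bits between two alignment columns (equal-length strings)."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def kl_conservation(col, background):
    """KL divergence in bits of the column distribution from a background.

    `background` is a hypothetical dict mapping residue -> frequency.
    """
    n = len(col)
    return sum(
        (c / n) * math.log2((c / n) / background[res])
        for res, c in Counter(col).items()
    )


def proximity_average(scores, neighbours):
    """Mean of per-residue scores over structural neighbours (assumed form)."""
    return sum(scores[j] for j in neighbours) / len(neighbours)


# Two perfectly coupled columns share 1 bit; two independent columns share 0.
print(mutual_information("AAVV", "LLII"))
print(mutual_information("AAVV", "LILI"))
```

The neighbour list passed to `proximity_average` would come from a structure, for example all residues within some distance cutoff of the residue of interest; the cutoff is a free parameter in this sketch.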

CiteSeerX

Online Research Database In Technology

Structural and Functional Roles of Coevolved Sites in Proteins

Author: A Marchler-Bauer
A Valencia
Anna R. Panchenko
AS Kondrashov
B Bulka
BC Lee
BT Korber
C Ferrer-Costa
CA Voigt
CH Yeang
CM Buslje
CS Goh
D Altschuh
DB Johnson
DD Pollock
DJ Watts
DY Little
ER Tillier
G Chelvanayagam
GB Gloor
GL Moore
IN Shindyalov
K Fukami-Kobayashi
K Henrick
K Mizuguchi
KR Wollenberg
L Pritchard
LA Amaral
LC Martin
M Kimura
M Vendruscolo
MC Saraf
MD Daily
MG Kann
N Mathias
Narcis Fernandez-Fuentes
O Olmea
P Shannon
R Gouveia-Oliveira
S Chakrabarti
S Chakrabarti
S Govindarajan
Saikat Chakrabarti
SD Dunn
SN Fatakia
SS Choi
TA Castoe
U Gobel
WL DeLano
WM Fitch
WR Atchley
Publication venue: Public Library of Science
Publication date: 01/01/2010

Understanding the residue covariations between multiple positions in protein families is crucial and can help in designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite significant efforts in the field, there is still controversy over the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal and the role of coevolution in functional diversification. In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We employed a large scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of 'small-world' type, as they have clustering coefficient values higher than random networks and mean shortest path lengths similar to or lower than those of random and regular networks. We also found that altogether 11% of functionally important sites coevolve with other sites. Active sites are found to coevolve with other sites more frequently (15%) than protein (11%) and ligand (9%) binding sites. Metal binding and active sites are also found to coevolve more frequently with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other.
Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found in very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges in evolution. Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
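The 'small-world' test described in this abstract rests on two network statistics, the clustering coefficient and the mean shortest path length. As a rough illustration, and not the authors' code, the sketch below computes both for an unweighted, undirected graph stored as a dictionary of neighbour sets. The graph representation and function names are assumptions, and the path-length function only averages over pairs that are actually connected.

```python
from collections import deque
from itertools import combinations


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def mean_shortest_path(graph):
    """Mean shortest path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy networks: a triangle and a three-node path.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(average_clustering(triangle), mean_shortest_path(triangle))
print(average_clustering(path), mean_shortest_path(path))
```

A network would be called small-world in the sense used here if its clustering coefficient is clearly higher than that of a random graph with the same number of nodes and edges while its mean shortest path length stays comparable; generating those reference graphs is left out of the sketch.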

Public Library of Science (PLOS)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

How is structural divergence related to evolutionary information?

Author: Marino Cristina Ester
Monzón Alexander
Parisi Gustavo Daniel
Zea Diego Javier
Publication venue: 'Elsevier BV'
Publication date: 01/10/2018

The analysis of evolutionary information in a protein family, such as conservation and covariation, is often linked to its structural information. Multiple sequence alignments of distant homologous sequences are used to measure evolutionary variables. Although high structural differences between proteins can be expected in such divergent alignments, most works linking evolutionary and structural information use a single structure, ignoring the structural variability within protein families. The goal of this work is to elucidate the relevance of structural divergence when sequence-based measures are integrated with structural information. We found that inter-residue contacts and solvent accessibility undergo large variations in protein families. Our results show that high covariation scores tend to reveal residue contacts that are conserved in the family, instead of protein or conformer specific contacts. We also found that residue accessible surface area shows a high variability between structures of the same family. As a consequence, the mean relative solvent accessibility of multiple structures correlates better with the conservation pattern than the relative solvent accessibility of a single structure. We conclude that the use of comprehensive structural information allows a more accurate interpretation of the information computed from sequence alignments. Therefore, considering structural divergence would lead to a better understanding of protein function, dynamics, and evolution.
Affiliation: Zea, Diego Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Monzón, Alexander. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Parisi, Gustavo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Marino, Cristina Ester. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina

CONICET Digital

Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Author: A Löytynoja
A Marchler-Bauer
Andrew D. Fernandes
BP Kleinstiver
C Floudas
C Kim
C Yanofsky
CM Buslje
CW Hogue
Darren P. Martin
DD Pollock
DY Little
ERM Tillier
F Pazos
G Shackelford
GB Gloor
GB Gloor
Gregory B. Gloor
JD Thompson
KM Wong
KR Wollenberg
KY Yip
LC Martin
Lindi M. Wahl
M Socolich
MA Fares
O Gotoh
R Kolodny
R Oliveira
RC Edgar
Russell J. Dickson
S Dunn
SAA Travers
SW Lockless
WM Fitch
WR Atchley
Publication venue: Public Library of Science
Publication date: 28/06/2010

BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. 
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation.

Public Library of Science (PLOS)

Scholarship@Western

Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

Author: Berger Leighton Bonnie
Liu Yang
Palmedo Peter Franklin
Peng Jian
Ye Qing
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction. Many protein structures of interest remain out of reach for both computational prediction and experimental determination. DeepContact learns patterns of co-evolution across thousands of experimentally determined structures, identifying conserved local motifs and leveraging this information to improve protein residue-residue contact predictions. DeepContact extracts additional information from the evolutionary couplings using its knowledge of co-evolution and structural space, while also converting coupling scores into probabilities that are comparable across protein sequences and alignments. 
Keywords: contact prediction; convolutional neural networks; deep learning; protein structure prediction; structure prediction; co-evolution; evolutionary couplings
National Institutes of Health (U.S.) (Grant R01GM081871)

DSpace@MIT

Amino acid positions subject to multiple co-evolutionary constraints can be robustly identified by their eigenvector network centrality scores

Author: Altschul
Arakaki
Armon
Ashkenazy
Bell
Benítez-Páez
Bonacich
Breen
Brown
Buck
Burger
Buslje
Capra
Chakrabarti
Chakrabarti
Chi
Choi
Dawid
Dekker
Dellus-Gur
Dunn
Edgar
Edgar
Falcon
Fatakia
Fichtenberg
Flynn
Fodor
Fodor
Fowler
Fowler
Gloor
Gloor
Gobel
Gu
Gundlapalli
Halabi
Hars
Horner
Jordan
Kalinina
Kann
Kass
Kleina
Kleinberg
Kryazhimskiy
La
Landherr
Lebherz
Lee
Lee
Lichtarge
Livesay
Lockless
Lohmann
Markiewicz
Marks
Meinhardt
Mihalek
Needleman
Newman
Ng
Olmea
Olmea
Ozarowski
Parente
Pei
Pei
Pelé
Pettersen
Ramensky
Sato
Schumacher
Schumacher
Schumacher
Schumacher
Shaw
Simonetti
Stamatakis
Suckow
Swint-Kruse
Talavera
Teşileanu
Tungtur
Tungtur
Valdar
Xu
Xu
Ye
Zhan
Zhan
Publication venue: 'Wiley'
Publication date: 01/12/2015

As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for co-evolution between pairs of positions. Co-evolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of co-evolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded co-evolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; “central” positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise co-evolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints – detectable by divergent algorithms – that occur at key protein locations. Finally, we discuss the fact that multiple patterns co-exist in evolutionary data that, together, give rise to emergent protein functions.
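Eigenvector centrality on a weighted co-evolution network can be sketched with plain power iteration. The version below is a minimal illustration under stated assumptions, not the procedure used in the paper: the score matrix is taken to be symmetric and non-negative (scores with column-specific terms subtracted may be negative and would need shifting or clipping first), the function name and parameters are hypothetical, and the iteration multiplies by A + I instead of A so that bipartite-like networks do not oscillate.

```python
import math


def eigenvector_centrality(weights, max_iter=1000, tol=1e-12):
    """Power iteration on a symmetric, non-negative weight matrix (list of lists)."""
    n = len(weights)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # One step of (A + I) x; the identity shift keeps the leading
        # eigenvector of A but prevents oscillation between two states.
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical 3-position network: position 0 has two moderate couplings,
# positions 1 and 2 have one each.
scores = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
centrality = eigenvector_centrality(scores)
print(centrality)
```

In this toy network position 0 receives the largest centrality because it is coupled to both other positions, which mirrors the abstract's observation that central positions tend to have several moderate scores instead of a single strong one.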

KU ScholarWorks

Springer - Publisher Connector

Correlated mutations via regularized multinomial regression

Author: Sreekumar Janardanan
ter Braak Cajo JF
van Dijk Aalt DJ
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results: We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions: A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments.