
NUI Maynooth Eprint Archive

The University of Manchester - Institutional Repository

Enlighten

Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Author: A Löytynoja
A Marchler-Bauer
Andrew D. Fernandes
BP Kleinstiver
C Floudas
C Kim
C Yanofsky
CM Buslje
CW Hogue
Darren P. Martin
DD Pollock
DY Little
ERM Tillier
F Pazos
G Shackelford
GB Gloor
GB Gloor
Gregory B. Gloor
JD Thompson
KM Wong
KR Wollenberg
KY Yip
LC Martin
Lindi M. Wahl
M Socolich
MA Fares
O Gotoh
R Kolodny
R Oliveira
RC Edgar
Russell J. Dickson
S Dunn
SAA Travers
SW Lockless
WM Fitch
WR Atchley
Publication venue: Public Library of Science
Publication date: 28/06/2010

BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. 
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
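As a rough illustration of what a pairwise covariation score between two alignment columns looks like, the sketch below computes plain mutual information. This is an assumption chosen for illustration: the abstract does not specify its statistics, and the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings. Sequences with a gap in
    either column are skipped. Illustrative measure only, not the paper's.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Toy alignment (hypothetical): columns 0 and 1 vary together,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy = ["AKLE", "AKLD", "GRLE", "GRLD"]
covarying = column_mutual_information(toy, 0, 1)
independent = column_mutual_information(toy, 0, 3)
```

In this toy case the covarying pair scores higher than the independent pair. A misaligned block would shift residues between columns and change such scores, which is the kind of sensitivity the abstract describes.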

Public Library of Science (PLOS)

Scholarship@Western

Conserved and variable correlated mutations in the plant MADS protein network

Author: A Bairoch
A Becker
A Fuchs
A Lupas
A Sali
AA Fodor
Aalt DJ van Dijk
AD Han
ADJ van Dijk
AH Paterson
AK Ramani
AS Veron
AT Brunger
BA Krizek
C Espinosa-soto
CM Buslje
CS Goh
CS Miller
D Altschuh
D Juan
DA Afonnikov
DS Horner
E Santelli
EA Merritt
F Fornara
F Pazos
F Pazos
F Pazos
G Angenent
GA Tuskan
H Ashkenazy
HB Fraser
HY Shan
HY Shan
HY Yu
I Halperin
J Lim
J Sundstrom
JD Thompson
JG Caporaso
JL Riechmann
JMG Izarzugaza
K Hill
K Huang
K Kaufmann
K Kaufmann
L Hakes
L Mendoza
L Parenicova
L Pellegrini
LC Martin
LJ Cseke
LP Martinez-Castilla
M Hassler
M Ng
M Socolich
MA Fares
MJ Buck
N Shitsukawa
NA Kane
NJ Mulder
O Noivirt
PJ Kraulis
PJ Waddell
R Melzer
R Ming
R Velasco
RC Edgar
RGH Immink
RKP Kuipers
RM Clark
Roeland CHJ van Ham
S Ciannamea
S De Bodt
S de Folter
S Henikoff
S Mika
SA Goff
SA Rensing
SAA Travers
SAA Travers
SR Eddy
T Hernandez-Hernandez
T Sato
Y Mo
YZ Yang
YZ Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. Results: Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. Conclusion: Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.

Springer - Publisher Connector

Wageningen University & Research Publications

Non-random pre-transcriptional evolution in HIV-1. A refutation of the foundational conditions for neutral evolution

Author: Bernardi G
Carlos Y Valenzuela
Crow JF
Drake JW
Drake JW
Drake JW
Feller W
Feller W
Freund JE
Gatlin LL
Gouet R
Gouet R
Hey J
Jern P
Jukes TH
Jukes TH
Karlin S
Kimura M
Kimura M
Kimura M
Kimura M
Kimura M
Kimura M
King JL
Kitrinos KM
Kreitman M
Kreitman M
Leigh-Brown AJ
Li WH
MacNeil A
Mani I
Mrazek J
Nei M
Ohta T
Reiher III WE
Serres PF
Spiegel MR
Sueoka N
Travers SAA
Valenzuela CY
Valenzuela CY
Valenzuela CY
Valenzuela CY
Valenzuela CY
Valenzuela CY
Valenzuela CY
Valenzuela CY
Wright S
Yang Z
Zhang J
Publication venue: Sociedade Brasileira de Genética
Publication date: 01/01/2009

The complete base sequence of the HIV-1 virus and the GP120 ENV gene were analyzed to establish their distance from the expected neutral random sequence. A special methodology was devised to achieve this aim. Analyses included: a) proportion of dinucleotides (signatures); b) homogeneity in the distribution of dinucleotides and bases (isochores), assessed by dividing both segments into ten and three sub-segments, respectively; c) probability of runs of bases and No-bases according to the Bose-Einstein distribution. The analyses showed a huge deviation from the random distribution expected from neutral evolution and neutral-neighbor influence of nucleotide sites. The most significant result is the tremendous lack of CG dinucleotides (p < 10^-50), a selective trait of eukaryote and not of single-stranded RNA virus genomes. Results not only refute neutral evolution and neutral-neighbor influence, but also strongly indicate that any base at any nucleotide site correlates with the entire viral genome or sub-segments. These results suggest that evolution of HIV-1 is pan-selective rather than neutral or nearly neutral.
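The dinucleotide proportions mentioned in this abstract can be compared against a random expectation in a simple way: divide each observed dinucleotide frequency by the product of its two base frequencies. The sketch below assumes that common odds-ratio form; the abstract does not give the paper's exact methodology, and the function name `dinucleotide_odds_ratios` and the toy sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio for each dinucleotide that occurs in `seq`.

    Expected frequency assumes independent neighbouring bases:
    f(x) * f(y). Ratios well below 1 indicate under-representation.
    Dinucleotides that never occur are absent from the result.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Overlapping dinucleotides: positions (0,1), (1,2), ..., (n-2,n-1).
    pair_counts = Counter(seq[k:k + 2] for k in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


# Toy sequence (hypothetical, not HIV-1 data).
ratios = dinucleotide_odds_ratios("AACC")
cg_ratio = ratios.get("CG", 0.0)  # 0.0 here because CG never occurs
```

A significance test such as the one behind the reported p-value would need a sampling model on top of these ratios; that step is not sketched here.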

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Académico de la Universidad de Chile

Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions

Author: A Bairoch
A Fuchs
A Hamosh
A Lapedes
A Lupi
A Tanoue
A Tanoue
AA Fodor
Andreas Kowarsch
Angelika Fuchs
BC Lee
C von Mering
D Altschuh
D Altschuh
D Vitkup
DD Pollock
DD Pollock
Dmitrij Frishman
EE Winter
F Endo
F Pazos
GB Gloor
H Huang
HM Berman
I Feldman
I Kass
IN Shindyalov
JG Caporaso
LC Martin
M Krzywinski
M Socolich
MH Knaggs
MS Singer
N Lopez-Bigas
NGC Smith
O Noivirt
O Noivirt-Brik
O Olmea
O Olmea
P Fariselli
P Ledoux
P Tuffery
P Wong
PC Ng
PC Ng
PD Stenson
Philipp Pagel
PJ Kundrotas
RC Edgar
RE Steward
RR Gutell
S Henikoff
S Sunyaev
S Vicatos
SAA Travers
SD Dunn
SK Ng
SM Larson
T Hershkovitz
Thomas Lengauer
U Göbel
V Ramensky
W Kabsch
WP Russ
WR Taylor
ZO Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/

Public Library of Science (PLOS)

PuSH

Characterization of the avian trojan gene family reveals contrasting evolutionary constraints

Author: A Fornůsková
A Sigalov
AL Hughes
AL Hughes
AL Hughes
AM Waterhouse
B Mészáros
C Burge
CJ Brown
CS Bond
CT Amemiya
D Zelus
David W Burt
DC Shields
E Birney
F Prugnolle
F Sievers
FM Jiggins
G Gremme
GB Gloor
H Dinkel
HJ Dyson
Jacqueline Smith
JJ Vamathevan
JS Bezbradica
K Okonechnikov
KJ Kunstman
LC Filip
M Gouy
M Ortiz
M Suyama
MA Fares
Maria Weronika Gutowska
Michael Schubert
MJ O’Connell
Olli Vainio
P Petrov
P Shannon
Petar Petrov
R Medzhitov
R Medzhitov
Riikka Syrjänen
S Guindon
S Sawyer
SAA Travers
T Tanaka
Tatsuya Uchida
X Wang
Y Huang
Y Minezaki
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 24/03/2015

"Trojan" is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules.