187 research outputs found

Disorder drives cooperative folding in a multidomain protein

Author: Clarke J
Gruszka DT
Hawkhead J
Mendonça CATF
Paci E
Potts JR
Whelan F
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 03/10/2016

Many human proteins contain intrinsically disordered regions, and disorder in these proteins can be fundamental to their function - for example, facilitating transient but specific binding, promoting allostery, or allowing efficient posttranslational modification. SasG, a multidomain protein implicated in host colonization and biofilm formation in Staphylococcus aureus, provides another example of how disorder can play an important role. Approximately one-half of the domains in the extracellular repetitive region of SasG are intrinsically unfolded in isolation, but these E domains fold in the context of their neighboring folded G5 domains. We have previously shown that the intrinsic disorder of the E domains mediates long-range cooperativity between nonneighboring G5 domains, allowing SasG to form a long, rod-like, mechanically strong structure. Here, we show that the disorder of the E domains coupled with the remarkable stability of the interdomain interface result in cooperative folding kinetics across long distances. Formation of a small structural nucleus at one end of the molecule results in rapid structure formation over a distance of 10 nm, which is likely to be important for the maintenance of the structural integrity of SasG. Moreover, if this normal folding nucleus is disrupted by mutation, the interdomain interface is sufficiently stable to drive the folding of adjacent E and G5 domains along a parallel folding pathway, thus maintaining cooperative folding

Crossref

PubMed Central

Apollo (Cambridge)

White Rose Research Online

Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

Author: A Doron-Faigenboim
A Schneider
AL Halpern
AR Kinjo
C Kosiol
Darren Martin
DT Jones
G Bazykin
GC Conant
H Akaike
I Keller
J Adachi
J Adachi
JP Huelsenbeck
K Tamura
L Jin
M Anisimova
M Averof
M Hasegawa
M Kimura
MA Larkin
MO Dayhoff
MW Dimmic
N Goldman
N Rodrigue
N Takahata
NGC Smith
R Grantham
S Guindon
S Miyazawa
S Whelan
S Whelan
S Whelan
Sanzo Miyazawa
SC Choi
SQ Le
SV Muse
T Miyata
T Miyata
TK Seo
TK Seo
W Delport
W Delport
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 18/03/2011

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codons, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pairs rather than on protein families, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints and with adjustable nucleotide mutation rates would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724
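The abstract above compares models by their AIC values. As a reminder of how such a comparison works, here is a minimal sketch using the standard definition AIC = 2k - 2 ln L. The model names, log-likelihoods, and parameter counts are made-up illustrative numbers, not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic(models):
    """Map each model name to its AIC difference from the best model.

    `models` maps a name to a (log_likelihood, n_params) pair.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical values, chosen only to illustrate the arithmetic.
candidates = {
    "single_changes_only": (-1050.0, 10),
    "multiple_changes": (-1020.0, 14),
}
differences = delta_aic(candidates)
```

In this toy case the richer model gains 30 log-likelihood units for 4 extra parameters, so it is preferred despite the penalty term.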

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

Author: Ben Murrell
C Kosiol
D Posada
D Posada
D Robinson
Daniel Kaliski
DC Nickle
DD Lee
DJ Lipman
DT Jones
F Abascal
Gerdus Benade
J Adachi
J Felsenstein
J Felsenstein
Jan Buys
K Devarajan
Konrad Scheffler
KP Burnham
KP Burnham
L Stanfel
Lise du Buisson
MO Dayhoff
MO Dayhoff
MW Dimmic
N Goldman
N Lartillot
Robert Ketteringham
S Whelan
S Whelan
S Zoller
SA Guindon
Sasha Moola
SL Kosakovsky Pond
SL Kosakovsky Pond
SQ Le
SQ Le
Thomas Mailund
Thomas Weighill
Tristan Hands
W Delport
Y Cao
Z Yang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2011

Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models
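The abstract above describes the alignment-specific model as a weighted sum of basis matrices. The combination step alone can be sketched as follows, under loud assumptions: the function name, the 2x2 toy matrices, and the weights are all hypothetical, and the learning of the basis matrices and weights by NNMF is not shown.

```python
def mix_basis_matrices(basis, weights):
    """Return the weighted sum of equally sized square matrices.

    `basis` is a list of n x n matrices (lists of lists) and `weights`
    holds one non-negative weight per basis matrix.
    """
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basis matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(basis[0])
    mixed = [[0.0] * n for _ in range(n)]
    for weight, matrix in zip(weights, basis):
        for i in range(n):
            for j in range(n):
                mixed[i][j] += weight * matrix[i][j]
    return mixed


# Toy 2-state example; a real amino acid model would use 20 x 20 matrices.
basis = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 3.0], [2.0, 0.0]],
]
weights = [0.5, 0.25]
model = mix_basis_matrices(basis, weights)
```

Only the weights (here two numbers) would have to be estimated from the small dataset, which is why such a model needs far fewer parameters than a full specialist matrix.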

Public Library of Science (PLOS)

Cape Town University OpenUCT

Crossref

Directory of Open Access Journals

PubMed Central

Stellenbosch University SUNScholar Repository

PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

Author: A Löytynoja
A Löytynoja
A Löytynoja
A Löytynoja
A Pang
A Rambaut
A Varadarajan
Botond Sipos
D Tian
DT Gillespie
E Paradis
Gregory E Jordan
H Bengtson
H Philippe
JL Oliver
JL Thorne
JP Huelsenbeck
KP Schliep
LJ Harmon
M Blanchette
M Kimura
MS Rosenberg
N de la Chaux
N Goldman
N Goldman
Nick Goldman
RA Cartwright
S Whelan
TG Clark
Tim Massingham
W Fletcher
Z Yang
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. Results: PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Conclusions: Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.
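The abstract above names the Gillespie algorithm as the way concurrent substitution, insertion and deletion processes are integrated. A minimal generic sketch of that algorithm follows. It is written in Python rather than R, it is not PhyloSim's API, and the function names, event names and rates are hypothetical; rates are held constant along the branch, which a real simulator would not assume.

```python
import random


def gillespie_step(rates, rng):
    """Draw one event: return (waiting_time, event_name).

    The waiting time is exponential with the total rate, and the event
    is chosen with probability proportional to its own rate.
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    wait = rng.expovariate(total)
    threshold = rng.random() * total
    cumulative = 0.0
    for event, rate in rates.items():
        cumulative += rate
        if threshold < cumulative:
            return wait, event
    return wait, event  # guard against floating-point rounding


def simulate_branch(rates, branch_length, seed=0):
    """List the (time, event) pairs that occur along one branch."""
    rng = random.Random(seed)
    time = 0.0
    events = []
    while True:
        wait, event = gillespie_step(rates, rng)
        time += wait
        if time > branch_length:
            return events
        events.append((time, event))


# Hypothetical per-sequence rates for three concurrent processes.
rates = {"substitution": 1.0, "insertion": 0.1, "deletion": 0.1}
events = simulate_branch(rates, branch_length=5.0, seed=1)
```

In a fuller simulator the rates would be recomputed after every event, since an insertion or deletion changes the set of sites that can mutate next.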

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Advantages of a Mechanistic Codon Substitution Model for Evolutionary Analysis of Protein-Coding Sequences

Author: A Doron-Faigenboim
A Schneider
A Stuart
AL Halpern
B Shapiro
B Zhong
C Kosiol
D Posada
Darren P. Martin
DT Jones
E Zuckerkandl
G Bazykin
G Schwarz
H Akaike
H Nishihara
J Adachi
J Adachi
J Adachi
J Wakeley
JP Huelsenbeck
K Tamura
M Averof
M Go
M Hasegawa
M Ingman
M Kimura
M Nikaido
MA Larkin
MA Suchard
MO Dayhoff
MW Dimmic
N Galtier
N Goldman
P Lopez
RK Jansen
S Guindon
S Miyazawa
S Miyazawa
S Whelan
S Whelan
Sanzo Miyazawa
SQ Le
SV Muse
T Gojobori
T Miyata
TK Seo
TK Seo
V Minin
W Delport
W Delport
WM Fitch
WW Brown
Z Abdo
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Public Library of Science
Publication date: 29/12/2011

A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints. Good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals.
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths
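The rate structure stated in the first sentence of the abstract above can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper: R is the substitution rate between codons, M the codon mutation rate, a(·) the amino acid encoded by a codon, and F-bar the average fixation probability for that type of amino acid replacement.

```latex
% Assumed notation: \mu, \nu index codons; a(\mu) is the amino acid encoded by \mu.
R_{\mu\nu} \;\propto\; M_{\mu\nu}\,\bar{F}_{a(\mu)\,a(\nu)}, \qquad \mu \neq \nu .
% Special case described in the abstract: without selection on amino acids,
% \bar{F} is constant, so R_{\mu\nu} \propto M_{\mu\nu} (a nucleotide-level model).
```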

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A model-independent approach to infer hierarchical codon substitution dynamics

Author: A Jiménez-Sánchez
AS Novozhilov
C Kosiol
C Kosiol
CR Woese
CR Woese
DG Hwang
DS Riddle
DT Jones
E Trifonov
EN Trifonov
EN Trifonov
EN Trifonov
FH Crick
GH Gonnet
HA Simon
JEM Hornos
JG Kemeny
JR Jungck
JTF Wong
M Di Giulio
M Meilă
MA Jiménez-Montano
MA Jiménez-Montaño
Martin Nilsson Jacobi
MN Jacobi
MO Dayhoff
MS Johnson
MW Nirenberg
O Görnerup
O R
Olof Görnerup
R Marquez
S Itzkovitz
S Whelan
SD Copley
T Bollenbach
T Wilhelm
TD Wu
V Karasev
VR Chechetkin
W Taylor
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Codon substitution constitutes a fundamental process in molecular biology that has been studied extensively. However, prior studies rely on various assumptions, e.g. regarding the relevance of specific biochemical properties, or on conservation criteria for defining substitution groups. Ideally, one would instead like to analyze the substitution process in terms of raw dynamics, independently of underlying system specifics. In this paper we propose a method for doing this by identifying groups of codons and amino acids such that these groups imply closed dynamics. The approach relies on recently developed spectral and agglomerative techniques for identifying hierarchical organization in dynamical systems. Results: We have applied the techniques on an empirically derived Markov model of the codon substitution process that is provided in the literature. Without system specific knowledge of the substitution process, the techniques manage to "blindly" identify multiple levels of dynamics; from amino acid substitutions (via the standard genetic code) to higher order dynamics on the level of amino acid groups. We hypothesize that the acquired groups reflect earlier versions of the genetic code. Conclusions: The results demonstrate the applicability of the techniques. Due to their generality, we believe that they can be used to coarse grain and identify hierarchical organization in a broad range of other biological systems and processes, such as protein interaction networks, genetic regulatory networks and food webs.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Chalmers Research

Chalmers Publication Library

Comparative genomics of the class 4 histone deacetylase family indicates a complex evolutionary history

Author: A Ito
A Mukai
AG Matthysse
AG Simpson
AP Wolffe
AS Novozhilov
C Hubbert
DD Leipe
DL Swofford
DP Genereux
DT Jones
E Bapteste
EV Koonin
EV Koonin
F Ronquist
H Ochman
H Philippe
H Philippe
IV Gregoretti
JD Thompson
JO Andersson
JO Andersson
JO Andersson
JP Huelsenbeck
JR Brown
K Luger
K Nakashima
K Strimmer
L Gao
LM Iyer
MA Ragan
Michel Vervoort
PA Marks
PJ Keeling
RE Steele
RG Beiko
S Guindon
S Whelan
SF Altschul
T Kouzarides
U Bergthorsson
U Bergthorsson
Valérie Ledent
WM Yang
YL Yao
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Histone deacetylases are enzymes that modify core histones and play key roles in transcriptional regulation, chromatin assembly, DNA repair, and recombination in eukaryotes. Three types of related histone deacetylases (classes 1, 2, and 4) are widely found in eukaryotes, and structurally related proteins have also been found in some prokaryotes. Here we focus on the evolutionary history of the class 4 histone deacetylase family. RESULTS: Through sequence similarity searches against sequenced genomes and expressed sequence tag data, we identified members of the class 4 histone deacetylase family in 45 eukaryotic and 37 eubacterial species representative of very distant evolutionary lineages. Multiple phylogenetic analyses indicate that the phylogeny of these proteins is, in many respects, at odds with the phylogeny of the species in which they are found. In addition, the eukaryotic members of the class 4 histone deacetylase family clearly display an anomalous phyletic distribution. CONCLUSION: The unexpected phylogenetic relationships within the class 4 histone deacetylase family and the anomalous phyletic distribution of these proteins within eukaryotes might be explained by two mechanisms: ancient gene duplication followed by differential gene losses and/or horizontal gene transfer. We discuss both possibilities in this report, and suggest that the evolutionary history of the class 4 histone deacetylase family may have been shaped by horizontal gene transfers

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DI-fusion

AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

Author: A Dereeper
A Loytynoja
A Wehe
AR Nabhan
B Rannala
BG Hall
C Chauve
C Notredame
C Zhou
Chan Zhou
CR Linder
DA Benson
DJ Zwickl
DM Hillis
DT Jones
F Jacobsen
F Plazzi
F Ronquist
F Ronquist
Fenglou Mao
J Kim
J Pecon-Slattery
JA Eisen
Jinling Huang
Johann Peter Gogarten
JP Jenuth
JP Townsend
JP Townsend
K Katoh
K Katoh
K Liu
K Liu
K Tamura
KA Cranston
KB Li
KS Pick
L Liu
L Liu
M Poptsova
MN Price
MS Poptsova
MS Rosenberg
MS Rosenberg
N Lartillot
O Gascuel
Paul Jaak Janssen
PD Faith
RC Edgar
RD Page
RI Vane-Wright
S Guindon
S Guindon
S Nelesen
S Whelan
SF Altschul
T Frickey
Y Yin
Y Yin
Yanbin Yin
Ying Xu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have applied it to four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php

Crossref

Directory of Open Access Journals

PubMed Central

The University of North Carolina at Greensboro

ScholarShip

FigShare

Protein expression profiles indicative for drug resistance of non-small cell lung cancer

Author: B Kaina
C Cordon-Cardo
C Shim
CB Thompson
D Venturelli
DC Sgroi
DT Ross
F Valeriote
G Bradley
G Stammler
H Auner
J Huot
J Mattern
J Mattern
JA Hickman
K Ono
KJ Scanlon
KW Kohn
LD Teeter
M Volm
M Volm
M Volm
M Volm
M Volm
M Volm
M Volm
M Volm
M Volm
M Volm
M. Volm
N Weidner
R Koomägi
R Koomägi
R Koomägi
R Wooster
RDH Whelan
RK Burt
SK Khoo
SL Kelley
T Efferth
TR Golup
U Scherf
Publication venue: Nature Publishing Group
Publication date

Data obtained from multiple sources indicate that no single mechanism can explain the resistance to chemotherapy exhibited by non-small cell lung carcinomas. The multi-factorial nature of drug resistance implies that the analysis of comprising expression profiles may predict drug resistance with higher accuracy than single gene or protein expression studies. Forty cellular parameters (drug resistance proteins, proliferative, apoptotic, and angiogenic factors, products of proto-oncogenes, and suppressor genes) were evaluated mainly by immunohistochemistry in specimens of primary non-small cell lung carcinoma of 94 patients and compared with the response of the tumours to doxorubicin in vitro. The protein expression profile of non-small cell lung carcinoma was determined by hierarchical cluster analysis and clustered image mapping. The cluster analysis revealed three different resistance profiles. The frequency of each profile was different (77, 14 and 9%, respectively). In the most frequent drug resistance profile, the resistance proteins P-glycoprotein/MDR1 (MDR1, ABCB1), thymidylate-synthetase, glutathione-S-transferase-π, metallothionein, O6-methylguanine-DNA-methyltransferase and major vault protein/lung resistance-related protein were up-regulated. Microvessel density, the angiogenic factor vascular endothelial growth factor and its receptor FLT1, and ECGF1 as well were down-regulated. In addition, the proliferative factors proliferating cell nuclear antigen and cyclin A were reduced compared to the sensitive non-small cell lung carcinoma. In this resistance profile, FOS was up-regulated and NM23 down-regulated. In the second profile, only three resistance proteins were increased (glutathione-S-transferase-π, O6-methylguanine-DNA-methyltransferase, major vault protein/lung resistance-related protein). The angiogenic factors were reduced. 
In the third profile, only five of the resistance factors were increased (MDR1, thymidylate-synthetase, glutathione-S-transferase-π, O6-methylguanine-DNA-methyltransferase, major vault protein/lung resistance-related protein)

Crossref

PubMed Central