Infoscience - École polytechnique fédérale de Lausanne

Probing the Informational and Regulatory Plasticity of a Transcription Factor DNA–Binding Domain

Transcription factors have two functional constraints on their evolution: (1) their binding sites must have enough information to be distinguishable from all other sequences in the genome, and (2) they must bind these sites with an affinity that appropriately modulates the rate of transcription. Since both are determined by the biophysical properties of the DNA-binding domain, selection on one will ultimately affect the other. We were interested in understanding how plastic the informational and regulatory properties of a transcription factor are and how transcription factors evolve to balance these constraints. To study this, we developed an in vivo selection system in Escherichia coli to identify variants of the helix-turn-helix transcription factor MarA that bind different sets of binding sites with varying degrees of degeneracy. Unlike previous in vitro methods used to identify novel DNA binders and to probe the plasticity of the binding domain, our selections were done within the context of the initiation complex, selecting both for specific binding within the genome and for a physiologically significant strength of interaction to maintain function of the factor. Using MITOMI, quantitative PCR, and a binding site fitness assay, we characterized the binding, function, and fitness of some of these variants. We observed that a large range of binding preferences, information contents, and activities could be accessed with a few mutations, suggesting that transcriptional regulatory networks are highly adaptable and expandable.

CiteSeerX

FigShare

A reexamination of information theory-based methods for DNA-binding site identification

Author: A Kolb
AR Fernandez De Henestrosa
B Barash
CE Lawrence
CE Shannon
D Betel
D GuhaThakurta
DT Pride
EN Trifonov
ET Jaynes
G Robertson
G Thijs
GD Stormo
GE Crooks
GJ Phillips
GZ Hertz
I Erill
Ivan Erill
J Rudnick
J van Helden
JJ Kohler
JM Heumann
JT Kim
JW Gibbs
K Gaston
K Uchida
KL Griffith
L Kozobay-Avraham
LJ Sun
LL Gatlin
M Abella
M Asayama
M Butala
M Schnarr
MC O'Neill
MH Zweig
Michael C O'Neill
ML Bulyk
MS Gelfand
N Baichoo
O Aparicio
O Huisman
OG Berg
P D'Haeseleer
PH von Hippel
R Brent
R Jauregui
R Munch
R Osada
R Staden
RJ Redfield
RK Shultzaberger
RV Parbhane
S Krishna
S Kullback
ST Cole
TD Schneider
TL Bailey
X Liu
Z Chen
Z Xiaoyue
Publication venue: BioMed Central
Publication date: 01/02/2009

Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data frequently leads to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate and therefore must be complemented by other cues, such as curvature, in real genomes. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions about information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than the alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
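
To make the role of genomic skew concrete, the following is a minimal sketch of relative entropy (Kullback-Leibler divergence) for one binding-site column measured against a background base composition. It uses the textbook definition, which may differ in detail from the formulation evaluated in the paper. The function name `relative_entropy` and the example frequencies are illustrative assumptions, not data from the study.

```python
import math

def relative_entropy(site_freqs, background):
    """Relative entropy in bits of one alignment column versus background.

    site_freqs and background map bases to frequencies; bases with zero
    frequency in the column contribute nothing to the sum.
    """
    return sum(
        p * math.log2(p / background[base])
        for base, p in site_freqs.items()
        if p > 0
    )

# Hypothetical backgrounds: an unskewed genome and an AT-rich one.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

# A fully conserved A scores lower against the AT-rich background than a
# fully conserved G, because G is rarer there.
score_a = relative_entropy({"A": 1.0}, at_rich)
score_g = relative_entropy({"G": 1.0}, at_rich)
```

Under this definition a site that follows the genomic skew scores lower than one that opposes it. That is the assumption the abstract reports as unreliable in real genomes.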

Springer - Publisher Connector

Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes

Author: A Bolotin
A Henne
A Sola-Landa
AC Kaberdina
AL Delcher
B Chang
BE Moseley
CJ Wu
D Benelli
DE Andreev
E Torarinsson
FD Ciccarelli
Gang-Qing Hu
GE Crooks
GP van Wezel
GQ Hu
GR Janssen
H Chen
H Nothaft
HJ Hong
HQ Zhu
Huaiqiu Zhu
I Moll
J Besemer
J Ma
J Shine
JA Lake
JS Hahn
K Chin
M Brenneis
M Jiang
M Kozak
M Ptashne
M Ventura
MA Larkin
MM Slupska
MN Price
MS Paget
N Tolstrup
NJ Ryding
O Hering
P Dam
P Londei
PA Hoskisson
R Hershberg
RK Shultzaberger
RL Tatusov
S Grill
S Kumar
S Nakagawa
T Sazuka
T Udagawa
T Umeyama
TB Anderson
V Mazurakova
WP Revill
Xiaobin Zheng
Zhen-Su She
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTRs) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria are mainly based on SD-led initiation rather than leaderless initiation. The study of leaderless genes in bacteria remains open, which leaves the understanding of translation initiation mechanisms in prokaryotes uncertain. Results: Here, we study signals in the translation initiation regions of all genes across 953 bacterial and 72 archaeal genomes, and then construct an evolutionary scenario for leaderless genes in bacteria. With an algorithm designed to identify multiple signals in the upstream regions of a genome's genes, we classify all genes into SD-led, TA-led and atypical genes according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes. Conclusions: Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. In Actinobacteria and Deinococcus-Thermus in particular, more than twenty percent of genes are leaderless. Analyzed across closely related bacterial genomes, our results imply that the change of translation initiation mechanism between genes derived from a common ancestor is linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has tended to decrease over the course of evolution.

Springer - Publisher Connector

Public Library of Science (PLOS)

Design Parameters to Control Synthetic Gene Expression in Escherichia coli

Author: A Bjornsson
A Eyre-Walker
A Fuglsang
A Henaut
A Villalobos
Alan Villalobos
Austin Gurney
BJ Del Tito Jr.
C Gustafsson
Claes Gustafsson
CM Stenstrom
DH Mathews
E Gonzalez de Valdivia
EI Gonzalez de Valdivia
G Kudla
G Wu
GA Gutman
Grzegorz Kudla
GT Chen
H Dong
I Iost
J Bonomo
J Elf
J Liao
J Newcomb
JC Venter
Jeremy Minshull
JF Kane
JH Holland
Jon E. Ness
K Itakura
KA Dittmar
L Blanco
L Eriksson
M Graf
M Welch
MA Sørensen
Mark Welch
MV Rojiani
NA Burgess-Brown
P Rice
PJ Dillon
PM Sharp
R Reynolds
RK Shultzaberger
S Boycheva
S Wold
Sridhar Govindarajan
SW Harcum
VR Kaberdin
Y Sohn
Publication venue: Public Library of Science
Publication date: 01/09/2009

BACKGROUND: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles. PRINCIPAL FINDINGS: To identify sequence features that affect protein expression, we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression, we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally, we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well. CONCLUSION: The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.

University of Miami: Scholarship Miami

Anatomy of Escherichia coli ribosome binding sites

Author: Bucheimer R E
Rudd K E
Schneider T D
Shultzaberger R K
Publication venue: England
Publication date: 12/10/2001

During translational initiation in prokaryotes, the 3' end of the 16S rRNA binds to a region just upstream of the initiation codon. The relationship between this Shine-Dalgarno (SD) region and the binding of ribosomes to translation start-points has been well studied, but a unified mathematical connection between the SD, the initiation codon and the spacing between them has been lacking. Using information theory, we constructed a model that treats these three components uniformly by assigning to the SD and the initiation region (IR) conservations in bits of information, and by assigning to the spacing an uncertainty, also in bits. To build the model, we first aligned the SD region by maximizing the information content there. The ease of this process confirmed the existence of the SD pattern within a set of 4122 reviewed and revised Escherichia coli gene starts. This large data set allowed us to show graphically, by sequence logos, that the spacing between the SD and the initiation region affects both the SD site conservation and its pattern. We used the aligned SD, the spacing, and the initiation region to model ribosome binding and to identify gene starts that do not conform to the ribosome binding site model. A total of 569 experimentally proven starts are more conserved (have higher information content) than the full set of revised starts, which probably reflects an experimental bias against the detection of gene products that have inefficient ribosome binding sites. Models were refined cyclically by removing non-conforming weak sites. After this procedure, models derived from either the original or the revised gene start annotation were similar. Therefore, this information theory-based technique provides a method for easily constructing biologically sensible ribosome binding site models. Such models should be useful for refining gene-start predictions of any sequenced bacterial genome.
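
The two quantities named in this abstract, conservation in bits for aligned sites and uncertainty in bits for the spacing, can be sketched as follows. This is a minimal illustration under stated assumptions: equiprobable bases (a 2-bit maximum per position), no small-sample correction, and made-up example sequences and spacings. The function names are hypothetical and are not taken from the authors' software.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy in bits of a Counter of observed symbols."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def position_information(sites):
    """Conservation in bits at each column of equal-length aligned DNA sites."""
    length = len(sites[0])
    return [
        2.0 - shannon_entropy(Counter(site[i] for site in sites))
        for i in range(length)  # 2 bits = log2(4) for four equiprobable bases
    ]

def spacing_uncertainty(spacings):
    """Uncertainty in bits of the observed distribution of spacings."""
    return shannon_entropy(Counter(spacings))

# Hypothetical aligned SD-like sites and SD-to-start spacings (in bases).
example_sites = ["AGGA", "AGGA", "AGGT", "AGGC"]
example_spacings = [7, 7, 8, 8]

per_position = position_information(example_sites)
gap_bits = spacing_uncertainty(example_spacings)
```

Summing `per_position` gives the total conservation of the aligned region. How the spacing term is combined with the SD and IR conservations is specified in the paper itself and is not reproduced here.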

The Fitness Landscapes of cis-Acting Binding Sites in Different Promoter and Environmental Contexts

Author: A Hochschild
AJ Dombroski
B Sclavi
C Fry
Daniel S. Malashock
David S. Guttman
DF Browning
DK Hawley
E Dekel
J Gertz
J Kim
J Sambrook
Jack F. Kirsch
L Bintu
L Zheng
Michael B. Eisen
MS Fenton
PH von Hippel
R Lenski
R Martin
RG Martin
RK Shultzaberger
Ryan K. Shultzaberger
S Roy
S Sheridan
SJ Maerkl
T Ellinger
T Nguyen
TD Schneider
V Mustonen
W Mandecki
WR McClure
Publication venue: Public Library of Science (PLoS)