Formation of regulatory modules by local sequence duplication
Turnover of regulatory sequence and function is an important part of
molecular evolution. But what are the modes of sequence evolution leading to
rapid formation and loss of regulatory sites? Here, we show that a large
fraction of neighboring transcription factor binding sites in the fly genome
have formed from a common sequence origin by local duplications. This mode of
evolution is found to produce regulatory information: duplications can seed new
sites in the neighborhood of existing sites. Duplicate seeds evolve
subsequently by point mutations, often towards binding a different factor than
their ancestral neighbor sites. These results are based on a statistical
analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome,
and a comparison set of intergenic regulatory sequence in Saccharomyces
cerevisiae. In fly regulatory modules, pairs of binding sites show
significantly enhanced sequence similarity up to distances of about 50 bp. We
analyze these data in terms of an evolutionary model with two distinct modes of
site formation: (i) evolution from independent sequence origin and (ii)
divergent evolution following duplication of a common ancestor sequence. Our
results suggest that pervasive formation of binding sites by local sequence
duplications distinguishes the complex regulatory architecture of higher
eukaryotes from the simpler architecture of unicellular organisms.
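As a rough illustration of the kind of statistic described in this abstract (enhanced similarity between binding sites that lie within about 50 bp of each other), the sketch below compares the mean similarity of nearby site pairs with that of distant pairs. It is not the authors' method: the per-position match fraction, the equal-length requirement, and the names `hamming_similarity` and `similarity_by_distance` are all assumptions made for illustration.

```python
from itertools import combinations


def hamming_similarity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sites."""
    if len(a) != len(b):
        raise ValueError("sites must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_by_distance(sites, max_distance=50):
    """Compare near and far site pairs.

    sites: list of (start_position, sequence) tuples (hypothetical input format).
    Returns (mean similarity of pairs within max_distance,
             mean similarity of pairs farther apart).
    """
    near, far = [], []
    for (p1, s1), (p2, s2) in combinations(sites, 2):
        sim = hamming_similarity(s1, s2)
        (near if abs(p1 - p2) <= max_distance else far).append(sim)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(near), mean(far)


# Toy data, invented for the example only.
toy_sites = [(0, "ACGT"), (10, "ACGA"), (200, "TTTT")]
near_mean, far_mean = similarity_by_distance(toy_sites)
```

A real analysis would also need a null model for the similarity expected by chance, which this sketch omits.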
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes which
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative, in the form of mutual information (MI) which is, in principle, an
objective and model independent similarity measure. MI can be estimated by
concatenating and zipping sequences, yielding thereby the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to get robust estimates of
MI from global pairwise alignments. With standard alignment algorithms, this
approach gives estimates for animal mitochondrial DNA that are strikingly close
to estimates obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, normalized compression distance is not an optimal metric for
phylogenetics, but we propose a simple modification that overcomes this lack
of additivity. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. But we see the main
virtue of our approach in a more general way. For example, it can also help to
judge the relative merits of different alignment algorithms, by estimating the
significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
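The two quantities this abstract contrasts can be sketched briefly. The first function computes the normalized compression distance in its usual form, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), using `zlib` as a stand-in compressor; the choice of compressor is an assumption. The second computes a single-letter Shannon mutual information from the column pairs of an existing pairwise alignment. It is only in the spirit of the simplified measure mentioned above, not the authors' exact estimator, and it treats a gap character as just another symbol.

```python
import random
import zlib
from collections import Counter
from math import log2


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)


def alignment_mutual_information(a: str, b: str) -> float:
    """Single-letter mutual information (in bits) between two aligned rows.

    a and b must be rows of the same pairwise alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


# Toy sequences, generated for the example only.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000))
seq_b = "".join(rng.choice("ACGT") for _ in range(2000))

self_distance = ncd(seq_a, seq_a)    # small: the second copy compresses well
cross_distance = ncd(seq_a, seq_b)   # larger: unrelated random sequences
mi_identical = alignment_mutual_information("ACGT" * 5, "ACGT" * 5)
```

General-purpose compressors such as zlib have small windows and per-stream overhead, which is one source of the uncontrolled errors mentioned in the abstract.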
Inferring processes underlying B-cell repertoire diversity
We quantify the VDJ recombination and somatic hypermutation processes in
human B-cells using probabilistic inference methods on high-throughput DNA
sequence repertoires of human B-cell receptor heavy chains. Our analysis
captures the statistical properties of the naive repertoire, first after its
initial generation via VDJ recombination and then after selection for
functionality. We also infer statistical properties of the somatic
hypermutation machinery (exclusive of subsequent effects of selection). Our
main results are the following: the B-cell repertoire is substantially more
diverse than T-cell repertoires, due to longer junctional insertions; sequences
that pass initial selection are distinguished by having a higher probability of
being generated in a VDJ recombination event; somatic hypermutations have a
non-uniform distribution along the V gene that is well explained by an
independent site model for the sequence context around the hypermutation site. Comment: acknowledgement added
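An independent site model for sequence context can be read as follows: each position around the candidate site contributes its own term, and the terms add up without interaction between positions. The sketch below assumes a logistic form and uses invented weights; the function name `hypermutation_score`, the `bias` parameter, and all numerical values are hypothetical and are not taken from the paper.

```python
import math


def hypermutation_score(context, weights, bias=0.0):
    """Score a sequence context under an additive (independent site) model.

    context: string of nucleotides around the candidate site.
    weights: one dict per context position, mapping nucleotide -> contribution
             (hypothetical parameters; missing nucleotides contribute 0).
    Returns a value in (0, 1) from the logistic function of the summed terms.
    """
    if len(context) != len(weights):
        raise ValueError("context and weights must have the same length")
    energy = bias + sum(w.get(base, 0.0) for base, w in zip(context, weights))
    return 1.0 / (1.0 + math.exp(-energy))


# Invented two-position example: 'A' at position 0 raises the score,
# 'G' at position 1 lowers it, and the two effects simply add.
toy_weights = [{"A": 1.0}, {"G": -1.0}]
score_ag = hypermutation_score("AG", toy_weights)  # contributions cancel
score_ac = hypermutation_score("AC", toy_weights)  # only the positive term applies
```

The defining property is additivity: changing the nucleotide at one position shifts the summed term by the same amount regardless of what occupies the other positions.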
Temporal variability of diazotroph community composition in the upwelling region off NW Iberia.
Knowledge of the ecology of N2-fixing (diazotrophic) plankton is mainly limited to oligotrophic (sub)tropical oceans. However, diazotrophs are widely distributed and active throughout the global ocean. Likewise, relatively little is known about the temporal dynamics of diazotrophs in productive areas. Between February 2014 and December 2015, we carried out 9 one-day samplings in the temperate northwestern Iberian upwelling system to investigate the temporal and vertical variability of the diazotrophic community and its relationship with hydrodynamic forcing. In downwelling conditions, characterized by deeper mixed layers and a homogeneous water column, non-cyanobacterial diazotrophs belonging mainly to nifH clusters 1G (Gammaproteobacteria) and 3 (putative anaerobes) dominated the diazotrophic community. In upwelling and relaxation conditions, affected by enhanced vertical stratification and hydrographic variability, the community was more heterogeneous vertically but less diverse, with a prevalence of UCYN-A (unicellular cyanobacteria, subcluster 1B) and non-cyanobacterial diazotrophs from clusters 1G and 3. Oligotyping analysis of the UCYN-A phylotype showed that the UCYN-A2 sublineage was the most abundant (74%), followed by UCYN-A1 (23%) and UCYN-A4 (2%). UCYN-A1 oligotypes exhibited relatively low frequencies under all three hydrographic conditions, whereas UCYN-A2 showed higher abundances during upwelling and relaxation. Our findings show the presence of a diverse and temporally variable diazotrophic community driven by hydrodynamic forcing in an upwelling system.