    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms.

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
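    The "concatenating and zipping" estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib standing in for the compressor C; the abstract does not specify the compressor or the exact normalization, and the authors' proposed additive modification is not reproduced here. The helper names and the synthetic test sequences are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, assuming the standard definition."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic data for illustration only (not from the paper).
rng = random.Random(0)


def random_dna(n: int) -> bytes:
    """Uniformly random sequence over the alphabet ACGT."""
    return "".join(rng.choice("ACGT") for _ in range(n)).encode()


a = random_dna(2000)
# b: copy of a in which roughly 2% of positions are redrawn at random
b = bytes(rng.choice(b"ACGT") if rng.random() < 0.02 else base for base in a)
# c: unrelated random sequence of the same length
c = random_dna(2000)

d_related = ncd(a, b)
d_unrelated = ncd(a, c)
print(f"NCD(a, b) = {d_related:.3f}  (related)")
print(f"NCD(a, c) = {d_unrelated:.3f}  (unrelated)")
```

    With a real compressor the value is only an approximation of the ideal (Kolmogorov) quantity, which is one way to read the abstract's remark about uncontrolled errors: the result depends on the compressor, its window size, and the sequence length.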

    Inferring processes underlying B-cell repertoire diversity

    We quantify the VDJ recombination and somatic hypermutation processes in human B-cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, due to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site. Comment: acknowledgement added

    Temporal variability of diazotroph community composition in the upwelling region off NW Iberia.

    Knowledge of the ecology of N2-fixing (diazotrophic) plankton is mainly limited to oligotrophic (sub)tropical oceans. However, diazotrophs are widely distributed and active throughout the global ocean. Likewise, relatively little is known about the temporal dynamics of diazotrophs in productive areas. Between February 2014 and December 2015, we carried out 9 one-day samplings in the temperate northwestern Iberian upwelling system to investigate the temporal and vertical variability of the diazotrophic community and its relationship with hydrodynamic forcing. In downwelling conditions, characterized by deeper mixed layers and a homogeneous water column, non-cyanobacterial diazotrophs belonging mainly to nifH clusters 1G (Gammaproteobacteria) and 3 (putative anaerobes) dominated the diazotrophic community. In upwelling and relaxation conditions, affected by enhanced vertical stratification and hydrographic variability, the community was more heterogeneous vertically but less diverse, with prevalence of UCYN-A (unicellular cyanobacteria, subcluster 1B) and non-cyanobacterial diazotrophs from clusters 1G and 3. Oligotyping analysis of the UCYN-A phylotype showed that the UCYN-A2 sublineage was the most abundant (74%), followed by UCYN-A1 (23%) and UCYN-A4 (2%). UCYN-A1 oligotypes exhibited relatively low frequencies during the three hydrographic conditions, whereas UCYN-A2 showed higher abundances during upwelling and relaxation. Our findings show the presence of a diverse and temporally variable diazotrophic community driven by hydrodynamic forcing in an upwelling system.