99 research outputs found

    Recognizing the pseudogenes in bacterial genomes

    Pseudogenes are now known to be a regular feature of bacterial genomes and are found in particularly high numbers within the genomes of recently emerged bacterial pathogens. As most pseudogenes are recognized by sequence alignments, we use newly available genomic sequences to identify the pseudogenes in 11 genomes from 4 bacterial genera, each of which contains at least 1 human pathogen. The numbers of pseudogenes range from 27 in Staphylococcus aureus MW2 to 337 in Yersinia pestis CO92 (i.e. 1–8% of the annotated genes in the genome). Most pseudogenes are formed by small frameshifting indels, but because stop codons are A + T-rich, the two low-G + C Gram-positive taxa (Streptococcus and Staphylococcus) have relatively high fractions of pseudogenes generated by nonsense mutations when compared with more G + C-rich genomes. Over half of the pseudogenes are produced from genes whose original functions were annotated as ‘hypothetical’ or ‘unknown’; however, several broadly distributed genes involved in nucleotide processing, repair or replication have become pseudogenes in one of the sequenced Vibrio vulnificus genomes. Although many of our comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.
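The remark that stop codons are A + T-rich can be checked with a few lines of arithmetic. The sketch below is only an illustration, assuming the three standard stop codons (TAA, TAG, TGA); the helper name `at_fraction` is hypothetical and nothing here comes from the paper itself.

```python
# Stop codons of the standard genetic code (assumed here for illustration).
STOP_CODONS = ("TAA", "TAG", "TGA")


def at_fraction(seq: str) -> float:
    """Return the fraction of bases in a DNA string that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


# Pool the nine bases of the three stop codons and measure their A+T share.
stop_at = at_fraction("".join(STOP_CODONS))
print(f"A+T fraction across stop codons: {stop_at:.2f}")  # 7 of 9 bases -> 0.78
```

Seven of the nine bases are A or T, which is consistent with the abstract's observation that nonsense mutations are relatively more frequent in low-G + C genomes.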

    Simulation based estimation of branching models for LTR retrotransposons

    Motivation: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model takes into account both the positions and the degradation of LTR retrotransposon copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results: Various functions have been implemented to simulate their spread, and visualization tools are proposed. Based on these simulation tools, we show that an accurate estimation of the parameters of this propagation model can be performed. We applied this method to the study of the spread of the transposable elements ROO, GYPSY, and DM412 on a chromosome of Drosophila melanogaster. Availability: Our proposal has been implemented in Python. Source code is freely available on the web at https://github.com/SergeMOULIN/retrotransposons-spread. Comment: 7 pages, 3 figures, 7 tables. Submitted to "Bioinformatics" on March 1, 201

    The source of laterally transferred genes in bacterial genomes

    BACKGROUND: Laterally transferred genes have often been identified on the basis of compositional features that distinguish them from ancestral genes in the genome. These genes are usually A+T-rich, arguing either that there is a bias towards acquiring genes from donor organisms having low G+C contents or that genes acquired from organisms of similar genomic base compositions go undetected in these analyses. RESULTS: By examining the genome contents of closely related, fully sequenced bacteria, we uncovered genes confined to a single genome and examined the sequence features of these acquired genes. The analysis shows that few transfer events are overlooked by compositional analyses. Most observed lateral gene transfers do not correspond to free exchange of regular genes among bacterial genomes, but more probably represent the constituents of phages or other selfish elements. CONCLUSIONS: Although bacteria tend to acquire large amounts of DNA, the origin of these genes remains obscure. We have shown that contrary to what is often supposed, their composition cannot be explained by a previous genomic context. In contrast, these genes fit the description of recently described genes in lambdoid phages, named 'morons'. Therefore, results from genome content and compositional approaches to detect lateral transfers should not be cited as evidence for genetic exchange between distantly related bacteria.

    “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

    Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
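To give a sense of what reading a RepeatMasker .out file involves, here is a minimal Python sketch. It is not the Perl tool described above and does not attempt its key step of merging several hits into one copy; it only reads the whitespace-separated hit lines and sums the annotated bases per class/family. The function names and the sample lines are made up for illustration, and the column positions assume the usual 15-column .out layout.

```python
from collections import defaultdict

# Made-up sample in the usual .out layout (two header lines, then hit lines).
SAMPLE = """\
   SW   perc perc perc  query  position in query    matching repeat      position in repeat
score   div. del. ins.  sequence begin end (left)   repeat   class/family begin end (left) ID

 1320   15.6  6.2  0.0  chrX  100  299  (5000) +  ROO    LTR/Pao     1   200   (0)  1
  800   10.0  1.0  0.5  chrX  400  549  (4750) C  GYPSY  LTR/Gypsy  (10) 150    1   2
  500   12.0  0.0  0.0  chrX  700  749  (4550) +  ROO    LTR/Pao    300  349  (80)  3
"""


def parse_out(lines):
    """Parse hit lines of a .out file into dictionaries, skipping headers."""
    hits = []
    for line in lines:
        fields = line.split()
        # Hit lines have at least 15 columns and start with an integer score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append({
            "query": fields[4],
            "begin": int(fields[5]),
            "end": int(fields[6]),
            # RepeatMasker writes "C" for a match on the complementary strand.
            "strand": "-" if fields[8] == "C" else "+",
            "repeat": fields[9],
            "family": fields[10],
            "id": fields[14],
        })
    return hits


def bases_per_family(hits):
    """Sum the query bases covered by hits of each class/family.

    Overlapping hits are counted twice; this is a deliberate simplification.
    """
    totals = defaultdict(int)
    for hit in hits:
        totals[hit["family"]] += hit["end"] - hit["begin"] + 1
    return dict(totals)


hits = parse_out(SAMPLE.splitlines())
print(bases_per_family(hits))
```

Counting raw hits in this way tends to overestimate the number of copies, which is the problem the abstract describes: fragments of one element separated by indels or undetermined bases appear as separate lines.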

    The evolutionary dynamics of the Helena retrotransposon revealed by sequenced Drosophila genomes

    Background: Several studies have shown that genomes contain a mixture of transposable elements, some of which are still active and others ancient relics that have degenerated. This is true for the non-LTR retrotransposon Helena, of which only degenerate sequences have been shown to be present in some species (Drosophila melanogaster), whereas putatively active sequences are present in others (D. simulans). Combining experimental and population analyses with the sequence analysis of the 12 Drosophila genomes, we have investigated the evolution of Helena, and propose a possible scenario for the evolution of this element. Results: We show that six species of Drosophila have the Helena transposable element at different stages of its evolution. The copy number is highly variable among these species, but most of them are truncated at the 5' ends and also harbor several internal deletions and insertions suggesting that they are inactive in all species, except in D. mojavensis in which quantitative RT-PCR experiments have identified a putative active copy. Conclusion: Our data suggest that Helena was present in the common ancestor of the Drosophila genus, which has been vertically transmitted to the derived lineages, but that it has been lost in some of them. The wide variation in copy number and sequence degeneration in the different species suggest that the evolutionary dynamics of Helena depends on the genomic environment of the host species.

    Specific Activation of an I-Like Element in Drosophila Interspecific Hybrids

    The non-long terminal repeat (LTR) retrotransposon I, which belongs to the I superfamily of non-LTR retrotransposons, is well known in Drosophila because it transposes at a high frequency in the female germline cells in I–R hybrid dysgenic crosses of Drosophila melanogaster. Here, we report the occurrence and the upregulation of an I-like element in the hybrids of two sister species belonging to the repleta group of the genus Drosophila, D. mojavensis and D. arizonae. These two species display variable degrees of pre- and postzygotic isolation, depending on the geographic origin of the strains. We took advantage of these features to explore the transposable element (TE) dynamics in interspecific crosses. We fully characterized the copies of this TE family in the D. mojavensis genome and identified at least one complete copy. We showed that this element is transcriptionally active in the ovaries and testes of both species and in their hybrids. Moreover, we showed that this element is upregulated in hybrid males, which could be associated with the male-sterile phenotype.

    Evolutionary Origins of Genomic Repertoires in Bacteria

    Explaining the diversity of gene repertoires has been a major problem in modern evolutionary biology. In eukaryotes, this diversity is believed to result mainly from gene duplication and loss, but in prokaryotes, lateral gene transfer (LGT) can also contribute substantially to genome contents. To determine the histories of gene inventories, we conducted an exhaustive analysis of gene phylogenies for all gene families in a widely sampled group, the γ-Proteobacteria. We show that, although these bacterial genomes display striking differences in gene repertoires, most gene families having representatives in several species have congruent histories. Other than the few vast multigene families, gene duplication has contributed relatively little to the contents of these genomes; instead, LGT, over time, provides most of the diversity in genomic repertoires. Most such acquired genes are lost, but the majority of those that persist in genomes are transmitted strictly vertically. Although our analyses are limited to the γ-Proteobacteria, these results resolve a long-standing paradox, i.e., the ability to make robust phylogenetic inferences in light of substantial LGT.

    From Gene Trees to Organismal Phylogeny in Prokaryotes: The Case of the γ-Proteobacteria

    The rapid increase in published genomic sequences for bacteria presents the first opportunity to reconstruct evolutionary events on the scale of entire genomes. However, extensive lateral gene transfer (LGT) may thwart this goal by preventing the establishment of organismal relationships based on individual gene phylogenies. The group for which cases of LGT are most frequently documented and for which the greatest density of complete genome sequences is available is the γ-Proteobacteria, an ecologically diverse and ancient group including free-living species as well as pathogens and intracellular symbionts of plants and animals. We propose an approach to multigene phylogeny using complete genomes and apply it to the case of the γ-Proteobacteria. We first applied stringent criteria to identify a set of likely gene orthologs and then tested the compatibilities of the resulting protein alignments with several phylogenetic hypotheses. Our results demonstrate phylogenetic concordance among virtually all (203 of 205) of the selected gene families, with each of the exceptions consistent with a single LGT event. The concatenated sequences of the concordant families yield a fully resolved phylogeny. This topology also received strong support in analyses aimed at excluding effects of heterogeneity in nucleotide base composition across lineages. Our analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT. This gene set can be identified and used to yield robust hypotheses for organismal phylogenies, thus establishing a foundation for reconstructing the evolutionary transitions, such as gene transfer, that underlie diversity in genome content and organization.

    A call for benchmarking transposable element annotation methods.

    DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks, that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.