13 research outputs found

    PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats

    Get PDF
    Identification of dispersed or interspersed repeats, most of which are derived from transposons, retrotransposons or retrovirus-like elements, is an important step in genome annotation. Software tools that compare genomic sequences with precompiled repeat reference libraries using sensitive similarity-based methods provide reliable means of finding the positions of fragments homologous to known repeats. However, their output is often incomplete and fragmented owing to the mutations (nucleotide substitutions, deletions or insertions) that can result in considerable divergence from the reference sequence. Merging these fragments to identify the whole region that represents an ancient copy of a mobile element is challenging, particularly if the element is large and suffered multiple deletions or insertions. Here we report PLOTREP, a tool designed to post-process results obtained by sequence similarity search and merge fragments belonging to the same copy of a repeat. The software allows rapid visual inspection of the results using a dot-plot like graphical output. The web implementation of PLOTREP is available at

    T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data

    Get PDF
    Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. The sequencing cost reduction and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex detecting presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably but that the rate of ‘no data’ calls increases as the coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome provided the availability of the reference genome, individual TE copy annotation and NGS data

    Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

    Get PDF
    Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza. sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome

    Combined Evidence Annotation of Transposable Elements in Genome Sequences

    Get PDF
    Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced from combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated “TE models” in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments of a few hundred nucleotides long and highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila

    Abundant Degenerate Miniature Inverted-Repeat Transposable Elements in Genomes of Epichloid Fungal Endophytes of Grasses

    Get PDF
    Miniature inverted-repeat transposable elements (MITEs) are abundant repeat elements in plant and animal genomes; however, there are few analyses of these elements in fungal genomes. Analysis of the draft genome sequence of the fungal endophyte Epichloë festucae revealed 13 MITE families that make up almost 1% of the E. festucae genome, and relics of putative autonomous parent elements were identified for three families. Sequence and DNA hybridization analyses suggest that at least some of the MITEs identified in the study were active early in the evolution of Epichloë but are not found in closely related genera. Analysis of MITE integration sites showed that these elements have a moderate integration site preference for 5′ genic regions of the E. festucae genome and are particularly enriched near genes for secondary metabolism. Copies of the EFT-3m/Toru element appear to have mediated recombination events that may have abolished synthesis of two fungal alkaloids in different epichloae. This work provides insight into the potential impact of MITEs on epichloae evolution and provides a foundation for analysis in other fungal genomes

    “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

    Get PDF
    International audienceBackground: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes

    Tiling microarray analysis of rice chromosome 10 to identify the transcriptome and relate its expression to chromosomal architecture

    Get PDF
    BACKGROUND: Sequencing and annotation of the genome of rice (Oryza sativa) have generated gene models in numbers that top all other fully sequenced species, with many lacking recognizable sequence homology to known genes. Experimental evaluation of these gene models and identification of new models will facilitate rice genome annotation and the application of this knowledge to other more complex cereal genomes. RESULTS: We report here an analysis of the chromosome 10 transcriptome of the two major rice subspecies, japonica and indica, using oligonucleotide tiling microarrays. This analysis detected expression of approximately three-quarters of the gene models without previous experimental evidence in both subspecies. Cloning and sequence analysis of the previously unsupported models suggests that the predicted gene structure of nearly half of those models needs improvement. Coupled with comparative gene model mapping, the tiling microarray analysis identified 549 new models for the japonica chromosome, representing an 18% increase in the annotated protein-coding capacity. Furthermore, an asymmetric distribution of genome elements along the chromosome was found that coincides with the cytological definition of the heterochromatin and euchromatin domains. The heterochromatin domain appears to associate with distinct chromosome level transcriptional activities under normal and stress conditions. CONCLUSION: These results demonstrated the utility of genome tiling microarray in evaluating annotated rice gene models and in identifying novel transcriptional units. The tiling microarray sanalysis further revealed a chromosome-wide transcription pattern that suggests a role for transposable element-enriched heterochromatin in shaping global transcription in response to environmental changes in rice
    corecore