
Detection of transposable elements by their compositional bias

By Olivier Andrieu, Anna-Sophie Fiston, Dominique Anxolabéhère and Hadi Quesneville


BACKGROUND: Transposable elements (TEs) are mobile genetic entities present in nearly all genomes. Previous work has shown that TEs tend to differ from host genes in nucleotide composition, whether measured by codon usage bias or by dinucleotide frequencies. We show here how these compositional differences can be used as a tool for the detection and analysis of TE sequences.

RESULTS: We compared the composition of TE sequences and host gene sequences using probabilistic models of nucleotide sequences. We used hidden Markov models (HMMs), which take into account the base composition of the sequences (occurrences of words n nucleotides long, with n ranging here from 1 to 4) and the heterogeneity between coding and non-coding parts of sequences. We analyzed three sets of sequences, containing class I TEs, class II TEs and genes respectively, in three species: Drosophila melanogaster, Cænorhabditis elegans and Arabidopsis thaliana. Each of these sets had a distinct, homogeneous composition, enabling us to distinguish between the two classes of TE and the genes. However, the particular base composition of the TEs differed among the three species studied.

CONCLUSIONS: This approach can be used to detect and annotate TEs in genomic sequences, and it complements the current homology-based TE detection methods. Furthermore, the HMM method is able to identify the parts of a sequence in which the nucleotide composition resembles that of a coding region of a TE. This is useful for the detailed annotation of TE sequences, which may contain an ancient, highly diverged coding region that is no longer fully functional.
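To make the idea of composition-based discrimination concrete, the following is a minimal sketch in Python. It is not the authors' method: it replaces the hidden Markov models of the abstract with plain fixed-order Markov chains, one trained per sequence set, and scores a query by the log-odds between the two. The function names (`train_markov`, `log_likelihood`, `log_odds`), the pseudocount, and the toy training sequences are all assumptions made for illustration. Under this simplification, counting words of length n corresponds to a chain of order n - 1.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log P(base | previous `order` bases) from training sequences.

    Hypothetical helper: a plain Markov chain, not the HMM used in the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous symbols such as N.
            if set(context + base) <= set(ALPHABET):
                counts[context][base] += 1
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((counts[context][b] + pseudocount) / total)
            for b in ALPHABET
        }
    return model

def log_likelihood(seq, model, order=2):
    """Sum the log transition probabilities along `seq` under `model`."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in model and base in model[context]:
            total += model[context][base]
    return total

def log_odds(seq, te_model, gene_model, order=2):
    """Positive values mean the composition is closer to the TE model."""
    return (log_likelihood(seq, te_model, order)
            - log_likelihood(seq, gene_model, order))

if __name__ == "__main__":
    # Toy, made-up training data: AT-rich "TEs" versus GC-rich "genes".
    te_train = ["ATATATATATATATAT", "AATTAATTAATTAATT"]
    gene_train = ["GCGCGCGCGCGCGCGC", "GGCCGGCCGGCCGGCC"]
    te_model = train_markov(te_train, order=2)
    gene_model = train_markov(gene_train, order=2)
    print(log_odds("ATATATATAT", te_model, gene_model, order=2))
```

A real analysis would train on curated TE and gene sets for a single species, since the abstract reports that TE composition differs between species, and would need an HMM with separate coding and non-coding states to capture the heterogeneity within a sequence that this sketch ignores.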

Topics: Methodology Article
Publisher: BioMed Central
Year: 2004
DOI identifier: 10.1186/1471-2105-5-94
Provided by: PubMed Central

