13 research outputs found

    LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

    Background: Transposable elements are abundant in eukaryotic genomes and are believed to have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high-quality genome-wide annotations of transposable elements. Therefore, there is a considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs).
    Results: We have developed a software tool, LTRharvest, for the de novo detection of full-length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high-quality annotations based on known LTR transposon features such as length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and a specificity of 100% and 72%, respectively. This is comparable to or slightly better than annotations from previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software.
    Conclusion: LTRharvest is an efficient software tool delivering high-quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.

    Fine-grained annotation and classification of de novo predicted LTR retrotransposons

    Long terminal repeat (LTR) retrotransposons and endogenous retroviruses (ERVs) are transposable elements in eukaryotic genomes well suited for computational identification. De novo identification tools determine the position of potential LTR retrotransposon or ERV insertions in genomic sequences. For further analysis, it is desirable to obtain an annotation of the internal structure of such candidates. This article presents LTRdigest, a novel software tool for automated annotation of internal features of putative LTR retrotransposons. It uses local alignment and hidden Markov model-based algorithms to detect retrotransposon-associated protein domains as well as primer binding sites and polypurine tracts. As an example, we used LTRdigest results to identify 88 (near) full-length ERVs in the chromosome 4 sequence of Mus musculus, separating them from truncated insertions and other repeats. Furthermore, we propose a workflow for the use of LTRdigest in de novo LTR retrotransposon classification and perform an exemplary de novo analysis on the Drosophila melanogaster genome as a proof of concept. Using a new method based solely on the annotations generated by LTRdigest, 518 potential LTR retrotransposons were automatically assigned to 62 candidate groups. Representative sequences from 41 of these 62 groups were matched to reference sequences with >80% global sequence similarity.

    The Abundant Polyadenylated Transcript 2 DNA Sequence of the Pathogenic Protozoan Parasite Entamoeba histolytica Represents a Nonautonomous Non-Long-Terminal-Repeat Retrotransposon-Like Element Which Is Absent in the Closely Related Nonpathogenic Species Entamoeba dispar

    While comparing gene expression in the pathogenic organism Entamoeba histolytica and the closely related but nonpathogenic species Entamoeba dispar, we discovered that the E. histolytica abundant polyadenylated transcript 2 (ehapt2) and corresponding genomic copies are absent in E. dispar. Although polyadenylated, ehapt2 does not contain any overt open reading frame. Southern blot and sequence analyses revealed that about 500 copies of ehapt2 genomic elements were present in each cell and that the copies were distributed throughout the ameba genome. The various ehapt2 elements are regularly located in the vicinity of protein-encoding genes, downstream of pyrimidine-rich sequence stretches (40 to 125 bp; CT content, 79.2 to 85.5%), and are flanked by duplicated target sites of variable length. Target site duplications were evidently generated during integration of ehapt2 into the E. histolytica genome, as one copy of the flanking repeat and the complete ehapt2 element are specifically absent in orthologous E. dispar genomic sequences. ehapt2 shares 3′ sequences with EhRLE, a recently identified non-long-terminal-repeat (non-LTR) retrotransposon-like element of E. histolytica, which contains a conceptual open reading frame for reverse transcriptase. Thus, ehapt2 has all of the properties of nonautonomous non-LTR retrotransposons. A comparison of various E. histolytica isolates suggested that transposition of ehapt2 takes place at a very low frequency, as the genomic localization of ehapt2 elements was found to be well conserved. A mobile element such as ehapt2 could be a suitable mechanism to explain the infrequent and late transition of E. histolytica from a harmless gut commensal to an invasive pathogen.

    LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons-0

    Copyright information: Taken from "LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons", http://www.biomedcentral.com/1471-2105/9/18. BMC Bioinformatics 2008;9:18. Published online 14 Jan 2008. PMCID: PMC2253517.
    ...ding site, PPT = polypurine tract, gag, pol, env = open reading frames for LTR retrotransposon genes

    Cloning and analysis of the primary structure of chitin synthases of Entamoeba histolytica

    Entamoeba histolytica, the causal agent of amebiasis, has two stages in its life cycle: trophozoite and cyst. The cysts (the infective form) have a wall whose main component is chitin, a polymer of β-(1→4)-linked N-acetyl-D-glucosamine, whose synthesis is catalyzed by chitin synthases (CHS). CHS have been described in fungi, insects, and nematodes, but not in protozoa such as Entamoeba. Two E. histolytica CHS genes were cloned and sequenced. Both EhCHS were found to contain transmembrane segments at their N- and C-terminal ends, and the greatest similarity is restricted to the "catalytic domain". The amino acid residues important for CHS activity are fully conserved in both EhCHS.

    LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons-1

    Copyright information: Taken from "LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons", http://www.biomedcentral.com/1471-2105/9/18. BMC Bioinformatics 2008;9:18. Published online 14 Jan 2008. PMCID: PMC2253517.
    ...n this flowchart