24 research outputs found

    Nature of the annotations from reiteration.

    <p>Dinucleotide frequencies in the sets of CDS and TEdenovo annotations, as well as in the annotations detected specifically by each round of the reiterative approach relative to the previous round (<i>e.g.</i>, delta_2vs1 comprises the difference between TEdenovo_2 and TEdenovo_1).</p>
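As an illustration of the quantity named in this caption, the following is a minimal sketch of how overlapping dinucleotide frequencies could be computed for one sequence. The function name `dinucleotide_frequencies` and the choice to skip pairs containing non-ACGT characters are assumptions made for this example; the listing does not say how the authors computed their values.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides in seq.

    Assumption for this sketch: pairs containing anything other than
    A, C, G or T (for example N) are ignored.
    """
    seq = seq.upper()
    # Slide a window of width 2 along the sequence, one base at a time.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}
```

Applied to the concatenated sequences of an annotation set (CDS, TEdenovo, or one of the delta sets), this would give one frequency profile per set that can then be compared across sets.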

    Relaxed parameters, ample effects.

    <p>(A) Distribution in 500 bp bins of the size of the consensus sequences obtained using different parameters with TEdenovo. (B) Coverage of different indicators of sensitivity by annotations obtained using the relaxed approaches. (C) Distribution in 500 bp bins of the size of the annotations obtained using consensus sequences from the relaxed approaches. (D) Distribution in 1% bins of the identity values between genomic copies and consensus sequences from the relaxed approach.</p>

    Reiterations beat dark matter back.

    <p>Coverage of genome and different indicators of sensitivity by annotations from the reiterative approach using the reference sequences (A) and the consensus sequences from TEdenovo (B) as initial library.</p>

    A pipeline for the detection of piRNA clusters, using S-MART.


    Overall statistics of match sizes using consensus sequences from different programs for genome annotation.


    Landscape of identity values.

    <p>Plot and smoothed curve (100 neighbors) of the identities between genomic copies and consensus sequences from TEdenovo (A), TEdenovo_cool (B), and TEdenovo_soft (C) along <i>A. thaliana</i> chromosome 1. Because identity values are not equally spaced, the smoothing is approximate and is not strictly “Savitzky-Golay” smoothing.</p>
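To make the idea of smoothing over a fixed number of neighbors concrete, here is a minimal sketch that averages each identity value with its neighbors in rank order along the chromosome. This is a plain moving average swapped in for illustration, not the authors' smoothing procedure; the function name `smooth_by_neighbors`, the centered window, and the truncation at the ends are all assumptions of this example.

```python
def smooth_by_neighbors(points, k=100):
    """Smooth (position, identity) pairs with a rank-based moving average.

    Each value is averaged with about k neighbours taken by rank along
    the chromosome, not by physical distance, which is why unevenly
    spaced positions make the result only approximate.
    """
    ordered = sorted(points)  # sort by genomic position
    values = [value for _, value in ordered]
    half = k // 2
    smoothed = []
    for i, (position, _) in enumerate(ordered):
        lo = max(0, i - half)                # window is truncated at the ends
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        smoothed.append((position, sum(window) / len(window)))
    return smoothed
```

Because the window is defined by rank, two copies that are far apart on the chromosome can still be averaged together in copy-poor regions, which matches the caveat in the caption about unequal spacing.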

    Deep Investigation of <i>Arabidopsis thaliana</i> Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter

    <div><p>Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences, also known as genomic dark matter. Because of their high duplication rates compared to other genomic features, transposable elements are predominant contributors to the repeatome, and the products of their decay are thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important for understanding genome evolution as well as host biology. In this study, we used a combination of tools to show that the repeatome of the small and reducing <i>A. thaliana</i> genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the <i>A. thaliana</i> dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the <i>A. thaliana</i> repeatome that may help further address genome evolution as well as transcriptional and epigenetic regulation in this model plant.</p></div>

    Benefits of the combined approach.

    <p>The coverage of the genome (A), reference set (B), Tallymer set (C), and 24-nt sRNA map (D) by annotation sets from different programs, the non-redundant combination of annotations from all tools (All), and the non-redundant combination of annotations from TEdenovo, RepeatScout, and RepeatModeler (TEdenovo + RS + RM).</p>
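The notion of a non-redundant combination of annotations can be sketched as an interval union: overlapping annotations from several tools are merged so that each base is counted once. The example below is a hedged illustration for a single sequence, closest to panel (A); the function names, the half-open `(start, end)` coordinates, and the single-chromosome simplification are assumptions, and coverage of a reference set would additionally require intersecting with that set.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def covered_fraction(annotation_sets, target_length):
    """Fraction of a sequence covered by the union of several annotation sets."""
    pooled = [iv for annotations in annotation_sets for iv in annotations]
    covered = sum(end - start for start, end in merge_intervals(pooled))
    return covered / target_length
```

For instance, `covered_fraction([tool_a, tool_b, tool_c], chromosome_length)` with three hypothetical per-tool interval lists would count a region annotated by all three tools only once.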

    Deep and conservative annotation of the <i>A. thaliana</i> repeatome.

    <p>(A) Distribution in 500 bp bins of the size of the non-redundant sequences from the “Bundle” library. (B) Distribution in 1% bins of the identity values between genomic copies and consensus sequences from the “Bundle” library. The dashed line indicates the 80% identity threshold applied to select the copies that were used to run the “Bundle_2” annotation. (C–D) Distribution in 50 bp bins for short segments (C) and in 500 bp bins for long segments (D), of the size of the repeat annotations from TAIR10 <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0094101#pone.0094101-Buisine1" target="_blank">[37]</a> and from the Bundle_complete annotation.</p>

    Coverage of the <i>A. thaliana</i> genome by the different annotations presented in this work discriminating CDS versus non-CDS contributions.

    <p>“TEdenovo + RS + RM” refers to the non-redundant combination of annotations from TEdenovo, RepeatScout, and RepeatModeler.</p>