
    Catastrophic chromosomal restructuring during genome elimination in plants.

    Genome instability is associated with mitotic errors and cancer. This phenomenon can lead to deleterious rearrangements, but also genetic novelty, and many questions regarding its genesis, fate and evolutionary role remain unanswered. Here, we describe extreme chromosomal restructuring during genome elimination, a process resulting from hybridization of Arabidopsis plants expressing different centromere histones H3. Shattered chromosomes are formed from the genome of the haploid inducer, consistent with genomic catastrophes affecting a single, laggard chromosome compartmentalized within a micronucleus. Analysis of breakpoint junctions implicates breaks followed by repair through non-homologous end joining (NHEJ) or stalled fork repair. Furthermore, mutation of required NHEJ factor DNA Ligase 4 results in enhanced haploid recovery. Lastly, heritability and stability of a rearranged chromosome suggest a potential for enduring genomic novelty. These findings provide a tractable, natural system towards investigating the causes and mechanisms of complex genomic rearrangements similar to those associated with several human disorders.

    Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution

    Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. The assumption that the most abundant tandem repeat is the centromere DNA was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and in length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond ~50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution, including the appearance of higher order repeat structures in which several polymorphic monomers make up a larger repeating unit. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animals and plants. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.

    Longer First Introns Are a General Property of Eukaryotic Gene Structure

    While many properties of eukaryotic gene structure are well characterized, differences in the form and function of introns that occur at different positions within a transcript are less well understood. In particular, the dynamics of intron length variation with respect to intron position has received relatively little attention. This study analyzes all available data on intron lengths in GenBank and finds a significant trend of increased length in first introns throughout a wide range of species. This trend was found to be even stronger when using high-confidence gene annotation data for three model organisms (Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster), which show that the first intron in the 5′ UTR is, on average, significantly longer than all downstream introns within a gene. A partial explanation for increased first intron length in A. thaliana is suggested by the increased frequency of certain motifs that are present in first introns. The phenomenon of longer first introns can potentially be used to improve gene prediction software and also to detect errors in existing gene annotations.

    Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

    Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

    First introns are the longest introns in most species.

    Results shown for all species in GenBank release 164 which have at least 500 CDSs that specify multiple introns. Z-tests were used to determine significance and color denotes level of significance (see legend, N.S. = not significant).

    Incorrect C. elegans gene annotation determined by inspection of intron lengths.

    This gene prediction contained an incorrect in-frame intron sequence in the first exon. Transcript evidence, homology evidence from C. briggsae, and an alternative gene prediction (Twinscan) suggested that the first intron is an annotation error. Image taken from Genome Browser display of WormBase release WS180 (http://ws180.wormbase.org).

    Intron size variation for selected species with different numbers of introns.

    Intron lengths are shown for species with CDSs that contain 4, 6, 7 or 9 introns (in D. melanogaster, A. thaliana, C. elegans, and H. sapiens, respectively). Bars on graph show standard error of the mean. Numbers of CDSs used for each species are shown.
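The per-position summary described in this caption (mean intron length with standard error of the mean) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `mean_and_sem_by_position`, the input layout (one list of intron lengths per CDS, ordered 5′ to 3′), and the toy numbers are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def mean_and_sem_by_position(cds_intron_lengths):
    """Return {position: (mean, sem)} for intron lengths grouped by position.

    cds_intron_lengths is a hypothetical input format: one list per CDS,
    holding that CDS's intron lengths in 5' to 3' order.
    """
    by_position = defaultdict(list)
    for lengths in cds_intron_lengths:
        for position, length in enumerate(lengths, start=1):
            by_position[position].append(length)

    stats = {}
    for position, lengths in sorted(by_position.items()):
        n = len(lengths)
        mean = sum(lengths) / n
        if n > 1:
            # Sample variance (n - 1 denominator), then SEM = sqrt(var / n).
            variance = sum((x - mean) ** 2 for x in lengths) / (n - 1)
            sem = math.sqrt(variance / n)
        else:
            sem = float("nan")  # SEM is undefined for a single observation
        stats[position] = (mean, sem)
    return stats


# Toy values for illustration only (not real intron lengths).
example = [[100, 50], [200, 70], [300, 60]]
stats = mean_and_sem_by_position(example)
```

Grouping by position before averaging mirrors the caption's restriction to CDSs with a fixed number of introns, so every position is summarized over the same set of genes.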

    Intron length variation in three model organisms.

    Mean intron length is calculated for the first intron in the 5′ UTR (position −1, in blue) and for the first eight introns of the coding sequence (in red) for three named species. Error bars indicate standard error of the mean. Bottom right panel shows the occurrence of a potential IME motif (pictured) in A. thaliana introns. Percent motif density is calculated by concatenating all introns in each category and then calculating what fraction of the total sequence is occupied by the motif.
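The motif density calculation described in this caption can be sketched as below. This is a minimal illustration, not the authors' code: the function name `motif_density` is hypothetical, and the motif `TCG` is a placeholder because the actual IME motif appears only in the figure. The sketch follows the caption literally by concatenating introns directly, which means a match could in principle span the junction between two introns.

```python
import re


def motif_density(introns, motif_pattern):
    """Fraction of the concatenated intron sequence covered by motif matches.

    introns: iterable of intron sequences (strings) in one category.
    motif_pattern: regular expression for the motif (placeholder here).
    """
    sequence = "".join(seq.upper() for seq in introns)
    if not sequence:
        return 0.0

    covered = [False] * len(sequence)
    # A lookahead with a capture group finds overlapping matches too,
    # so each base is counted once even if several matches cover it.
    for match in re.finditer(f"(?=({motif_pattern}))", sequence):
        for i in range(match.start(1), match.end(1)):
            covered[i] = True

    return sum(covered) / len(sequence)


# Toy sequences and a placeholder motif, for illustration only.
density = motif_density(["ATCGATCG", "GGGG"], "TCG")
```

Multiplying the returned fraction by 100 gives a percentage comparable to the figure's axis.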