24 research outputs found

    Whole Genome Resequencing Reveals Natural Target Site Preferences of Transposable Elements in Drosophila melanogaster

    Transposable elements are mobile DNA sequences that integrate into host genomes using diverse mechanisms with varying degrees of target site specificity. While the target site preferences of some engineered transposable elements are well studied, the natural target preferences of most transposable elements are poorly characterized. Using population genomic resequencing data from 166 strains of Drosophila melanogaster, we identified over 8,000 new insertion sites not present in the reference genome sequence that we used to decode the natural target preferences of 22 families of transposable element in this species. We found that terminal inverted repeat transposon and long terminal repeat retrotransposon families present clade-specific target site duplications and target site sequence motifs. Additionally, we found that the sequence motifs at transposable element target sites are always palindromes that extend beyond the target site duplication. Our results demonstrate the utility of population genomics data for high-throughput inference of transposable element targeting preferences in the wild and establish general rules for terminal inverted repeat transposon and long terminal repeat retrotransposon target site selection in eukaryotic genomes.

    Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster

    Wolbachia are maternally-inherited symbiotic bacteria commonly found in arthropods, which are able to manipulate the reproduction of their host in order to maximise their transmission. Here we use whole genome resequencing data from 290 lines of Drosophila melanogaster from North America, Europe and Africa to predict Wolbachia infection status, estimate cytoplasmic genome copy number, and reconstruct Wolbachia and mtDNA genome sequences. Complete Wolbachia and mitochondrial genomes show congruent phylogenies, consistent with strict vertical transmission through the maternal cytoplasm and imperfect transmission of Wolbachia. Bayesian phylogenetic analysis reveals that the most recent common ancestor of all Wolbachia and mitochondrial genomes in D. melanogaster dates to around 8,000 years ago. We find evidence for a recent incomplete global replacement of ancestral Wolbachia and mtDNA lineages, which is likely to be one of several similar incomplete replacement events that have occurred since the out-of-Africa migration that allowed D. melanogaster to colonize worldwide habitats. Comment: 41 pages, 5 figures.

    Testing the palindromic target site model for DNA transposon insertion using the Drosophila melanogaster P-element

    Understanding the molecular mechanisms that influence transposable element target site preferences is a fundamental challenge in functional and evolutionary genomics. Large-scale transposon insertion projects provide excellent material to study target site preferences in the absence of confounding effects of post-insertion evolutionary change. Growing evidence from a wide variety of prokaryotes and eukaryotes indicates that DNA transposons recognize staggered-cut palindromic target site motifs (TSMs). Here, we use over 10 000 accurately mapped P-element insertions in the Drosophila melanogaster genome to test predictions of the staggered-cut palindromic target site model for DNA transposon insertion. We provide evidence that the P-element targets a 14-bp palindromic motif that can be identified at the primary sequence level, which predicts the local spacing, hotspots and strand orientation of P-element insertions. Intriguingly, we find that although the P-element destroys the complete 14-bp target site upon insertion, the terminal three nucleotides of the P-element inverted repeats complement and restore the original TSM, suggesting a mechanistic link between transposon target sites and their terminal inverted repeats. Finally, we discuss how the staggered-cut palindromic target site model can be used to assess the accuracy of genome mappings for annotated P-element insertions.
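    To make the term "palindromic" in the abstract above concrete: a DNA motif is palindromic when it equals its own reverse complement. The following is a minimal illustrative sketch, not the authors' method; the function names are hypothetical, and the example sequences (the EcoRI recognition site GAATTC and an arbitrary non-palindrome) are not taken from the paper.

```python
# Illustrative only: checks whether a DNA string equals its reverse complement.
# Function names are hypothetical and not part of any published pipeline.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("GAATTC", "GATTACA"):
        print(site, reverse_complement(site), is_palindromic(site))
```

    Real target site motifs are usually imperfect palindromes inferred from many aligned insertion sites, so an exact string test like this is only a starting point.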

    McClintock: An integrated pipeline for detecting transposable element insertions in whole-genome shotgun sequencing data

    Transposable element (TE) insertions are among the most challenging types of variants to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing data sets has spurred the development of many new methods to detect TE insertions in whole-genome shotgun sequences. Here we report an integrated bioinformatics pipeline for the detection of TE insertions in whole-genome shotgun data, called McClintock (https://github.com/bergmanlab/mcclintock), which automatically runs and standardizes output for multiple TE detection methods. We demonstrate the utility of McClintock by evaluating six TE detection methods using simulated and real genome data from the model microbial eukaryote, Saccharomyces cerevisiae. We find substantial variation among McClintock component methods in their ability to detect nonreference TEs in the yeast genome, but show that nonreference TEs at nearly all biologically realistic locations can be detected in simulated data by combining multiple methods that use split-read and read-pair evidence. In general, our results reveal that split-read methods detect fewer nonreference TE insertions than read-pair methods, but generally have much higher positional accuracy. Analysis of a large sample of real yeast genomes reveals that most McClintock component methods can recover known aspects of TE biology in yeast such as the transpositional activity status of families, target preferences, and target site duplication structure, albeit with varying levels of accuracy. Our work provides a general framework for integrating and analyzing results from multiple TE detection methods, as well as useful guidance for researchers studying TEs in yeast resequencing data.

    Sequence logos for target site motifs of 22 D. melanogaster TIR and LTR families.

    Predicted TSMs plotted as sequence logos for sequences ±3 bp around the TSD for TE families with eight or more insertion sites. Plots are organized by order (TIR then LTR) and superfamily, and are labeled with order/superfamily, family name, predicted TSD length, and total number of insertion sites (in parentheses) in the top right corner. The y-axis is the same for all the logos and ranges from a bit score of zero to two. The line below the logo represents the TSD.
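    The zero-to-two bit range mentioned in the caption above follows from the standard sequence-logo information content for a four-letter alphabet: each position scores 2 minus its Shannon entropy in bits. The sketch below is an illustration under that assumption only; it omits the small-sample correction that logo tools often apply, the function name is hypothetical, and the aligned sites are made-up toy data, not insertion sites from the study.

```python
# Illustrative only: per-position information content (bits) for aligned DNA sites.
# Assumes the uncorrected formula IC = 2 - H; toy data, hypothetical function name.
import math
from collections import Counter


def information_content(aligned_sites):
    """Return a list of bit scores (0 to 2), one per alignment column."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for column in zip(*aligned_sites):
        total = len(column)
        counts = Counter(column)
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGA", "ACCC", "AGTG"]
    print([round(b, 3) for b in information_content(toy_sites)])
```

    A fully conserved column scores two bits and a column with all four bases at equal frequency scores zero, which is why every logo in the figure can share the same y-axis.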

    Optimal TSD length and number of de novo insertion sites based on Illumina data.

    Families with fewer than eight insertion sites were excluded from further analyses of TSD and TSM structure, but often show similar modal TSD length to related TE families.

    Read length and number of insertions per strain for DGRP resequencing datasets.

    Summary of data from the 454 platform (A) and the Illumina platform (B). Points represent the maximum, minimum and mean read length for each strain (scale bar on left). Bars represent the total number of elements identified per strain (scale bar on right). Gray bars represent the number of insertions for strains sequenced by both 454 and Illumina, and black bars represent the number of insertions from strains with platform-specific sequence data. Strain identifiers are labeled alternately on the top and bottom of the graph.