351 research outputs found

    A proposal for the reference-based annotation of de novo transposable element insertions

    Understanding the causes and consequences of transposable element (TE) activity in the genomic era requires sophisticated bioinformatics approaches to accurately identify individual insertion sites. Next-generation sequencing technology now makes it possible to rapidly identify new TE insertions using resequencing data, opening up new possibilities to study the nature of TE-induced mutation and the target site preferences of different TE families. While the identification of new TE insertion sites is seemingly a simple task, the mechanisms of transposition present unique challenges for the annotation of de novo transposable element insertions mapped to a reference genome. Here I discuss these challenges and propose a framework for the annotation of de novo TE insertions that accommodates known mechanisms of TE insertion and established coordinate systems for genome annotation.

    An age-of-allele test of neutrality for transposable element insertions

    How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have used models of transposition-selection equilibrium that rely on the assumption of a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable in natural populations. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have subsequently had their allele frequency estimated in a population sample. By conditioning on the age of an individual TE insertion (using information contained in the number of substitutions that have occurred within the TE sequence since insertion), we determine the probability distribution for the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling both for non-equilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants. Comment: 40 pages, 6 figures, Supplemental Data available: [email protected]

    Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents

    To contribute to our general understanding of the evolutionary forces that shape variation in genome sequences in nature, we have sequenced genomes from 50 isofemale lines and six pooled samples from populations of Drosophila melanogaster on three continents. Analysis of raw and reference-mapped reads indicates the quality of these genomic sequence data is very high. Comparison of the predicted and experimentally determined Wolbachia infection status of these samples suggests that strain or sample swaps are unlikely to have occurred in the generation of these data. Genome sequences are freely available in the European Nucleotide Archive under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center.

    Functional Evolution of a cis-Regulatory Module

    Lack of knowledge about how regulatory regions evolve in relation to their structure–function may limit the utility of comparative sequence analysis in deciphering cis-regulatory sequences. To address this we applied reverse genetics to carry out a functional genetic complementation analysis of a eukaryotic cis-regulatory module, the even-skipped stripe 2 enhancer, from four Drosophila species. The evolution of this enhancer is non-clock-like, with important functional differences between closely related species and functional convergence between distantly related species. Functional divergence is attributable to differences in activation levels rather than spatiotemporal control of gene expression. Our findings have implications for understanding enhancer structure–function, mechanisms of speciation and computational identification of regulatory modules.

    Systems biology of energetic and atomic costs in the yeast transcriptome, proteome, and metabolome

    Proteins vary in their cost to the cell and natural selection may favour the use of proteins that are cheaper to produce. We develop a novel approach to estimate the amino acid biosynthetic cost based on genome-scale metabolic models, and directly investigate the effects of biosynthetic cost on transcriptomic, proteomic and metabolomic data in Saccharomyces cerevisiae. We find that our systems approach to formulating biosynthetic cost produces a novel measure that explains similar levels of variation in gene expression compared with previously reported cost measures. Regardless of the measure used, the cost of amino acid synthesis is weakly associated with transcript and protein levels, independent of codon usage bias. In contrast, energetic costs explain a large proportion of variation in levels of free amino acids. In the economy of the yeast cell, there appears to be no single currency to compute the cost of amino acid synthesis, and thus a systems approach is necessary to uncover the full effects of amino acid biosynthetic cost in complex biological systems that vary with cellular and environmental conditions.

    LINNAEUS: A species name identification system for biomedical literature

    Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles. Results: In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers. Conclusions: LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/.
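As a rough illustration of the dictionary-based idea and of mention-level precision and recall described in this abstract, a minimal Python sketch follows. It is not the LINNAEUS implementation: it swaps the finite-state automaton for a plain regular-expression alternation, performs no disambiguation, and the tiny dictionary, the function names `find_mentions` and `precision_recall`, and the span-based scoring are all assumptions made for the example.

```python
import re

# Hypothetical toy dictionary (an assumption for this sketch):
# lower-cased species name or abbreviation -> NCBI Taxonomy identifier.
SPECIES = {
    "drosophila melanogaster": 7227,
    "d. melanogaster": 7227,
    "homo sapiens": 9606,
    "saccharomyces cerevisiae": 4932,
}

# Put longer names first so the alternation prefers the longest entry.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(SPECIES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_mentions(text):
    """Return a list of (start, end, taxid) for every dictionary hit in text."""
    return [
        (m.start(), m.end(), SPECIES[m.group(1).lower()])
        for m in _PATTERN.finditer(text)
    ]


def precision_recall(predicted, gold):
    """Mention-level precision and recall over exact (start, end, taxid) matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Scoring on exact spans is only one possible convention; an evaluation could equally count overlapping spans or score at the document level, as the abstract reports separately.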

    Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome.

    BACKGROUND: The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences. RESULTS: Using a high-resolution annotation of TEs in Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity. CONCLUSION: Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

    Computational analysis of transposable element target site preferences in Drosophila melanogaster

    Transposable elements (TEs) are mobile DNA sequences that are a source of mutations and can target specific sites in the host genome. Understanding the molecular mechanisms of TE target site preferences is a fundamental challenge in functional and evolutionary genomics. Here we used accurately mapped TE insertions in the Drosophila melanogaster genome, from large-scale gene disruption and resequencing projects, to better understand TE insertion site mechanisms. First we test predictions of the palindromic target site model for DNA transposon insertion using artificially generated P-element insertions. We provide evidence that the P-element targets a 14 bp palindromic motif that can be identified at the primary sequence level and that differs significantly from random base composition in the D. melanogaster genome. This sequence also predicts local spacing, hotspots and strand orientation of P-element insertions. Next, we combine artificial P-element insertions with data from genome-wide studies on sequence properties of promoter regions, in an attempt to decode the genomic factors associated with P-element promoter targeting. Our results indicate that the P-element insertions are affected by nucleosome positioning and the presence of chromatin marks made by the Polycomb and trithorax protein groups. We provide the first genome-wide study which shows that core promoter architecture and chromatin structure impact P-element target preferences, shedding light on the nuclear processes that influence its pattern of TE insertions across the D. melanogaster genome. In an effort to understand the natural insertion preferences of a wide range of TEs, we then used genome resequencing data to identify insertion sites not present in the reference strain. We found that both Illumina and 454 sequencing platforms showed consistent results in terms of target site duplication (TSD) and target site motif (TSM) discovery. We found that TSMs typically extend the TSD and are palindromic for both DNA and LTR elements with a variable center that depends on the length of the TSD. Additionally, we found that TEs from the same subclass present similar TSDs and TSMs. Finally, by correlating results on P-element insertion sites from natural strains with gene disruption experiments, we show that there is an overlap in target site preferences between artificial and natural insertion events and that P-element targeting of promoter regions of genes is a natural characteristic of this element that is influenced by the same features as the artificially generated insertions. Together, the results presented in this thesis provide important new findings about the target preferences of TEs in one of the best-studied and most important model organisms, and provide a platform for understanding target site preferences of TEs in other species using genomic data. EThOS - Electronic Theses Online Service. Fundação para a Ciência e Tecnologia, Portugal (Foundation for Science and Technology, Portugal). GB. United Kingdom.
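For readers unfamiliar with what "palindromic" means for a DNA motif, the following Python sketch may help: a sequence is palindromic when it equals its own reverse complement, and a variable centre can be modelled by ignoring a chosen number of middle bases. This is only an illustration under those assumptions, not the analysis used in the thesis; the function names and the `center` parameter are hypothetical.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq, center=0):
    """True if the flanks of seq are reverse complements of each other.

    `center` (hypothetical parameter) is the number of middle bases to
    ignore, modelling a variable centre inside the motif.
    """
    if center < 0 or center > len(seq) or (len(seq) - center) % 2:
        raise ValueError("flanks must have equal, non-negative length")
    seq = seq.upper()
    flank = (len(seq) - center) // 2
    left = seq[:flank]
    right = seq[len(seq) - flank:]
    return left == reverse_complement(right)
```

For example, `is_palindromic("GAATTC")` holds because GAATTC reads the same on both strands, whereas a motif with an unconstrained centre would be tested with a non-zero `center`.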

    Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome

    Background: Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Results: Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (~80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and ~15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. Conclusion: In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa. Published version.

    Principles of genome evolution in the Drosophila melanogaster species group.

    That closely related species often differ by chromosomal inversions was discovered by Sturtevant and Plunkett in 1926. Our knowledge of how these inversions originate is still very limited, although a prevailing view is that they are facilitated by ectopic recombination events between inverted repetitive sequences. The availability of genome sequences of related species now allows us to study in detail the mechanisms that generate interspecific inversions. We have analyzed the breakpoint regions of the 29 inversions that differentiate the chromosomes of Drosophila melanogaster and two closely related species, D. simulans and D. yakuba, and reconstructed the molecular events that underlie their origin. Experimental and computational analysis revealed that the breakpoint regions of 59% of the inversions (17/29) are associated with inverted duplications of genes or other nonrepetitive sequences. In only two cases do we find evidence for inverted repetitive sequences in inversion breakpoints. We propose that the presence of inverted duplications associated with inversion breakpoint regions is the result of staggered breaks, either isochromatid or chromatid, and that this, rather than ectopic exchange between inverted repetitive sequences, is the prevalent mechanism for the generation of inversions in the melanogaster species group. Outgroup analysis also revealed evidence for widespread breakpoint recycling. Lastly, we have found that expression domains in D. melanogaster may be disrupted in D. yakuba, bringing into question their potential adaptive significance.