204 research outputs found

    A proposal for the reference-based annotation of de novo transposable element insertions

    Get PDF
    Understanding the causes and consequences of transposable element (TE) activity in the genomic era requires sophisticated bioinformatics approaches to accurately identify individual insertion sites. Next-generation sequencing technology now makes it possible to rapidly identify new TE insertions using resequencing data, opening up new possibilities to study the nature of TE-induced mutation and the target site preferences of different TE families. While the identification of new TE insertion sites is seemingly a simple task, the mechanisms of transposition present unique challenges for the annotation of de novo transposable element insertions mapped to a reference genome. Here I discuss these challenges and propose a framework for the annotation of de novo TE insertions that accommodates known mechanisms of TE insertion and established coordinate systems for genome annotation

    An age-of-allele test of neutrality for transposable element insertions

    Get PDF
    How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have used models of transposition-selection equilibrium that rely on the assumption of a constant rate of transposition. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable in natural populations. Here we propose a test of neutrality for TE insertions that does not rely on the assumption of a constant transposition rate. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have subsequently had their allele frequency estimated in a population sample. By conditioning on the age of an individual TE insertion (using information contained in the number of substitutions that have occurred within the TE sequence since insertion), we determine the probability distribution for the insertion allele frequency in a population sample under neutrality. Taking models of varying population size into account, we then evaluate predictions of our model against allele frequency data from 190 retrotransposon insertions sampled from North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies based on age alone. Controlling both for nonequilibrium dynamics of transposition and host demography, we provide evidence for negative selection acting against most TEs as well as for positive selection acting on a small subset of TEs. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs, gene duplications or other copy number variants.Comment: 40 pages, 6 figures, Supplemental Data available: [email protected]

    Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents

    Get PDF
    To contribute to our general understanding of the evolutionary forces that shape variation in genome sequences in nature, we have sequenced genomes from 50 isofemale lines and six pooled samples from populations of Drosophila melanogaster on three continents. Analysis of raw and reference-mapped reads indicates the quality of these genomic sequence data is very high. Comparison of the predicted and experimentally-determined Wolbachia infection status of these samples suggests that strain or sample swaps are unlikely to have occurred in the generation of these data. Genome sequences are freely available in the European Nucleotide Archive under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center

    LINNAEUS: A species name identification system for biomedical literature

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.</p> <p>Results</p> <p>In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers.</p> <p>Conclusions</p> <p>LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at <url>http://linnaeus.sourceforge.net/</url>.</p

    Functional Evolution of a cis-Regulatory Module

    Get PDF
    Lack of knowledge about how regulatory regions evolve in relation to their structure–function may limit the utility of comparative sequence analysis in deciphering cis-regulatory sequences. To address this we applied reverse genetics to carry out a functional genetic complementation analysis of a eukaryotic cis-regulatory module—the even-skipped stripe 2 enhancer—from four Drosophila species. The evolution of this enhancer is non-clock-like, with important functional differences between closely related species and functional convergence between distantly related species. Functional divergence is attributable to differences in activation levels rather than spatiotemporal control of gene expression. Our findings have implications for understanding enhancer structure–function, mechanisms of speciation and computational identification of regulatory modules

    Systems biology of energetic and atomic costs in the yeast transcriptome, proteome, and metabolome

    Get PDF
    Proteins vary in their cost to the cell and natural selection may favour the use of proteins that are cheaper to produce. We develop a novel approach to estimate the amino acid biosynthetic cost based on genome-scale metabolic models, and directly investigate the effects of biosynthetic cost on transcriptomic, proteomic and metabolomic data in _Saccharomyces cerevisiae_. We find that our systems approach to formulating biosynthetic cost produces a novel measure that explains similar levels of variation in gene expression compared with previously reported cost measures. Regardless of the measure used, the cost of amino acid synthesis is weakly associated with transcript and protein levels, independent of codon usage bias. In contrast, energetic costs explain a large proportion of variation in levels of free amino acids. In the economy of the yeast cell, there appears to be no single currency to compute the cost of amino acid synthesis, and thus a systems approach is necessary to uncover the full effects of amino acid biosynthetic cost in complex biological systems that vary with cellular and environmental conditions

    Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome.

    Get PDF
    BACKGROUND: The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genomes sequences. RESULTS: Using a high-resolution annotation of TEs in Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity. CONCLUSION: Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster.RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogastergenome

    Get PDF
    RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.Abstract Background Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Results Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (~80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and ~15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. Conclusion In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene structures. Our results highlight the need to establish the fixity of putative cases of TE domestication identified using genome sequences in order to demonstrate their functional importance, and reveal that the contribution of TE domestication to genome evolution may vary drastically among animal taxa.Published versio

    Principles of genome evolution in the Drosophila melanogaster species group.

    Get PDF
    That closely related species often differ by chromosomal inversions was discovered by Sturtevant and Plunkett in 1926. Our knowledge of how these inversions originate is still very limited, although a prevailing view is that they are facilitated by ectopic recombination events between inverted repetitive sequences. The availability of genome sequences of related species now allows us to study in detail the mechanisms that generate interspecific inversions. We have analyzed the breakpoint regions of the 29 inversions that differentiate the chromosomes of Drosophila melanogaster and two closely related species, D. simulans and D. yakuba, and reconstructed the molecular events that underlie their origin. Experimental and computational analysis revealed that the breakpoint regions of 59% of the inversions (17/29) are associated with inverted duplications of genes or other nonrepetitive sequences. In only two cases do we find evidence for inverted repetitive sequences in inversion breakpoints. We propose that the presence of inverted duplications associated with inversion breakpoint regions is the result of staggered breaks, either isochromatid or chromatid, and that this, rather than ectopic exchange between inverted repetitive sequences, is the prevalent mechanism for the generation of inversions in the melanogaster species group. Outgroup analysis also revealed evidence for widespread breakpoint recycling. Lastly, we have found that expression domains in D. melanogaster may be disrupted in D. yakuba, bringing into question their potential adaptive significance

    Dynamics of Wolbachia pipientis gene expression across the Drosophila melanogaster life cycle

    Get PDF
    Symbiotic interactions between microbes and their multicellular hosts have manifold impacts on molecular, cellular and organismal biology. To identify candidate bacterial genes involved in maintaining endosymbiotic associations with insect hosts, we analyzed genome-wide patterns of gene expression in the alpha-proteobacteria Wolbachia pipientis across the life cycle of Drosophila melanogaster using public data from the modENCODE project that was generated in a Wolbachia-infected version of the ISO1 reference strain. We find that the majority of Wolbachia genes are expressed at detectable levels in D. melanogaster across the entire life cycle, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or sex-specific expression differences when studied in the "holo-organism" context. Wolbachia genes that are differentially expressed during development are typically up-regulated after D. melanogaster embryogenesis, and include many bacterial membrane, secretion system and ankyrin-repeat containing proteins. Sex-biased genes are often organised as small operons of uncharacterised genes and are mainly up-regulated in adult males D. melanogaster in an age-dependent manner suggesting a potential role in cytoplasmic incompatibility. Our results indicate that large changes in Wolbachia gene expression across the Drosophila life-cycle are relatively rare when assayed across all host tissues, but that candidate genes to understand host-microbe interaction in facultative endosymbionts can be successfully identified using holo-organism expression profiling. Our work also shows that mining public gene expression data in D. melanogaster provides a rich set of resources to probe the functional basis of the Wolbachia-Drosophila symbiosis and annotate the transcriptional outputs of the Wolbachia genome.Comment: 58 pages, 6 figures, 6 supplemental figures, 4 supplemental files (available at https://github.com/bergmanlab/wolbachia/tree/master/gutzwiller_et_al/arxiv
    corecore