
    T-lex3 : An accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data

    Motivation: Transposable elements (TEs) constitute a significant proportion of the majority of genomes sequenced to date. TEs are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs in genomes is therefore crucial for a complete identification of the genetic differences among individuals, populations and species. Results: In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have re-designed the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be launched on longer short-reads (e.g. reads >150 bp). We have added new filtering steps to increase the accuracy of the genotyping, and new parameters that allow the user to control both the minimum and maximum number of reads, and the minimum number of strains to genotype a TE insertion. We also showed for the first time that T-lex3 provides accurate TE calls in a plant genome. Availability and implementation: To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 individual TE insertions in humans, and 3067 individual TE insertions in the rice genome. We showed that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. T-lex3 is available at Github: https://github.com/GonzalezLab/T-lex3

    A Strong Deletion Bias in Nonallelic Gene Conversion

    Gene conversion is the unidirectional transfer of genetic information between orthologous (allelic) or paralogous (nonallelic) genomic segments. Though a number of studies have examined nucleotide replacements, little is known about length difference mutations produced by gene conversion. Here, we investigate insertions and deletions produced by nonallelic gene conversion in 338 Drosophila and 10,149 primate paralogs. Using a direct phylogenetic approach, we identify 179 insertions and 614 deletions in Drosophila paralogs, and 132 insertions and 455 deletions in primate paralogs. Thus, nonallelic gene conversion is strongly deletion-biased in both lineages, with almost 3.5 times as many conversion-induced deletions as insertions. In primates, the deletion bias is considerably stronger for long indels and, in both lineages, the per-site rate of gene conversion is orders of magnitude higher than that of ordinary mutation. Due to this high rate, deletion-biased nonallelic gene conversion plays a key role in genome size evolution, leading to the cooperative shrinkage and eventual disappearance of selectively neutral paralogs.
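
    The "almost 3.5 times" figure can be checked directly from the indel counts quoted in the abstract. The following is a minimal sketch, not the authors' analysis: the function name `deletion_bias` and the pooling of the two lineages into a single ratio are assumptions made here for illustration.

```python
# Indel counts as quoted in the abstract above.
counts = {
    "Drosophila": {"insertions": 179, "deletions": 614},
    "primates": {"insertions": 132, "deletions": 455},
}


def deletion_bias(insertions, deletions):
    """Ratio of deletions to insertions (hypothetical helper, not from the paper)."""
    return deletions / insertions


# Per-lineage ratios.
per_lineage = {name: deletion_bias(**c) for name, c in counts.items()}

# Pooled ratio across both lineages (pooling is an assumption of this sketch).
total_ins = sum(c["insertions"] for c in counts.values())
total_del = sum(c["deletions"] for c in counts.values())
pooled = deletion_bias(total_ins, total_del)

for name, ratio in per_lineage.items():
    print(f"{name}: {ratio:.2f} deletions per insertion")
print(f"pooled: {pooled:.2f} deletions per insertion")
```

    Both lineages give a ratio a little under 3.5, in line with the abstract's summary.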

    Genome-wide fine-scale recombination rate variation in Drosophila melanogaster

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.

    Duplication and Gene Conversion in the Drosophila melanogaster Genome

    Using the genomic sequences of the Drosophila melanogaster subgroup, the pattern of gene duplications was investigated with special attention to interlocus gene conversion. Our fine-scale analysis with careful visual inspections enabled accurate identification of a number of duplicated blocks (genomic regions). The orthologous parts of those duplicated blocks were also identified in the D. simulans and D. sechellia genomes, by which we were able to clearly classify the duplicated blocks into post- and pre-speciation blocks. We found 31 post-speciation duplicated genes, from which the rate of gene duplication (from one copy to two copies) is estimated to be 1.0×10⁻⁹ per single-copy gene per year. The role of interlocus gene conversion was observed in several respects in our data: (1) synonymous divergence between a duplicated pair is overall very low. Consequently, the gene duplication rate would be seriously overestimated by counting duplicated genes with low divergence; (2) the sizes of young duplicated blocks are generally large. We postulate that the degeneration of gene conversion around the edges could explain the shrinkage of “identifiable” duplicated regions; and (3) elevated paralogous divergence is observed around the edges in many duplicated blocks, supporting our gene conversion–degeneration model. Our analysis demonstrated that gene conversion between duplicated regions is a common and genome-wide phenomenon in the Drosophila genomes, and that its role should be especially significant in the early stages of duplicated genes. Based on a population genetic prediction, we applied a new genome-scan method to test for signatures of selection for neofunctionalization and found a strong signature in a pair of transporter genes.

    Intron retention in the Drosophila melanogaster Rieske iron sulphur protein gene generated a new protein

    Genomes can encode a variety of proteins with unrelated architectures and activities. It is known that protein-coding genes of de novo origin have significantly contributed to this diversity. However, the molecular mechanisms and evolutionary processes behind these originations are still poorly understood. Here we show that the last 102 codons of a novel gene, Noble, assembled directly from non-coding DNA following an intronic deletion that induced alternative intron retention at the Drosophila melanogaster Rieske Iron Sulphur Protein (RFeSP) locus. A systematic analysis of the evolutionary processes behind the origin of Noble showed that its emergence was strongly biased by natural selection on and around the RFeSP locus. Noble mRNA is shown to encode a bona fide protein that lacks an iron sulphur domain and localizes to mitochondria. Together, these results demonstrate the generation of a novel protein at a naturally selected site

    Evidence that Adaptation in Drosophila Is Not Limited by Mutation at Single Sites

    Adaptation in eukaryotes is generally assumed to be mutation-limited because of small effective population sizes. This view is difficult to reconcile, however, with the observation that adaptation to anthropogenic changes, such as the introduction of pesticides, can occur very rapidly. Here we investigate adaptation at a key insecticide resistance locus (Ace) in Drosophila melanogaster and show that multiple simple and complex resistance alleles evolved quickly and repeatedly within individual populations. Our results imply that the current effective population size of modern D. melanogaster populations is likely to be substantially larger (≥100-fold) than commonly believed. This discrepancy arises because estimates of the effective population size are generally derived from levels of standing variation and thus reveal long-term population dynamics dominated by sharp—even if infrequent—bottlenecks. The short-term effective population sizes relevant for strong adaptation, on the other hand, might be much closer to census population sizes. Adaptation in Drosophila may therefore not be limited by waiting for mutations at single sites, and complex adaptive alleles can be generated quickly without fixation of intermediate states. Adaptive events should also commonly involve the simultaneous rise in frequency of independently generated adaptive mutations. These so-called soft sweeps have very distinct effects on the linked neutral polymorphisms compared to the standard hard sweeps in mutation-limited scenarios. Methods for the mapping of adaptive mutations or association mapping of evolutionarily relevant mutations may thus need to be reconsidered

    Time and Origin of Cichlid Colonization of the Lower Congo Rapids

    Most freshwater diversity is arguably located in networks of rivers and streams, but, in contrast to lacustrine systems, riverine radiations are largely understudied. The extensive rapids of the lower Congo River are one of the few river stretches inhabited by a locally endemic cichlid species flock as well as several species pairs, for which we provide evidence that they have radiated in situ. We use more than 2,000 AFLP markers as well as multilocus sequence datasets to reconstruct the origin, phylogenetic history, and timing of colonization and speciation of two Lower Congo cichlid genera, Steatocranus and Nanochromis. Based on a representative taxon sampling and well-resolved phylogenetic hypotheses, we demonstrate that a high level of riverine diversity originated in the lower Congo within about the last 5 million years, which is concordant with age estimates for the hydrological origin of the modern lower Congo River. A spatial genetic structure is present in all widely distributed lineages, corresponding to a trisection of the lower Congo River into major biogeographic areas, each with locally endemic species assemblages. With the present study, we provide a phylogenetic framework for a complex system that may serve as a link between African riverine cichlid diversity and the megadiverse cichlid radiations of the East African lakes. Beyond this, we give for the first time a biologically estimated age for the origin of the lower Congo River rapids, one of the most extreme freshwater habitats on earth.

    Repair-Mediated Duplication by Capture of Proximal Chromosomal DNA Has Shaped Vertebrate Genome Evolution

    DNA double-strand breaks (DSBs) are a common form of cellular damage that can lead to cell death if not repaired promptly. Experimental systems have shown that DSB repair in eukaryotic cells is often imperfect and may result in the insertion of extra chromosomal DNA or the duplication of existing DNA at the breakpoint. These events are thought to be a source of genomic instability and human diseases, but it is unclear whether they have contributed significantly to genome evolution. Here we developed an innovative computational pipeline that takes advantage of the repetitive structure of genomes to detect repair-mediated duplication events (RDs) that occurred in the germline and created insertions of at least 50 bp of genomic DNA. Using this pipeline we identified over 1,000 probable RDs in the human genome. Of these, 824 were intra-chromosomal, closely linked duplications of up to 619 bp bearing the hallmarks of the synthesis-dependent strand-annealing repair pathway. This mechanism has duplicated hundreds of sequences predicted to be functional in the human genome, including exons, UTRs, intron splice sites and transcription factor binding sites. Dating of the duplication events using comparative genomics and experimental validation revealed that the mechanism has operated continuously but with decreasing intensity throughout primate evolution. The mechanism has produced species-specific duplications in all primate species surveyed and is contributing to genomic variation among humans. Finally, we show that RDs have also occurred, albeit at a lower frequency, in non-primate mammals and other vertebrates, indicating that this mechanism has been an important force shaping vertebrate genome evolution

    Population Genomic Inferences from Sparse High-Throughput Sequencing of Two Populations of Drosophila melanogaster

    Short-read sequencing techniques provide the opportunity to capture genome-wide sequence data in a single experiment. A current challenge is to identify questions that shallow-depth genomic data can address successfully and to develop corresponding analytical methods that are statistically sound. Here, we apply the Roche/454 platform to survey natural variation in strains of Drosophila melanogaster from an African (n = 3) and a North American (n = 6) population. Reads were aligned to the reference D. melanogaster genomic assembly, single nucleotide polymorphisms were identified, and nucleotide variation was quantified genome wide. Simulations and empirical results suggest that nucleotide diversity can be accurately estimated from sparse data with as little as 0.2× coverage per line. The unbiased genomic sampling provided by random short-read sequencing also allows insight into distributions of transposable elements and copy number polymorphisms found within populations and demonstrates that short-read sequencing methods provide an efficient means to quantify variation in genome organization and content. Continued development of methods for statistical inference of shallow-depth genome-wide sequencing data will allow such sparse, partial data sets to become the norm in the emerging field of population genomics

    Drosophila Duplication Hotspots Are Associated with Late-Replicating Regions of the Genome

    Duplications play a significant role in both extremes of the phenotypic spectrum of newly arising mutations: they can have severe deleterious effects (e.g. duplications underlie a variety of diseases) but can also be highly advantageous. The phenotypic potential of newly arisen duplications has stimulated wide interest in both the mutational and selective processes shaping these variants in the genome. Here we take advantage of the Drosophila simulans–Drosophila melanogaster genetic system to further our understanding of both processes. Regarding mutational processes, the study of two closely related species allows investigation of the potential existence of shared duplication hotspots, and the similarities and differences between the two genomes can be used to dissect their underlying causes. Regarding selection, the difference in the effective population size between the two species can be leveraged to ask questions about the strength of selection acting on different classes of duplications. In this study, we conducted a survey of duplication polymorphisms in 14 different lines of D. simulans using tiling microarrays and combined it with an analogous survey for the D. melanogaster genome. By integrating the two datasets, we identified duplication hotspots conserved between the two species. However, unlike the duplication hotspots identified in mammalian genomes, Drosophila duplication hotspots are not associated with sequences of high sequence identity capable of mediating non-allelic homologous recombination. Instead, Drosophila duplication hotspots are associated with late-replicating regions of the genome, suggesting a link between DNA replication and duplication rates. We also found evidence supporting a higher effectiveness of selection on duplications in D. simulans than in D. melanogaster. This is also true for duplications segregating at high frequency, where we find evidence in D. simulans that a sizeable fraction of these mutations is being driven to fixation by positive selection.