165 research outputs found

    Bulk pollen sequencing reveals rapid evolution of segregation distortion in the male germline of Arabidopsis hybrids

    Get PDF
    International audienceGenes that do not segregate in heterozygotes at Mendelian ratios are a potentially important evolutionary force in natural populations. Although the impacts of segregation distortion are widely appreciated, we have little quantitative understanding about how often these loci arise and fix within lineages. Here, we develop a statistical approach for detecting segregation distorting genes from the comprehensive comparison of whole genome sequence data obtained from bulk gamete versus somatic tissues. Our approach enables estimation of map positions and confidence intervals, and quantification of effect sizes of segregation distorters. We apply our method to the pollen of two interspecific F1 hybrids of Arabidopsis lyrata and A. halleri and we identify three loci across eight chromosomes showing significant evidence of segregation distortion in both pollen samples. Based on this, we estimate that novel segregation distortion elements evolve and achieve high frequencies within lineages at a rate of approximately one per 244,000 years. Furthermore, we estimate that haploid-acting segregation distortion may contribute between 10% and 30% of reduced pollen viability in F1 individuals. Our results indicate haploid acting factors evolve rapidly and dramatically influence segregation in F1 hybrid individuals

    Circumventing Heterozygosity: Sequencing the Amplified Genome of a Single Haploid Drosophila melanogaster Embryo

    Get PDF
    Heterozygosity is a major challenge to efficient, high-quality genomic assembly and to the full genomic survey of polymorphism and divergence. In Drosophila melanogaster lines derived from equatorial populations are particularly resistant to inbreeding, thus imposing a major barrier to the determination and analyses of genomic variation in natural populations of this model organism. Here we present a simple genome sequencing protocol based on the whole-genome amplification of the gynogenetically derived haploid genome of a progeny of females mated to males homozygous for the recessive male sterile mutation, ms(3)K81. A single “lane” of paired-end sequences (2 × 76 bp) provides a good syntenic assembly with >95% high-quality coverage (more than five reads). The amplification of the genomic DNA moderately inflates the variation in coverage across the euchromatic portion of the genome. It also increases the frequency of chimeric clones. But the low frequency and random genomic distribution of the chimeric clones limits their impact on the final assemblies. This method provides a solid path forward for population genomic sequencing and offers applications to many other systems in which small amounts of genomic DNA have unique experimental relevance

    The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Get PDF
    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets

    Horizontal Transmission and Recombination Maintain forever Young Bacterial Symbiont Genomes

    Get PDF
    Bacterial symbionts bring a wealth of functions to the associations they participate in, but by doing so, they endanger the genes and genomes underlying these abilities. When bacterial symbionts become obligately associated with their hosts, their genomes are thought to decay towards an organelle-like fate due to decreased homologous recombination and inef- ficient selection. However, numerous associations exist that counter these expectations, especially in marine environments, possibly due to ongoing horizontal gene flow. Despite extensive theoretical treatment, no empirical study thus far has connected these underlying population genetic processes with long-term evolutionary outcomes. By sampling marine chemosynthetic bacterial-bivalve endosymbioses that range from primarily vertical to strictly horizontal transmission, we tested this canonical theory. We found that transmission mode strongly predicts homologous recombination rates, and that exceedingly low recombination rates are associated with moderate genome degradation in the marine symbionts with nearly strict vertical transmission. Nonetheless, even the most degraded marine endosym- biont genomes are occasionally horizontally transmitted and are much larger than their ter- restrial insect symbiont counterparts. Therefore, horizontal transmission and recombination enable efficient natural selection to maintain intermediate symbiont genome sizes and sub- stantial functional genetic variation

    Selection on Coding and Regulatory Variation Maintains Individuality in Major Urinary Protein Scent Marks in Wild Mice

    Get PDF
    Recognition of individuals by scent is widespread across animal taxa. Though animals can often discriminate chemical blends based on many compounds, recent work shows that specific protein pheromones are necessary and sufficient for individual recognition via scent marks in mice. The genetic nature of individuality in scent marks (e.g. coding versus regulatory variation) and the evolutionary processes that maintain diversity are poorly understood. The individual signatures in scent marks of house mice are the protein products of a group of highly similar paralogs in the major urinary protein (Mup) gene family. Using the offspring of wild-caught mice, we examine individuality in the major urinary protein (MUP) scent marks at the DNA, RNA and protein levels. We show that individuality arises through a combination of variation at amino acid coding sites and differential transcription of central Mup genes across individuals, and we identify eSNPs in promoters. There is no evidence of post-transcriptional processes influencing phenotypic diversity as transcripts accurately predict the relative abundance of proteins in urine samples. The match between transcripts and urine samples taken six months earlier also emphasizes that the proportional relationships across central MUP isoforms in urine is stable. Balancing selection maintains coding variants at moderate frequencies, though pheromone diversity appears limited by interactions with vomeronasal receptors. We find that differential transcription of the central Mup paralogs within and between individuals significantly increases the individuality of pheromone blends. Balancing selection on gene regulation allows for increased individuality via combinatorial diversity in a limited number of pheromones

    Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2

    Get PDF
    The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G →U and C →U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.N.G., C.W., and N.D.M. were supported by the European Molecular Biology Laboratory (EMBL). R.C.-D. was supported by R35GM128932 and by an Alfred P. Sloan foundation fellowship. R.L. was funded by Australian Research Council grant DP200103151, and by a Chan-Zuckerberg Initiative grant. We are very grateful to GISAID and all the groups who shared their sequencing data

    Conserved novel ORFs in the mitochondrial genome of the ctenophore Beroe forskalii

    Get PDF
    To date, five ctenophore species’ mitochondrial genomes have been sequenced, and each contains open reading frames (ORFs) that if translated have no identifiable orthologs. ORFs with no identifiable orthologs are called unidentified reading frames (URFs). If truly protein-coding, ctenophore mitochondrial URFs represent a little understood path in early-diverging metazoan mitochondrial evolution and metabolism. We sequenced and annotated the mitochondrial genomes of three individuals of the beroid ctenophore Beroe forskalii and found that in addition to sharing the same canonical mitochondrial genes as other ctenophores, the B. forskalii mitochondrial genome contains two URFs. These URFs are conserved among the three individuals but not found in other sequenced species. We developed computational tools called pauvre and cuttlery to determine the likelihood that URFs are protein coding. There is evidence that the two URFs are under negative selection, and a novel Bayesian hypothesis test of trinucleotide frequency shows that the URFs are more similar to known coding genes than noncoding intergenic sequence. Protein structure and function prediction of all ctenophore URFs suggests that they all code for transmembrane transport proteins. These findings, along with the presence of URFs in other sequenced ctenophore mitochondrial genomes, suggest that ctenophores may have uncharacterized transmembrane proteins present in their mitochondria

    A lung-specific mutational signature enables inference of viral and bacterial respiratory niche

    Get PDF
    Exposure to different mutagens leaves distinct mutational patterns that can allow inference of pathogen replication niches. We therefore investigated whether SARS-CoV-2 mutational spectra might show lineage-specific differences, dependent on the dominant site(s) of replication and onwards transmission, and could therefore rapidly infer virulence of emergent variants of concern (VOCs). Through mutational spectrum analysis, we found a significant reduction in G>T mutations in the Omicron variant, which replicates in the upper respiratory tract (URT), compared to other lineages, which replicate in both the URT and lower respiratory tract (LRT). Mutational analysis of other viruses and bacteria indicates a robust, generalizable association of high G>T mutations with replication within the LRT. Monitoring G>T mutation rates over time, we found early separation of Omicron from Beta, Gamma and Delta, while mutational patterns in Alpha varied consistent with changes in transmission source as social restrictions were lifted. Mutational spectra may be a powerful tool to infer niches of established and emergent pathogens.Fil: Ruis, Christopher. University of Cambridge; Estados UnidosFil: Peacock, Thomas P.. Imperial College London; Reino UnidoFil: Polo Ilacqua, Luis Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; ArgentinaFil: Masone, Diego Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; ArgentinaFil: Alvarez, Maria Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Medicina y Biología Experimental de Cuyo; ArgentinaFil: Hinrichs, Angie S.. University Of California At Santa Cruz.; Estados UnidosFil: Turakhia, Yatish. University of California at San Diego; Estados UnidosFil: Cheng, Ye. University of California at San Diego; Estados UnidosFil: McBroome, Jakob. University Of California At Santa Cruz.; Estados UnidosFil: Corbett Detig, Russell. University Of California At Santa Cruz.; Estados UnidosFil: Parkhill, Julian. University of Cambridge; Reino UnidoFil: Floto, R. Andres. University of Cambridge; Reino Unid

    Stability of SARS-CoV-2 phylogenies.

    Get PDF
    Funder: Alfred P. Sloan Foundation; funder-id: http://dx.doi.org/10.13039/100000879Funder: European Molecular Biology Laboratory (EMBL)The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab-or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse
    corecore