351 research outputs found
Next-Generation Population Genomics: Inversion Polymorphisms, Segregation Distortion and Fitness Epistasis
Although population genetics has a long history and a firm theoretical basis, until recently few data were available for empirical hypothesis testing. The unprecedented growth of sequencing methodologies has transformed the discipline from a data-poor, theory-rich field into one virtually unlimited by the availability of suitable data. In this thesis, we develop bioinformatic methods to address a variety of longstanding questions in the field of evolutionary genetics. Specifically, we use data derived from model organisms to study the evolution of inversion polymorphisms, segregation distorters and fitness epistasis. In the first chapter, we develop methods for detecting chromosomal inversions using next-generation sequencing data. Subsequently, we show that chromosomal inversions in Drosophila melanogaster are evolutionarily young, and at least one has likely achieved polymorphic frequencies via sex-ratio segregation distortion. In the third chapter, we develop a method of surveying the genome for segregation distortion in an unbiased manner, and show that segregation distortion does not contribute to hybrid male sterility in one pair of house mouse populations. Finally, we show that contrary to expectations, gene-gene interactions are widespread within species, which challenges a central paradigm of speciation research.
Transfer RNA genes experience exceptionally elevated mutation rates.
Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.
Bulk pollen sequencing reveals rapid evolution of segregation distortion in the male germline of Arabidopsis hybrids
Genes that do not segregate in heterozygotes at Mendelian ratios are a potentially important evolutionary force in natural populations. Although the impacts of segregation distortion are widely appreciated, we have little quantitative understanding of how often these loci arise and fix within lineages. Here, we develop a statistical approach for detecting segregation distorting genes from the comprehensive comparison of whole genome sequence data obtained from bulk gamete versus somatic tissues. Our approach enables estimation of map positions and confidence intervals, and quantification of effect sizes of segregation distorters. We apply our method to the pollen of two interspecific F1 hybrids of Arabidopsis lyrata and A. halleri and we identify three loci across eight chromosomes showing significant evidence of segregation distortion in both pollen samples. Based on this, we estimate that novel segregation distortion elements evolve and achieve high frequencies within lineages at a rate of approximately one per 244,000 years. Furthermore, we estimate that haploid-acting segregation distortion may contribute between 10% and 30% of reduced pollen viability in F1 individuals. Our results indicate that haploid-acting factors evolve rapidly and dramatically influence segregation in F1 hybrid individuals.
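The idea of comparing allele representation in bulk gametes against somatic tissue can be illustrated with a minimal sketch. This is not the authors' statistical approach (which also estimates map positions and confidence intervals); it assumes a single heterozygous site with read counts for two alleles in each tissue and applies an ordinary two-proportion z-test. The function name and the read counts are hypothetical.

```python
import math

def distortion_z_test(gamete_ref, gamete_alt, somatic_ref, somatic_alt):
    """Compare the reference-allele read fraction in a gamete pool with the
    fraction in somatic tissue (hypothetical helper, two-proportion z-test).

    Returns (difference in fractions, z statistic, two-sided p-value).
    """
    n_g = gamete_ref + gamete_alt
    n_s = somatic_ref + somatic_alt
    p_g = gamete_ref / n_g          # allele fraction among gamete reads
    p_s = somatic_ref / n_s         # allele fraction among somatic reads
    pooled = (gamete_ref + somatic_ref) / (n_g + n_s)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_g + 1 / n_s))
    if se == 0:                     # all reads carry the same allele
        return p_g - p_s, 0.0, 1.0
    z = (p_g - p_s) / se
    # Two-sided p-value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_g - p_s, z, p_value

# Made-up counts: 60% reference reads in pollen versus 50% in leaf tissue.
diff, z, p = distortion_z_test(600, 400, 500, 500)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.2e}")
```

Using somatic tissue as the baseline, rather than an assumed 50:50 ratio, is meant to absorb mapping or reference bias that affects both samples equally. A genome-wide scan would additionally need to combine linked sites and correct for multiple testing.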
Circumventing Heterozygosity: Sequencing the Amplified Genome of a Single Haploid Drosophila melanogaster Embryo
Heterozygosity is a major challenge to efficient, high-quality genomic assembly and to the full genomic survey of polymorphism and divergence. In Drosophila melanogaster, lines derived from equatorial populations are particularly resistant to inbreeding, thus imposing a major barrier to the determination and analysis of genomic variation in natural populations of this model organism. Here we present a simple genome sequencing protocol based on the whole-genome amplification of the gynogenetically derived haploid genome of a progeny of females mated to males homozygous for the recessive male sterile mutation, ms(3)K81. A single “lane” of paired-end sequences (2 × 76 bp) provides a good syntenic assembly with >95% high-quality coverage (more than five reads). The amplification of the genomic DNA moderately inflates the variation in coverage across the euchromatic portion of the genome. It also increases the frequency of chimeric clones, but the low frequency and random genomic distribution of these clones limit their impact on the final assemblies. This method provides a solid path forward for population genomic sequencing and offers applications to many other systems in which small amounts of genomic DNA have unique experimental relevance.
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.
Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
Breathing Easy: Lung Health and Associated Conditions in the Day Care Setting
Introduction: Air pollutants are associated with many health risks. Children in the day care environment are uniquely susceptible to lung damage, infection, systemic illness and pollutant-triggered hypersensitivity reactions. According to the latest public report by the CDC, Vermont’s (VT) asthma rate is the highest in the country at 11.1%. This project compared VT’s day care regulations regarding specific environmental factors linked with health risks to regulations in six surrounding New England states. We sought to assess whether VT’s regulations adequately protect children in day care.
Selection on Coding and Regulatory Variation Maintains Individuality in Major Urinary Protein Scent Marks in Wild Mice
Recognition of individuals by scent is widespread across animal taxa. Though animals can often discriminate chemical blends based on many compounds, recent work shows that specific protein pheromones are necessary and sufficient for individual recognition via scent marks in mice. The genetic nature of individuality in scent marks (e.g. coding versus regulatory variation) and the evolutionary processes that maintain diversity are poorly understood. The individual signatures in scent marks of house mice are the protein products of a group of highly similar paralogs in the major urinary protein (Mup) gene family. Using the offspring of wild-caught mice, we examine individuality in the major urinary protein (MUP) scent marks at the DNA, RNA and protein levels. We show that individuality arises through a combination of variation at amino acid coding sites and differential transcription of central Mup genes across individuals, and we identify eSNPs in promoters. There is no evidence of post-transcriptional processes influencing phenotypic diversity, as transcripts accurately predict the relative abundance of proteins in urine samples. The match between transcripts and urine samples taken six months earlier also emphasizes that the proportional relationships across central MUP isoforms in urine are stable. Balancing selection maintains coding variants at moderate frequencies, though pheromone diversity appears limited by interactions with vomeronasal receptors. We find that differential transcription of the central Mup paralogs within and between individuals significantly increases the individuality of pheromone blends. Balancing selection on gene regulation allows for increased individuality via combinatorial diversity in a limited number of pheromones.
Horizontal Transmission and Recombination Maintain Forever Young Bacterial Symbiont Genomes
Bacterial symbionts bring a wealth of functions to the associations they participate in, but by doing so, they endanger the genes and genomes underlying these abilities. When bacterial symbionts become obligately associated with their hosts, their genomes are thought to decay towards an organelle-like fate due to decreased homologous recombination and inefficient selection. However, numerous associations exist that counter these expectations, especially in marine environments, possibly due to ongoing horizontal gene flow. Despite extensive theoretical treatment, no empirical study thus far has connected these underlying population genetic processes with long-term evolutionary outcomes. By sampling marine chemosynthetic bacterial-bivalve endosymbioses that range from primarily vertical to strictly horizontal transmission, we tested this canonical theory. We found that transmission mode strongly predicts homologous recombination rates, and that exceedingly low recombination rates are associated with moderate genome degradation in the marine symbionts with nearly strict vertical transmission. Nonetheless, even the most degraded marine endosymbiont genomes are occasionally horizontally transmitted and are much larger than their terrestrial insect symbiont counterparts. Therefore, horizontal transmission and recombination enable efficient natural selection to maintain intermediate symbiont genome sizes and substantial functional genetic variation.
Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
N.G., C.W., and N.D.M. were supported by the European Molecular Biology Laboratory (EMBL). R.C.-D. was supported by R35GM128932 and by an Alfred P. Sloan foundation fellowship. R.L. was funded by Australian Research Council grant DP200103151, and by a Chan-Zuckerberg Initiative grant. We are very grateful to GISAID and all the groups who shared their sequencing data.
Conserved novel ORFs in the mitochondrial genome of the ctenophore Beroe forskalii
To date, five ctenophore species’ mitochondrial genomes have been sequenced, and each contains open reading frames (ORFs) that, if translated, have no identifiable orthologs. ORFs with no identifiable orthologs are called unidentified reading frames (URFs). If truly protein-coding, ctenophore mitochondrial URFs represent a little-understood path in early-diverging metazoan mitochondrial evolution and metabolism. We sequenced and annotated the mitochondrial genomes of three individuals of the beroid ctenophore Beroe forskalii and found that in addition to sharing the same canonical mitochondrial genes as other ctenophores, the B. forskalii mitochondrial genome contains two URFs. These URFs are conserved among the three individuals but not found in other sequenced species. We developed computational tools called pauvre and cuttlery to determine the likelihood that URFs are protein coding. There is evidence that the two URFs are under negative selection, and a novel Bayesian hypothesis test of trinucleotide frequency shows that the URFs are more similar to known coding genes than to noncoding intergenic sequence. Protein structure and function prediction of all ctenophore URFs suggests that they all code for transmembrane transport proteins. These findings, along with the presence of URFs in other sequenced ctenophore mitochondrial genomes, suggest that ctenophores may have uncharacterized transmembrane proteins present in their mitochondria.
- …