    Linkage mapping reveals strong chiasma interference in Sockeye salmon: Implications for interpreting genomic data

    Meiotic recombination is fundamental for generating new genetic variation and for securing proper disjunction. Further, recombination plays an essential role during the rediploidization process of polyploid-origin genomes because crossovers between pairs of homeologous chromosomes retain duplicated regions. A better understanding of how recombination affects genome evolution is crucial for interpreting genomic data; unfortunately, current knowledge mainly originates from a few model species. Salmonid fishes provide a valuable system for studying the effects of recombination in nonmodel species. Salmonid females generally produce thousands of embryos, providing large families for conducting inheritance studies. Further, salmonid genomes are currently rediploidizing after a whole genome duplication and can serve as models for studying the role of homeologous crossovers on genome evolution. Here, we present a detailed interrogation of recombination patterns in sockeye salmon (Oncorhynchus nerka). First, we use RAD sequencing of haploid and diploid gynogenetic families to construct a dense linkage map that includes paralogous loci and location of centromeres. We find a nonrandom distribution of paralogs that mainly cluster in extended regions distally located on 11 different chromosomes, consistent with ongoing homeologous recombination in these regions. We also estimate the strength of interference across each chromosome; results reveal strong interference and crossovers are mostly limited to one per arm. Interference was further shown to continue across centromeres, but metacentric chromosomes generally had at least one crossover on each arm. We discuss the relevance of these findings for both mapping and population genomic studies

    Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

    Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone

    Population genomics of marine zooplankton

    Author Posting. © The Author(s), 2017. This is the author's version of the work. It is posted here for personal use, not for redistribution. The definitive version was published in Bucklin, Ann et al. "Population Genomics of Marine Zooplankton." Population Genomics: Marine Organisms. Ed. Om P. Rajora and Marjorie Oleksiak. Springer, 2018. doi:10.1007/13836_2017_9. The exceptionally large population size and cosmopolitan biogeographic distribution that distinguish many – but not all – marine zooplankton species generate similarly exceptional patterns of population genetic and genomic diversity and structure. The phylogenetic diversity of zooplankton has slowed the application of population genomic approaches, due to lack of genomic resources for closely related species and diversity of genomic architecture, including highly-replicated genomes of many crustaceans. Use of numerous genomic markers, especially single nucleotide polymorphisms (SNPs), is transforming our ability to analyze population genetics and connectivity of marine zooplankton, and providing new understanding and different answers than earlier analyses, which typically used mitochondrial DNA and microsatellite markers. Population genomic approaches have confirmed that, despite high dispersal potential, many zooplankton species exhibit genetic structuring among geographic populations, especially at large ocean-basin scales, and have revealed patterns and pathways of population connectivity that do not always track ocean circulation. Genomic and transcriptomic resources are critically needed to allow further examination of micro-evolution and local adaptation, including identification of genes that show evidence of selection. These new tools will also enable further examination of the significance of small-scale genetic heterogeneity of marine zooplankton, to discriminate genetic “noise” in large and patchy populations from local adaptation to environmental conditions and change. Support was provided by the US National Science Foundation to AB and RJO (PLR-1044982) and to RJO (MCB-1613856); support to IS and MC was provided by Nord University (Norway)

    Inbreeding effective population size and parentage analysis without parents

    An important use of genetic parentage analysis is the ability to directly calculate the number of offspring produced by each parent (k_i) and hence effective population size, Ne. But what if parental genotypes are not available? In theory, given enough markers, it should be possible to reconstruct parental genotypes based entirely on a sample of progeny, and if so the vector of parental k_i values. However, this would provide information only about parents that actually contributed offspring to the sample. How would ignoring the ‘null’ parents (those that produced no offspring) affect an estimate of Ne? The surprising answer is that null parents have no effect at all. We show that: (i) The standard formula for inbreeding Ne can be rewritten so that it is a function only of sample size and Σ(k_i^2); it is not necessary to know the total number of parents (N). This same relationship does not hold for variance Ne. (ii) This novel formula provides an unbiased estimate of Ne even if only a subset of progeny is available, provided the parental contributions are accurately determined, in which case precision is also high compared to other single-sample estimators of Ne. (iii) It is not necessary to actually reconstruct parental genotypes; from a matrix of pairwise relationships (as can be estimated by some current software programs), it is possible to construct the vector of k_i values and estimate Ne. The new method based on parentage analysis without parents (PwoP) can potentially be useful as a single-sample estimator of contemporary Ne, provided that either (i) relationships can be accurately determined, or (ii) Σ(k_i^2) can be estimated directly
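
Point (i) of the abstract above can be illustrated with a short sketch. It assumes the separate-sexes inbreeding formula usually attributed to Crow and Denniston (1988), Ne = (N*kbar - 2) / (kbar - 1 + Vk/kbar), with Vk taken as the population variance of the per-parent offspring counts k. That starting formula, the function name, and the toy counts are assumptions made for illustration and are not taken from the abstract; the published estimator may differ in detail. Because N*kbar is simply sum(k), and sum(k) equals 2S for S sampled offspring, N cancels and the expression reduces to sum(k) * (sum(k) - 2) / (sum(k^2) - sum(k)).

```python
import math


def inbreeding_ne_from_k(k_values):
    """Sketch of an inbreeding Ne estimate from per-parent offspring counts.

    Assumes Ne = (N*kbar - 2) / (kbar - 1 + Vk/kbar) with population
    variance Vk (an assumption, see the text above). After substituting
    N*kbar = sum(k), the number of parents N drops out.
    """
    counts = [int(k) for k in k_values]
    if any(k < 0 for k in counts):
        raise ValueError("offspring counts must be non-negative")
    sum_k = sum(counts)                    # equals 2S with two parents per offspring
    sum_k2 = sum(k * k for k in counts)    # the only other quantity needed
    if sum_k2 == sum_k:
        # No parent contributed more than one offspring: no co-ancestry
        # signal in the sample, so the estimate is unbounded.
        return math.inf
    return sum_k * (sum_k - 2) / (sum_k2 - sum_k)


# Hypothetical counts: four parents with two offspring each (S = 4).
contributing_only = [2, 2, 2, 2]
with_null_parents = [2, 2, 2, 2, 0, 0, 0]
print(inbreeding_ne_from_k(contributing_only))
print(inbreeding_ne_from_k(with_null_parents))
```

Parents with k = 0 add nothing to either sum, so appending them leaves the result unchanged, which is the behaviour the abstract describes for 'null' parents.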

    Population genomics of Salish Sea chum salmon: The legacy of the salmonid whole genome duplication

    Thesis (Master's)--University of Washington, 2015-12. The common ancestor of salmonids underwent a whole genome duplication approximately 88 million years ago. This duplication event still has a lasting impact on the form and structure of salmon genomes today and is evident in many duplicated genes and ongoing residual tetrasomic inheritance. This duplication also serves to complicate genetic analyses, as paralogous genes and sequences are difficult to distinguish, and often fully excluded prior to study. The goal of this thesis is to demonstrate how to incorporate duplicated loci into genetic studies of salmonids using high-throughput sequencing of chum salmon from the Salish Sea. In the first chapter, I develop a method to resolve paralogous loci within a pedigree and include them on a high-density linkage map. I show that paralogous loci are concentrated in 16 regions near the ends of linkage groups. These regions are inferred to have ongoing residual tetrasomic inheritance and we find that they have a lower incidence of transposable elements than the rest of the genome, a possible explanation for their stability since the whole genome duplication. In the second chapter, I use the discovered paralogous loci in a population genetic study of 10 collections of chum salmon from the Salish Sea. I compare genetic diversity and population structure at paralogous and non-paralogous loci and conduct a genome scan for association with run timing. I demonstrate that it is possible to characterize paralogous loci in wild populations and that they show similar patterns of population structure as the rest of the genome. The genome scan reveals genomic regions of elevated association with run timing, highlighting the potential downside of excluding paralogous loci in studies looking for genetic signals of adaptation

    primer-based (taqMan) genotypes

    This file contains genotype data generated by primer-based genotyping. Each row is an individual, labeled in the first column. Each column past the first gives genotypes for a specific locus, labeled in the first row. Only haploid offspring are included and they are expected to be homozygous. Alleles are specified by 'X' and 'Y'; uncalled genotypes are specified with 'no call' or 'invalid'. This is a text file with Windows line endings: "\r\n".
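
The layout described above could be read with a short parser along the following lines. This is only a sketch: the description does not name a column delimiter, so tab-separated columns are assumed here (a space delimiter would split 'no call'), and the function name, the case-insensitive matching of the missing-data codes, and the sample rows are hypothetical.

```python
import csv
import io

# Uncalled-genotype codes named in the file description.
MISSING_CODES = {"no call", "invalid"}


def parse_haploid_genotypes(text, delimiter="\t"):
    """Return {individual: {locus: 'X' | 'Y' | None}} from the file's text.

    The tab delimiter is an assumption; the description does not state one.
    """
    # newline="" leaves the "\r\n" endings for the csv module to handle.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    loci = header[1:]  # the first column holds the individual labels
    genotypes = {}
    for row in reader:
        if not row:  # skip blank lines, e.g. a trailing one
            continue
        individual, calls = row[0], row[1:]
        if len(calls) != len(loci):
            raise ValueError(f"{individual}: expected {len(loci)} genotypes")
        parsed = {}
        for locus, call in zip(loci, calls):
            call = call.strip()
            if call.lower() in MISSING_CODES:
                parsed[locus] = None  # uncalled genotype
            elif call in ("X", "Y"):
                parsed[locus] = call  # haploid call: a single allele
            else:
                raise ValueError(f"{individual}/{locus}: unexpected call {call!r}")
        genotypes[individual] = parsed
    return genotypes


# Hypothetical two-individual, two-locus example with Windows line endings.
sample = "individual\tlocA\tlocB\r\nhap1\tX\tno call\r\nhap2\tY\tinvalid\r\n"
print(parse_haploid_genotypes(sample))
```

Mapping both missing-data codes to None keeps downstream code from treating an uncalled genotype as a third allele.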

    Pseudoreplication in genomics-scale data sets

    In genomics-scale datasets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df’) compared to the nominal degrees of freedom, df. This issue has been known for some time, but consequences have not been systematically quantified across the entire genome. Here we measured pseudoreplication (quantified by the ratio df’/df) for a common metric of genetic differentiation (F(ST)) and a common measure of linkage disequilibrium between pairs of loci (r(2)). Based on data simulated using models (SLiM and msprime) that allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df’ and df’/df by measuring the rate of decline in the variance of mean F(ST) and mean r(2) as more loci were used. For both indices, df’ increases with N(e) and genome size, as expected. However, even for large N(e) and large genomes, df’ for mean r(2) plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for F(ST), but df’/df ≤ 0.01 can occur in datasets using tens of thousands of loci. Commonly-used block-jackknife methods consistently overestimated var(F(ST)), producing very conservative confidence intervals. Predicting df’ based on our modeling results as a function of N(e), L, S, and genome size provides a robust way to quantify precision associated with genomics-scale datasets

    Data from: Estimating contemporary effective population size in non-model species using linkage disequilibrium across thousands of loci

    Contemporary effective population size (Ne) can be estimated using linkage disequilibrium (LD) observed across pairs of loci presumed to be selectively neutral and unlinked. This method has been commonly applied to data sets containing 10–100 loci to inform conservation and study population demography. Performance of these Ne estimates could be improved by incorporating data from thousands of loci. However, these thousands of loci exist on a limited number of chromosomes, ensuring that some fraction will be physically linked. Linked loci have elevated LD due to limited recombination, which if not accounted for can cause Ne estimates to be downwardly biased. Here, we present results from coalescent and forward simulations designed to evaluate the bias of LD-based Ne estimates (N̂e). Contrary to common perceptions, increasing the number of loci does not increase the magnitude of linkage. Although we show it is possible to identify some pairs of loci that produce unusually large r2 values, simply removing large r2 values is not a reliable way to eliminate bias. Fortunately, the magnitude of bias in N̂e is strongly and negatively correlated with the process of recombination, including the number of chromosomes and their length, and this relationship provides a general way to adjust for bias. Additionally, we show that with thousands of loci, precision of N̂e is much lower than expected based on the assumption that each pair of loci provides completely independent information

    Data from: Congruent population structure across paralogous and non-paralogous loci in Salish Sea chum salmon (Oncorhynchus keta)

    Whole genome duplications are major evolutionary events with a lasting impact on genome structure. Duplication events complicate genetic analyses as paralogous sequences are difficult to distinguish; consequently paralogs are often excluded from studies. The effects of an ancient whole genome duplication (approximately 88MYA) are still evident in salmonids through the persistence of numerous paralogous gene sequences and partial tetrasomic inheritance. We use restriction site-associated DNA sequencing (RADseq) on ten collections of chum salmon from the Salish Sea in the USA and Canada to investigate genetic diversity and population structure in both tetrasomic and re-diploidized regions of the genome. We use a pedigree and high-density linkage map to identify paralogous loci and to investigate genetic variation across the genome. By applying multivariate statistical methods, we show that it is possible to characterize paralogous genetic loci and that they display similar patterns of population structure as the diploidized portion of the genome. We find genetic associations with the adaptively important trait of run timing in both sets of loci. By including paralogous loci in genome scans, we can observe evolutionary signals in genomic regions that have routinely been excluded from population genetic studies in other polyploid-derived species