8,334 research outputs found

    Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa

    Get PDF
    This is the publisher’s final pdf. The article is copyrighted by the New Phytologist Trust and published by John Wiley & Sons, Inc. It can be found at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291469-8137. To the best of our knowledge, one or more authors of this paper were federal employees when contributing to this work.•Plant population genomics informs evolutionary biology, breeding, conservation and bioenergy feedstock development. For example, the detection of reliable phenotype–genotype associations and molecular signatures of selection requires a detailed knowledge about genome-wide patterns of allele frequency variation, linkage disequilibrium and recombination.\ud •We resequenced 16 genomes of the model tree Populus trichocarpa and genotyped 120 trees from 10 subpopulations using 29 213 single-nucleotide polymorphisms.\ud •Significant geographic differentiation was present at multiple spatial scales, and range-wide latitudinal allele frequency gradients were strikingly common across the genome. The decay of linkage disequilibrium with physical distance was slower than expected from previous studies in Populus, with r² dropping below 0.2 within 3–6 kb. Consistent with this, estimates of recent effective population size from linkage disequilibrium (N[subscript e] ≈ 4000–6000) were remarkably low relative to the large census sizes of P. trichocarpa stands. Fine-scale rates of recombination varied widely across the genome, but were largely predictable on the basis of DNA sequence and methylation features.\ud •Our results suggest that genetic drift has played a significant role in the recent evolutionary history of P. trichocarpa. Most importantly, the extensive linkage disequilibrium detected suggests that genome-wide association studies and genomic selection in undomesticated populations may be more feasible in Populus than previously assumed

    Two-Locus Likelihoods under Variable Population Size and Fine-Scale Recombination Rate Estimation

    Full text link
    Two-locus sampling probabilities have played a central role in devising an efficient composite likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here new methods to compute the sampling probability for variable population size functions that are piecewise constant. Our main theoretical result, implemented in a new software package called LDpop, is a novel formula for the sampling probability that can be evaluated by numerically exponentiating a large but sparse matrix. This formula can handle moderate sample sizes (n50n \leq 50) and demographic size histories with a large number of epochs (D64\mathcal{D} \geq 64). In addition, LDpop implements an approximate formula for the sampling probability that is reasonably accurate and scales to hundreds in sample size (n256n \geq 256). Finally, LDpop includes an importance sampler for the posterior distribution of two-locus genealogies, based on a new result for the optimal proposal distribution in the variable-size setting. Using our methods, we study how a sharp population bottleneck followed by rapid growth affects the correlation between partially linked sites. Then, through an extensive simulation study, we show that accounting for population size changes under such a demographic model leads to substantial improvements in fine-scale recombination rate estimation. LDpop is freely available for download at https://github.com/popgenmethods/ldpopComment: 32 pages, 13 figure

    Coalescence 2.0: a multiple branching of recent theoretical developments and their applications

    Get PDF
    Population genetics theory has laid the foundations for genomics analyses including the recent burst in genome scans for selection and statistical inference of past demographic events in many prokaryote, animal and plant species. Identifying SNPs under natural selection and underpinning species adaptation relies on disentangling the respective contribution of random processes (mutation, drift, migration) from that of selection on nucleotide variability. Most theory and statistical tests have been developed using the Kingman coalescent theory based on the Wright-Fisher population model. However, these theoretical models rely on biological and life-history assumptions which may be violated in many prokaryote, fungal, animal or plant species. Recent theoretical developments of the so called multiple merger coalescent models are reviewed here ({\Lambda}-coalescent, beta-coalescent, Bolthausen-Snitzman, {\Xi}-coalescent). We explicit how these new models take into account various pervasive ecological and biological characteristics, life history traits or life cycles which were not accounted in previous theories such as 1) the skew in offspring production typical of marine species, 2) fast adapting microparasites (virus, bacteria and fungi) exhibiting large variation in population sizes during epidemics, 3) the peculiar life cycles of fungi and bacteria alternating sexual and asexual cycles, and 4) the high rates of extinction-recolonization in spatially structured populations. We finally discuss the relevance of multiple merger models for the detection of SNPs under selection in these species, for population genomics of very large sample size and advocate to potentially examine the conclusion of previous population genetics studies.Comment: 3 Figure

    Association mapping in tetraploid potato

    Get PDF
    The results of a four year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato” are described in this thesis. This project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency. In an attempt to answer this research question a set of potato cultivars representative for the commercial potato germplasm was selected. In total 240 cultivars and progenitor clones were chosen. In a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies which resulted in a total of 430 genotypes. In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data was retrieved through contributions of the participating breeding companies and represented summary statistics of recent observations for a number of traits across years and locations, calculated following company specific procedures. With AFLP marker data, in the form of normalised log-transformed band intensities, obtained from five well-known primer combinations, the extent of linkage disequilibrium (LD), using the r2 statistic, was estimated. Population structure within the set of 220 cultivars was analysed by deploying a clustering approach. No apparent, nor statistically supported population structure was revealed and the LD seemed to decay below the threshold of 0.1 at a genetic distance of about 3cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single marker regression models for phenotypic traits on marker band intensities with and without correction for population structure. Population structure correction was performed in a straightforward way by incorporating a design matrix into the model assuming that each breeding company represented a different breeding germplasm pool. The potential of association mapping in tetraploid potato has been demonstrated in this pilot experiment, because existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed identification of interesting associations for a number of agro-morphological and quality traits. These promising results encouraged us to engage into an encompassing genome-wide association mapping study in potato. Two association mapping panels were compiled. One panel comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another panel containing in total 299 genotypes including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are in common with the first panel. Phenotypic data for the association panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes was contributed by the five participating breeding companies and consisted of multi-year-multi-location data obtained during generations of clonal selection. The 2006 data were nicely balanced, because the trial was designed in that way. The historical breeding dataset was highly unbalanced. Analysis of these two differing phenotypic datasets was performed to deliver insight in variance components for the genotypic main effects and the genotype by environment interaction (GEI), besides estimated genotype main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI. In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (=repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well. To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes contain all genotypes present in the two association mapping panels introduced before plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected which complied with criteria such as their intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, using the r2 statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5cM on average. The results described in Chapter 4 are promising for association mapping research in potato. The odds are reasonable that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for detection of QTL in an association mapping context. In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained from two association mapping panels as a result of phenotypic analysis performed in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites. A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using available marker information. Interesting QTL could be identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies e.g. for tuber shape and frying colour, but also new QTL have been detected e.g. for after baking darkening and enzymatic browning. In the final chapter, the general discussion, results of preceding chapters are evaluated and their implications for research as well as breeding are discussed. <br/

    Tractable diffusion and coalescent processes for weakly correlated loci

    Get PDF
    Widely used models in genetics include the Wright-Fisher diffusion and its moment dual, Kingman's coalescent. Each has a multilocus extension but under neither extension is the sampling distribution available in closed-form, and their computation is extremely difficult. In this paper we derive two new multilocus population genetic models, one a diffusion and the other a coalescent process, which are much simpler than the standard models, but which capture their key properties for large recombination rates. The diffusion model is based on a central limit theorem for density dependent population processes, and we show that the sampling distribution is a linear combination of moments of Gaussian distributions and hence available in closed-form. The coalescent process is based on a probabilistic coupling of the ancestral recombination graph to a simpler genealogical process which exposes the leading dynamics of the former. We further demonstrate that when we consider the sampling distribution as an asymptotic expansion in inverse powers of the recombination parameter, the sampling distributions of the new models agree with the standard ones up to the first two orders.Comment: 34 pages, 1 figur

    netgwas: An R Package for Network-Based Genome-Wide Association Studies

    Full text link
    Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is designed based on undirected graphical models to accomplish three important and interrelated goals in genetics: constructing linkage map, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. The netgwas package deals with species with any chromosome copy number in a unified way, unlike other software. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018), and reconstructing conditional independence network for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely occur in genetics and genomics such as genotype data, and genotype-phenotype data. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from classical multiple testing approach. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed-up computations for large datasets. In addition, it contains several functions for simulation and visualization. The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwasComment: 32 pages, 9 figures; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF fil
    corecore