Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa
This is the publisher’s final pdf. The article is copyrighted by the New Phytologist Trust and published by John Wiley & Sons, Inc. It can be found at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291469-8137. To the best of our knowledge, one or more authors of this paper were federal employees when contributing to this work.
• Plant population genomics informs evolutionary biology, breeding, conservation and bioenergy feedstock development. For example, the detection of reliable phenotype–genotype associations and molecular signatures of selection requires detailed knowledge of genome-wide patterns of allele frequency variation, linkage disequilibrium and recombination.
• We resequenced 16 genomes of the model tree Populus trichocarpa and genotyped 120 trees from 10 subpopulations using 29 213 single-nucleotide polymorphisms.
• Significant geographic differentiation was present at multiple spatial scales, and range-wide latitudinal allele frequency gradients were strikingly common across the genome. The decay of linkage disequilibrium with physical distance was slower than expected from previous studies in Populus, with r² dropping below 0.2 within 3–6 kb. Consistent with this, estimates of recent effective population size from linkage disequilibrium (N_e ≈ 4000–6000) were remarkably low relative to the large census sizes of P. trichocarpa stands. Fine-scale rates of recombination varied widely across the genome, but were largely predictable on the basis of DNA sequence and methylation features.
• Our results suggest that genetic drift has played a significant role in the recent evolutionary history of P. trichocarpa. Most importantly, the extensive linkage disequilibrium detected suggests that genome-wide association studies and genomic selection in undomesticated populations may be more feasible in Populus than previously assumed.
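The r² statistic quoted in this abstract can be illustrated with a minimal sketch. It assumes phased haplotypes at two biallelic loci with alleles coded 0/1; the function name `r_squared` and the input layout are hypothetical and are not taken from the paper.

```python
def r_squared(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with the allele at each locus coded as 0 or 1 (illustrative layout).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

Computing this value for many SNP pairs and plotting it against physical distance gives the kind of decay curve the abstract summarises as r² falling below 0.2 within 3–6 kb.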
Two-Locus Likelihoods under Variable Population Size and Fine-Scale Recombination Rate Estimation
Two-locus sampling probabilities have played a central role in devising an
efficient composite likelihood method for estimating fine-scale recombination
rates. Due to mathematical and computational challenges, these sampling
probabilities are typically computed under the unrealistic assumption of a
constant population size, and simulation studies have shown that resulting
recombination rate estimates can be severely biased in certain cases of
historical population size changes. To alleviate this problem, we develop here
new methods to compute the sampling probability for variable population size
functions that are piecewise constant. Our main theoretical result, implemented
in a new software package called LDpop, is a novel formula for the sampling
probability that can be evaluated by numerically exponentiating a large but
sparse matrix. This formula can handle moderate sample sizes () and
demographic size histories with a large number of epochs (). In addition, LDpop implements an approximate formula for the sampling
probability that is reasonably accurate and scales to hundreds in sample size
(). Finally, LDpop includes an importance sampler for the posterior
distribution of two-locus genealogies, based on a new result for the optimal
proposal distribution in the variable-size setting. Using our methods, we study
how a sharp population bottleneck followed by rapid growth affects the
correlation between partially linked sites. Then, through an extensive
simulation study, we show that accounting for population size changes under
such a demographic model leads to substantial improvements in fine-scale
recombination rate estimation. LDpop is freely available for download at
https://github.com/popgenmethods/ldpop
Comment: 32 pages, 13 figures
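As a rough illustration of what a piecewise-constant population size function means for genealogies, the sketch below computes the probability that two lineages have not yet coalesced by a given time. It is not LDpop's two-locus formula: it assumes a single locus, a diploid Wright-Fisher model with pairwise coalescence rate 1/(2N) per generation, and a hypothetical `epochs` layout of (start time, size) pairs.

```python
import math


def prob_no_coalescence(t, epochs):
    """P(two lineages have not coalesced by time t, in generations).

    epochs: list of (start_time, diploid_size) pairs sorted by start_time,
    the first starting at 0; the last epoch extends indefinitely into the past.
    """
    hazard = 0.0
    for i, (start, size) in enumerate(epochs):
        end = epochs[i + 1][0] if i + 1 < len(epochs) else math.inf
        if t <= start:
            break
        # Time spent in this epoch, weighted by its coalescence rate 1/(2N).
        hazard += (min(t, end) - start) / (2.0 * size)
    return math.exp(-hazard)
```

For example, with a hypothetical bottleneck history `[(0, 10000), (100, 100), (200, 10000)]`, almost all of the accumulated coalescence hazard by generation 300 comes from the 100 generations spent at size 100.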
Coalescence 2.0: a multiple branching of recent theoretical developments and their applications
Population genetics theory has laid the foundations for genomics analyses
including the recent burst in genome scans for selection and statistical
inference of past demographic events in many prokaryote, animal and plant
species. Identifying SNPs under natural selection and underpinning species
adaptation relies on disentangling the respective contribution of random
processes (mutation, drift, migration) from that of selection on nucleotide
variability. Most theory and statistical tests have been developed using the
Kingman coalescent theory based on the Wright-Fisher population model. However,
these theoretical models rely on biological and life-history assumptions which
may be violated in many prokaryote, fungal, animal or plant species. Recent
theoretical developments of the so-called multiple merger coalescent models are
reviewed here (Λ-coalescent, beta-coalescent, Bolthausen-Sznitman,
Ξ-coalescent). We explain how these new models take into account various
pervasive ecological and biological characteristics, life history traits or
life cycles which were not accounted for in previous theories such as 1) the skew
in offspring production typical of marine species, 2) fast adapting
microparasites (virus, bacteria and fungi) exhibiting large variation in
population sizes during epidemics, 3) the peculiar life cycles of fungi and
bacteria alternating sexual and asexual cycles, and 4) the high rates of
extinction-recolonization in spatially structured populations. We finally
discuss the relevance of multiple merger models for the detection of SNPs under
selection in these species and for population genomics of very large sample sizes,
and we advocate re-examining the conclusions of previous population
genetics studies.
Comment: 3 figures
Association mapping in tetraploid potato
The results of a four-year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato”, are described in this thesis. This project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency. To answer this research question, a set of potato cultivars representative of the commercial potato germplasm was selected. In total 240 cultivars and progenitor clones were chosen. At a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies, which resulted in a total of 430 genotypes. In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data were retrieved through contributions of the participating breeding companies and represented summary statistics of recent observations for a number of traits across years and locations, calculated following company-specific procedures. With AFLP marker data, in the form of normalised log-transformed band intensities, obtained from five well-known primer combinations, the extent of linkage disequilibrium (LD) was estimated using the r² statistic. Population structure within the set of 220 cultivars was analysed with a clustering approach. No apparent or statistically supported population structure was revealed, and LD seemed to decay below the threshold of 0.1 at a genetic distance of about 3 cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single-marker regression models for phenotypic traits on marker band intensities with and without correction for population structure.
Population structure correction was performed in a straightforward way by incorporating a design matrix into the model, assuming that each breeding company represented a different breeding germplasm pool. The potential of association mapping in tetraploid potato was demonstrated in this pilot experiment, because existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed identification of interesting associations for a number of agro-morphological and quality traits. These promising results encouraged us to undertake a comprehensive genome-wide association mapping study in potato. Two association mapping panels were compiled: one panel comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another panel containing 299 genotypes in total, including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are in common with the first panel. Phenotypic data for the association panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes were contributed by the five participating breeding companies and consisted of multi-year, multi-location data obtained during generations of clonal selection. The 2006 data were well balanced by design, whereas the historical breeding dataset was highly unbalanced. These two differing phenotypic datasets were analysed to gain insight into variance components for the genotypic main effects and the genotype by environment interaction (GEI), as well as to estimate genotype main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI.
In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (=repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well. To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes contain all genotypes present in the two association mapping panels introduced before plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected that corresponded to criteria such as intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, measured with the r² statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5 cM on average. The results described in Chapter 4 are promising for association mapping research in potato. The odds are reasonable that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for detection of QTL in an association mapping context. In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained from the two association mapping panels as a result of the phenotypic analysis performed in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites.
A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using available marker information. Interesting QTL could be identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies, e.g. for tuber shape and frying colour, but new QTL have also been detected, e.g. for after-baking darkening and enzymatic browning. In the final chapter, the general discussion, results of preceding chapters are evaluated and their implications for research as well as breeding are discussed.
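The single-marker regression used in the pilot experiment can be sketched as an ordinary least-squares fit of a trait on a marker's band intensity. This is only the uncorrected baseline idea under simplifying assumptions: the function name and inputs are hypothetical, and the thesis additionally corrects for population structure and genetic relatedness, which this sketch does not attempt.

```python
def single_marker_regression(intensities, trait_values):
    """Least-squares fit of trait = intercept + slope * intensity.

    Returns (slope, intercept, r2), where r2 is the fraction of trait
    variance explained by the marker. No structure correction is applied.
    """
    n = len(intensities)
    if n != len(trait_values) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(intensities) / n
    mean_y = sum(trait_values) / n
    sxx = sum((x - mean_x) ** 2 for x in intensities)
    syy = sum((y - mean_y) ** 2 for y in trait_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(intensities, trait_values))
    if sxx == 0 or syy == 0:
        raise ValueError("marker and trait must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2
```

In a genome-wide scan the same fit would be repeated per marker and per trait, with the resulting test statistics adjusted for multiple testing.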
Tractable diffusion and coalescent processes for weakly correlated loci
Widely used models in genetics include the Wright-Fisher diffusion and its
moment dual, Kingman's coalescent. Each has a multilocus extension but under
neither extension is the sampling distribution available in closed-form, and
their computation is extremely difficult. In this paper we derive two new
multilocus population genetic models, one a diffusion and the other a
coalescent process, which are much simpler than the standard models, but which
capture their key properties for large recombination rates. The diffusion model
is based on a central limit theorem for density dependent population processes,
and we show that the sampling distribution is a linear combination of moments
of Gaussian distributions and hence available in closed-form. The coalescent
process is based on a probabilistic coupling of the ancestral recombination
graph to a simpler genealogical process which exposes the leading dynamics of
the former. We further demonstrate that when we consider the sampling
distribution as an asymptotic expansion in inverse powers of the recombination
parameter, the sampling distributions of the new models agree with the standard
ones up to the first two orders.
Comment: 34 pages, 1 figure
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018), and reconstructing conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely
occur in genetics and genomics such as genotype data, and genotype-phenotype
data. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from the classical multiple testing approach. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file
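The point about adjusting for all other variables when testing a pairwise association can be illustrated with a three-variable partial correlation. This is a generic Python sketch and not the netgwas API (netgwas is an R package); the function names are hypothetical, and the Gaussian-style partial correlation is an assumption standing in for the package's more general methods.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after adjusting for zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

With toy data in which `x` and `y` are each driven by a shared variable `z` plus independent noise, the marginal correlation between `x` and `y` is clearly non-zero while the partial correlation given `z` vanishes, which is the kind of spurious pairwise association a conditional independence network avoids.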
Predictive impact of rare genomic copy number variations in siblings of individuals with autism spectrum disorders.
Identification of genetic biomarkers associated with autism spectrum disorders (ASDs) could improve recurrence prediction for families with a child with ASD. Here, we describe clinical microarray findings for 253 longitudinally phenotyped ASD families from the Baby Siblings Research Consortium (BSRC), encompassing 288 infant siblings. By age 3, 103 siblings (35.8%) were diagnosed with ASD and 54 (18.8%) were developing atypically. Thirteen siblings have copy number variants (CNVs) involving ASD-relevant genes: 6 with ASD, 5 atypically developing, and 2 typically developing. Within these families, an ASD-related CNV in a sibling has a positive predictive value (PPV) for ASD or atypical development of 0.83; the Simons Simplex Collection of ASD families shows similar PPVs. Polygenic risk analyses suggest that common genetic variants may also contribute to ASD. CNV findings would have been pre-symptomatically predictive of ASD or atypical development in 11 (7%) of the 157 BSRC siblings who were eventually diagnosed clinically.