1,483 research outputs found
Recommended from our members
Functional variants of DOG1 control seed chilling responses and variation in seasonal life-history strategies in Arabidopsis thaliana.
The seasonal timing of seed germination determines a plant's realized environmental niche, and is important for adaptation to climate. The timing of seasonal germination depends on patterns of seed dormancy release or induction by cold and interacts with flowering-time variation to construct different seasonal life histories. To characterize the genetic basis and climatic associations of natural variation in seed chilling responses and associated life-history syndromes, we selected 559 fully sequenced accessions of the model annual species Arabidopsis thaliana from across a wide climate range and scored each for seed germination across a range of 13 cold stratification treatments, as well as the timing of flowering and senescence. Germination strategies varied continuously along 2 major axes: 1) Overall germination fraction and 2) induction vs. release of dormancy by cold. Natural variation in seed responses to chilling was correlated with flowering time and senescence to create a range of seasonal life-history syndromes. Genome-wide association identified several loci associated with natural variation in seed chilling responses, including a known functional polymorphism in the self-binding domain of the candidate gene DOG1. A phylogeny of DOG1 haplotypes revealed ancient divergence of these functional variants associated with periods of Pleistocene climate change, and Gradient Forest analysis showed that allele turnover of candidate SNPs was significantly associated with climate gradients. These results provide evidence that A. thaliana's germination niche and correlated life-history syndromes are shaped by past climate cycles, as well as local adaptation to contemporary climate
Strong signature of natural selection within an FHIT intron implicated in prostate cancer risk
Previously, a candidate gene linkage approach on brother pairs affected with prostate cancer identified a locus of prostate cancer susceptibility at D3S1234 within the fragile histidine triad gene (FHIT), a tumor suppressor that induces apoptosis. Subsequent association tests on 16 SNPs spanning approximately 381 kb surrounding D3S1234 in Americans of European descent revealed significant evidence of association for a single SNP within intron 5 of FHIT. In the current study, resequencing and genotyping within a 28.5 kb region surrounding this SNP further delineated the association with prostate cancer risk to a 15 kb region. Multiple SNPs in sequences under evolutionary constraint within intron 5 of FHIT defined several related haplotypes with an increased risk of prostate cancer in European-Americans. Strong associations were detected for a risk haplotype defined by SNPs 138543, 142413, and 152494 in all cases (Pearson's χ2 = 12.34, df 1, P = 0.00045) and for the homozygous risk haplotype defined by SNPs 144716, 142413, and 148444 in cases that shared 2 alleles identical by descent with their affected brothers (Pearson's χ2 = 11.50, df 1, P = 0.00070). In addition to highly conserved sequences encompassing SNPs 148444 and 152413, population studies revealed strong signatures of natural selection for a 1 kb window covering the SNP 144716 in two human populations, the European American (π = 0.0072, Tajima's D= 3.31, 14 SNPs) and the Japanese (π = 0.0049, Fay & Wu's H = 8.05, 14 SNPs), as well as in chimpanzees (Fay & Wu's H = 8.62, 12 SNPs). These results strongly support the involvement of the FHIT intronic region in an increased risk of prostate cancer. © 2008 Ding et al
A nonparametric HMM for genetic imputation and coalescent inference
Genetic sequence data are well described by hidden Markov models (HMMs) in
which latent states correspond to clusters of similar mutation patterns. Theory
from statistical genetics suggests that these HMMs are nonhomogeneous (their
transition probabilities vary along the chromosome) and have large support for
self transitions. We develop a new nonparametric model of genetic sequence
data, based on the hierarchical Dirichlet process, which supports these self
transitions and nonhomogeneity. Our model provides a parameterization of the
genetic process that is more parsimonious than other more general nonparametric
models which have previously been applied to population genetics. We provide
truncation-free MCMC inference for our model using a new auxiliary sampling
scheme for Bayesian nonparametric HMMs. In a series of experiments on male X
chromosome data from the Thousand Genomes Project and also on data simulated
from a population bottleneck we show the benefits of our model over the popular
finite model fastPHASE, which can itself be seen as a parametric truncation of
our model. We find that the number of HMM states found by our model is
correlated with the time to the most recent common ancestor in population
bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics
applied to large and complex genetic data
Second-generation PLINK: rising to the challenge of larger and richer datasets
PLINK 1 is a widely used open-source C/C++ toolset for genome-wide
association studies (GWAS) and research in population genetics. However, the
steady accumulation of data from imputation and whole-genome sequencing studies
has exposed a strong need for even faster and more scalable implementations of
key functions. In addition, GWAS and population-genetic data now frequently
contain probabilistic calls, phase information, and/or multiallelic variants,
none of which can be represented by PLINK 1's primary data format.
To address these issues, we are developing a second-generation codebase for
PLINK. The first major release from this codebase, PLINK 1.9, introduces
extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space
Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic
improvements. In combination, these changes accelerate most operations by 1-4
orders of magnitude, and allow the program to handle datasets too large to fit
in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data
format capable of efficiently representing probabilities, phase, and
multiallelic variants, and (b) extensions of many functions to account for the
new types of information.
The second-generation versions of PLINK will offer dramatic improvements in
performance and compatibility. For the first time, users without access to
high-end computing resources can perform several essential analyses of the
feature-rich and very large genetic datasets coming into use.Comment: 2 figures, 1 additional fil
Assessing the Performance of the Haplotype Block Model of Linkage Disequilibrium
Several recent studies have suggested that linkage disequilibrium (LD) in the human genome has a fundamentally “blocklike” structure. However, thus far there has been little formal assessment of how well the haplotype block model captures the underlying structure of LD. Here we propose quantitative criteria for assessing how blocklike LD is and apply these criteria to both real and simulated data. Analyses of several large data sets indicate that real data show a partial fit to the haplotype block model; some regions conform quite well, whereas others do not. Some improvement could be obtained by genotyping higher marker densities but not by increasing the number of samples. Nonetheless, although the real data are only moderately blocklike, our simulations indicate that, under a model of uniform recombination, the structure of LD would actually fit the block model much less well. Simulations of a model in which much of the recombination occurs in narrow hotspots provide a much better fit to the observed patterns of LD, suggesting that there is extensive fine-scale variation in recombination rates across the human genome
- …