A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction
While the linear mixed model (LMM) has shown competitive performance in
correcting spurious associations arising from population stratification, family
structures, and cryptic relatedness, challenges remain to be addressed
regarding the complex structure of genotypic and phenotypic data. For example,
geneticists have discovered that some clusters of phenotypes are more
co-expressed than others. Hence, a joint analysis that can utilize such
relatedness information in a heterogeneous data set is crucial for genetic
modeling.
We propose the sparse graph-structured linear mixed model (sGLMM) that can
incorporate the relatedness information from traits in a dataset with
confounding correction. Our method is capable of uncovering the genetic
associations of a large number of phenotypes together while considering the
relatedness of these phenotypes. Through extensive simulation experiments, we
show that the proposed model outperforms other existing approaches and can
model correlation from both population structure and shared signals. Further,
we validate the effectiveness of sGLMM on real-world genomic datasets from two
different species, one plant and one human. In Arabidopsis thaliana data, sGLMM
performs better than all other baseline models for 63.4% of traits. We also discuss
the potential causal genetic variation of human Alzheimer's disease discovered
by our model and justify some of the most important genetic loci.
Comment: Code available at https://github.com/YeWenting/sGLM
Forward-time simulation of realistic samples for genome-wide association studies
Background: Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented existing forward-time simulation methods from fully exploiting this power.
Results: Using a general-purpose forward-time population genetics simulation environment, we developed a forward-time simulation method that can be used to simulate realistic samples for genome-wide association studies. We examined the properties of this simulation method by comparing simulated samples with real data and demonstrated its wide applicability using four examples, including a simulation of case-control samples with a disease caused by multiple interacting genetic and environmental factors, a simulation of trio families affected by a disease-predisposing allele that had been subjected to either slow or rapid selective sweep, and a simulation of a structured population resulting from recent population admixture.
Conclusions: Our algorithm simulates populations that closely resemble the complex structure of the human genome, while allowing the introduction of signals of natural selection. Because of its flexibility to generate different types of samples with arbitrary disease or quantitative trait models, this simulation method can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for genome-wide association studies.
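To illustrate what "forward-time" means in the abstract above, here is a minimal Python sketch of a Wright-Fisher model that tracks one allele generation by generation under selection and drift. It is not the authors' simulation environment; the function name `wright_fisher`, its parameters, and the simple genic selection model are assumptions chosen for illustration.

```python
import random


def wright_fisher(pop_size, init_freq, selection, generations, seed=0):
    """Track one allele's frequency forward in time (illustrative only).

    pop_size   -- number of diploid individuals (2 * pop_size gene copies)
    init_freq  -- starting frequency of the tracked allele
    selection  -- relative fitness advantage s of the allele (assumed model)
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection shifts the expected frequency before sampling.
        expected = freq * (1 + selection) / (1 + selection * freq)
        # Drift: binomial sampling of gene copies for the next generation.
        count = sum(rng.random() < expected for _ in range(copies))
        freq = count / copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    path = wright_fisher(pop_size=100, init_freq=0.1, selection=0.05,
                         generations=50, seed=1)
    print("final frequency:", path[-1])
```

Realistic samples for association studies would additionally need many linked loci, recombination, demography, and a disease model, none of which this sketch attempts.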
Combining genome-wide association mapping and transcriptional networks to identify novel genes controlling glucosinolates in Arabidopsis thaliana.
Background: Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.
Methodology/principal findings: To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.
Conclusions/significance: Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse
Population Structure and Cryptic Relatedness in Genetic Association Studies
We review the problem of confounding in genetic association studies, which
arises principally because of population structure and cryptic relatedness.
Many treatments of the problem consider only a simple "island" model of
population structure. We take a broader approach, which views population
structure and cryptic relatedness as different aspects of a single confounder:
the unobserved pedigree defining the (often distant) relationships among the
study subjects. Kinship is therefore a central concept, and we review methods
of defining and estimating kinship coefficients, both pedigree-based and
marker-based. In this unified framework we review solutions to the problem of
population structure, including family-based study designs, genomic control,
structured association, regression control, principal components adjustment and
linear mixed models. The last solution makes the most explicit use of the
kinships among the study subjects, and has an established role in the analysis
of animal and plant breeding studies. Recent computational developments mean
that analyses of human genetic association data are beginning to benefit from
its powerful tests for association, which protect against population structure
and cryptic kinship, as well as intermediate levels of confounding by the
pedigree.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at
http://dx.doi.org/10.1214/09-STS307 by the Institute of Mathematical
Statistics (http://www.imstat.org
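Of the corrections listed in the abstract above, genomic control is the simplest to sketch. The Python below is a minimal illustration, assuming 1-degree-of-freedom chi-square association statistics as input; the function name `genomic_control` and the convention of never letting the inflation factor fall below 1 are assumptions, not details taken from the review.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, rescaled statistics).

    The inflation factor is the observed median statistic divided by the
    median expected under the null. Statistics are divided by it, but only
    when it exceeds 1 (assumed convention: never inflate the tests).
    """
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]


if __name__ == "__main__":
    lam, adjusted = genomic_control([0.2, 0.9098728, 3.0])
    print(lam, adjusted)
```

A single scaling factor treats confounding as uniform across markers, which is one reason the review goes on to kinship-based approaches such as linear mixed models.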
The Population Genetic Signature of Polygenic Local Adaptation
Adaptation in response to selection on polygenic phenotypes may occur via
subtle allele frequency shifts at many loci. Current population genomic
techniques are not well posed to identify such signals. In the past decade,
detailed knowledge about the specific loci underlying polygenic traits has
begun to emerge from genome-wide association studies (GWAS). Here we combine
this knowledge from GWAS with robust population genetic modeling to identify
traits that may have been influenced by local adaptation. We exploit the fact
that GWAS provide an estimate of the additive effect size of many loci to
estimate the mean additive genetic value for a given phenotype across many
populations as simple weighted sums of allele frequencies. We first describe a
general model of neutral genetic value drift for an arbitrary number of
populations with an arbitrary relatedness structure. Based on this model we
develop methods for detecting unusually strong correlations between genetic
values and specific environmental variables, as well as a generalization of
$Q_{ST}/F_{ST}$ comparisons to test for over-dispersion of genetic values among
populations. Finally we lay out a framework to identify the individual
populations or groups of populations that contribute to the signal of
overdispersion. These tests have considerably greater power than their single
locus equivalents due to the fact that they look for positive covariance
between like effect alleles, and also significantly outperform methods that do
not account for population structure. We apply our tests to the Human Genome
Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation,
type 2 diabetes, body mass index, and two inflammatory bowel disease datasets.
This analysis uncovers a number of putative signals of local adaptation, and we
discuss the biological interpretation and caveats of these results.
Comment: 42 pages including 8 figures and 3 tables; supplementary figures and
tables not included on this upload, but are mostly unchanged from v
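The abstract above describes estimating a population's mean additive genetic value as a weighted sum of allele frequencies, with GWAS effect sizes as the weights. A minimal Python sketch of that arithmetic follows. The function name `mean_genetic_values`, the input layout, and the factor of 2 for diploid individuals are assumptions made for illustration; the numbers in the usage example are invented.

```python
def mean_genetic_values(effect_sizes, allele_freqs):
    """Weighted sum of allele frequencies per population.

    effect_sizes -- additive effect of the effect allele at each locus
    allele_freqs -- dict mapping population name to a list of effect-allele
                    frequencies, in the same locus order as effect_sizes
    """
    values = {}
    for pop, freqs in allele_freqs.items():
        if len(freqs) != len(effect_sizes):
            raise ValueError(f"locus count mismatch for population {pop!r}")
        # Factor 2: each diploid individual carries two copies (assumption).
        values[pop] = 2.0 * sum(a * p for a, p in zip(effect_sizes, freqs))
    return values


if __name__ == "__main__":
    effects = [0.5, -0.2]                      # made-up effect sizes
    freqs = {"popA": [0.1, 0.5], "popB": [0.4, 0.5]}
    print(mean_genetic_values(effects, freqs))
```

The tests described in the abstract then ask whether differences among such values exceed what neutral drift would produce given the populations' relatedness, which this sketch does not model.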
Discriminant analysis of principal components and pedigree assessment of genetic diversity and population structure in a tetraploid potato panel using SNPs
The reported narrow genetic base of cultivated potato (Solanum tuberosum) can be expanded by introgression from many related species with large genetic diversity. Analyzing the genetic structure of a potato population is important for broadening the genetic base of breeding programs through the identification of different genetic pools. A panel composed of 231 diverse genotypes was characterized using single nucleotide polymorphism (SNP) markers of the Illumina Infinium Potato SNP Array V2 to identify population structure and assess genetic diversity using discriminant analysis of principal components (DAPC) and pedigree analysis. Results revealed the presence of five clusters within the populations, differentiated principally by ploidy, taxonomy, origin and breeding program. The information obtained in this work could be readily used as a guide for parental introduction in new breeding programs that want to maximize variability by combining contrasting sources of variability such as those presented here.
Affiliation: Deperi, Sofía Irene. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata; Argentina
Affiliation: Tagliotti, Martin Enrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata; Argentina
Affiliation: Bedogni, María Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro Regional Buenos Aires Sur. Estación Experimental Agropecuaria Balcarce; Argentina
Affiliation: Manrique Carpintero, Norma C.. Michigan State University; United States
Affiliation: Coombs, Joseph. Michigan State University; United States
Affiliation: Zhang, Ruofang. Inner Mongolia University; China
Affiliation: Douches, David. Michigan State University; United States
Affiliation: Huarte, Marcelo Atilio. Instituto Nacional de Tecnología Agropecuaria. Centro Regional Buenos Aires Sur. Estación Experimental Agropecuaria Balcarce; Argentin
Patterns of genetic diversity and linkage disequilibrium in a highly structured Hordeum vulgare association-mapping population for the Mediterranean basin
Population structure and genome-wide linkage disequilibrium (LD) were investigated in 192 Hordeum vulgare accessions providing a comprehensive coverage of past and present barley breeding in the Mediterranean basin, using 50 nuclear microsatellite and 1,130 DArT® markers. Both clustering and principal coordinate analyses clearly sub-divided the sample into five distinct groups centred on key ancestors and regions of origin of the germplasm. For given genetic distances, large variation in LD values was observed, ranging from closely linked markers completely at equilibrium to marker pairs at 50 cM separation still showing significant LD. Mean LD values across the whole population sample decayed below r² of 0.15 after 3.2 cM. By assaying 1,130 genome-wide DArT® markers, we demonstrated that, after accounting for population substructure, current genome coverage of 1 marker per 1.5 cM except for chromosome 4H with 1 marker per 3.62 cM is sufficient for whole genome association scans. We show, by identifying associations with powdery mildew that map in genomic regions known to have resistance loci, that associations can be detected in strongly stratified samples provided population structure is effectively controlled in the analysis. The population we describe is, therefore, shown to be a valuable resource, which can be used in basic and applied research in barle
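The abstract above reports LD as r² between marker pairs. As a rough illustration of that statistic, the Python sketch below computes r² for two biallelic markers scored 0/1 across accessions. The function name `ld_r_squared` and the treatment of each accession as a single haplotype (plausible for inbred lines, but an assumption here) are not taken from the study.

```python
def ld_r_squared(marker_a, marker_b):
    """r^2 between two biallelic markers coded 0/1 per accession.

    D = P(AB) - P(A) * P(B);  r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB))
    """
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("markers must be non-empty and of equal length")
    p_a = sum(marker_a) / n
    p_b = sum(marker_b) / n
    # Frequency of accessions carrying allele 1 at both markers.
    p_ab = sum(a * b for a, b in zip(marker_a, marker_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    d = p_ab - p_a * p_b
    return d * d / denom


if __name__ == "__main__":
    print(ld_r_squared([1, 1, 0, 0], [1, 1, 0, 0]))  # identical markers
    print(ld_r_squared([1, 1, 0, 0], [1, 0, 1, 0]))  # independent markers
```

This plain estimate does not correct for population structure, which the abstract identifies as essential in strongly stratified samples.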