
    Mixed linear model approach adapted for genome-wide association studies.

    Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods both independently and in combination to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate their usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
Although genome-wide association studies (GWAS) have the potential to pinpoint genetic polymorphisms underlying human diseases and agriculturally important traits, false discoveries are a major concern 1 and can be partially attributed to spurious associations caused by population structure and unequal relatedness among individuals in a given cohort. Population stratification was initially addressed using general linear model (GLM)-based methods such as structured association 2, genomic control 3 and family-based tests of association 4. More recently, MLM approaches have been shown to improve on these methods by simultaneously accounting for population structure and unequal relatedness among individuals 5.
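The compression step replaces individuals with groups, so the random effect is fit at the group level. A minimal sketch of one way to build the group-level kinship, assuming the groups have already been assigned (for example by clustering on kinship) and that group kinship is taken as the average kinship between members; the function name `group_kinship` is hypothetical and not part of TASSEL.

```python
import numpy as np

def group_kinship(K, groups):
    """Average individual kinship within and between groups (hypothetical helper).

    K      : (n, n) symmetric kinship matrix among individuals
    groups : length-n sequence of integer group labels 0..g-1; every label
             must be used at least once, otherwise a division by zero occurs
    Returns a (g, g) matrix whose (a, b) entry is the mean of K[i, j] over
    individuals i in group a and j in group b (self-pairs included).
    """
    K = np.asarray(K, dtype=float)
    groups = np.asarray(groups)
    n = K.shape[0]
    g = int(groups.max()) + 1
    # Incidence matrix: Z[i, a] = 1 if individual i belongs to group a.
    Z = np.zeros((n, g))
    Z[np.arange(n), groups] = 1.0
    sizes = Z.sum(axis=0)
    # Sum kinship over all member pairs of each group pair, then divide by the pair count.
    return (Z.T @ K @ Z) / np.outer(sizes, sizes)

K = np.array([[1.0, 0.5, 0.1, 0.0],
              [0.5, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.5],
              [0.0, 0.1, 0.5, 1.0]])
print(group_kinship(K, [0, 0, 1, 1]))
```

Because solving the MLM scales roughly with the cube of the random-effect dimension, shrinking it from n individuals to g groups cuts that part of the cost by about a factor of (n/g)^3.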
In the MLM-based methods, population structure 2,6 is fit as a fixed effect, whereas kinship among individuals is incorporated as the variance-covariance structure of the random effect for the individuals. Regardless of the statistical method applied, GWAS require large sample sizes to achieve sufficient statistical power 7, especially to detect the small-effect polymorphisms that underlie most complex traits 8. For the MLM approach, such large sample sizes create a heavy computational burden, because the computing time for solving an MLM increases with the cube of the number of individuals fit as a random effect. The earliest effort to reduce the size of the random effect in an MLM can be traced back to the sire model used in animal breeding 9-12, which replaces an individual's genetic effect with that of its sire. The sire model therefore requires pedigrees, which are not always available and, in plant studies in particular, are often unavailable. Even when pedigrees are available, a marker-based kinship is preferred because of its higher accuracy…
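The P3D idea is that the variance components are estimated once and then held fixed while every marker is tested, so the expensive O(n^3) factorization of the phenotypic covariance happens a single time. A minimal sketch, assuming the genetic and residual variances are already known (for example from REML on a model without markers, not shown) and that V = σ_g²K + σ_e²I; formulations that write 2K absorb the factor 2 into σ_g² here. The function `p3d_scan` is hypothetical, not the TASSEL API, and returns a Wald statistic per marker.

```python
import numpy as np

def p3d_scan(y, X, M, K, sigma_g2, sigma_e2):
    """Per-marker Wald statistics under an MLM with fixed variance components.

    y : (n,) phenotypes
    X : (n, c) fixed covariates (intercept, population structure Q, ...)
    M : (n, p) marker genotypes, one column per marker
    K : (n, n) kinship matrix
    sigma_g2, sigma_e2 : genetic and residual variances, estimated once beforehand
    """
    n = len(y)
    # Phenotypic covariance implied by the random genetic effect plus residual noise.
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    # Factor V = L L^T once; this cost is paid a single time, not once per marker.
    L = np.linalg.cholesky(V)
    # Whitening with L^{-1} turns the GLS problem into ordinary least squares.
    y_w = np.linalg.solve(L, y)
    X_w = np.linalg.solve(L, X)
    M_w = np.linalg.solve(L, M)
    stats = np.empty(M.shape[1])
    for j in range(M.shape[1]):
        D = np.column_stack([X_w, M_w[:, j]])
        DtD = D.T @ D  # equals X_full^T V^{-1} X_full
        b = np.linalg.solve(DtD, D.T @ y_w)
        # With V known, Var(b) = (X^T V^{-1} X)^{-1}; the last entry is the marker effect.
        se = np.sqrt(np.linalg.inv(DtD)[-1, -1])
        stats[j] = b[-1] / se
    return stats

m1 = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
m2 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
y = 1.0 + 0.5 * m1
stats = p3d_scan(y, np.ones((6, 1)), np.column_stack([m1, m2]), np.eye(6), 0.0, 1.0)
print(stats)
```

Re-estimating σ_g² and σ_e² for every marker would instead repeat an iterative fit for each test, which is the cost P3D avoids.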

    Heterosis Is Prevalent for Multiple Traits in Diverse Maize Germplasm

    BACKGROUND: Heterosis describes the superior phenotypes observed in hybrids relative to their inbred parents. Maize is a model system for studying heterosis due to the high levels of yield heterosis and commercial use of hybrids. METHODS: The inbred lines from an association mapping panel were crossed to a common inbred line, B73, to generate nearly 300 hybrid genotypes. Heterosis was evaluated for seventeen phenotypic traits in multiple environments. The majority of hybrids exhibit better-parent heterosis for most of the traits measured. Correlations between the levels of heterosis for different traits were generally weak, suggesting that the genetic basis of heterosis is trait-dependent. CONCLUSIONS: The ability to predict heterosis levels using inbred phenotype or genetic distance between the parents varied for the different traits. For some traits it is possible to explain a significant proportion of the heterosis variation using linear modeling while other traits are more difficult to predict
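Better-parent heterosis compares a hybrid with the better of its two parents, while mid-parent heterosis compares it with the parental mean. A small sketch of the standard percentage definitions, assuming larger trait values are favorable (for traits such as flowering time the "better" parent would be the smaller value); the function name is hypothetical and this is not necessarily the exact computation used in the study.

```python
def heterosis(f1, p1, p2):
    """Return (mid-parent, better-parent) heterosis in percent.

    Assumes higher trait values are better, so the better parent is max(p1, p2).
    """
    mid = (p1 + p2) / 2.0
    better = max(p1, p2)
    mph = 100.0 * (f1 - mid) / mid
    bph = 100.0 * (f1 - better) / better
    return mph, bph

mph, bph = heterosis(10.0, 6.0, 8.0)
print(mph, bph)
```

A hybrid shows better-parent heterosis when bph > 0, which implies mid-parent heterosis as well.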

    Joint QTL Linkage Mapping for Multiple-Cross Mating Design Sharing One Common Parent

    BACKGROUND: Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present the detailed algorithm of a QTL linkage mapping method suitable for genetic populations derived from NAM designs. This method is called joint inclusive composite interval mapping (JICIM). Simulations based on the QTL detected in a maize NAM population and an Arabidopsis NAM population were used to evaluate the efficiency of the NAM design and the JICIM method. PRINCIPAL FINDINGS: Fifty-two QTL were identified in the maize population, explaining 89% of the phenotypic variance of days to silking, and nine QTL were identified in the Arabidopsis population, explaining 83% of the phenotypic variance of flowering time. Simulations indicated that the detection power of these identified QTL was consistently high, especially for large-effect QTL. For rare QTL having significant effects in only one family, the power of correct detection within the 5 cM support interval was around 80% for 1-day effect QTL in the maize population, and for 3-day effect QTL in the Arabidopsis population. For smaller-effect QTL, the power diminished, e.g., it was around 50% for maize QTL with an effect of 0.5 day. When QTL were linked at a distance of 5 cM, the likelihood of mapping them as two distinct QTL was about 70% in the maize population. When the linkage distance was 1 cM, they were more likely mapped as one single QTL at an intermediate position. CONCLUSIONS: Because it takes advantage of the large genetic variation among parental lines and the large population size, NAM is a powerful multiple-cross design for complex trait dissection. JICIM is an efficient method specialized for the joint QTL linkage mapping of genetic populations derived from the NAM design
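One way to see why QTL 1 cM apart tend to be mapped as a single QTL is to convert map distance into recombination fraction: the fewer recombinants separate two loci, the less information there is to tell them apart. A small sketch, assuming Haldane's map function (no crossover interference); this is an illustration of linkage, not part of JICIM, and the function names are hypothetical.

```python
import math

def haldane_r(distance_cm):
    """Recombination fraction for a map distance in cM under Haldane's map function."""
    d = distance_cm / 100.0  # convert centimorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def expected_recombinants(distance_cm, n_gametes):
    """Expected number of recombinant gametes between two loci among n_gametes."""
    return n_gametes * haldane_r(distance_cm)

for cm in (1.0, 5.0):
    print(cm, round(haldane_r(cm), 5), round(expected_recombinants(cm, 200), 2))
```

For 200 gametes this gives roughly 9.5 recombinants at 5 cM but only about 2 at 1 cM. Recombinant inbred families accumulate additional recombination during selfing, so observed counts in such populations would be somewhat larger.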

    Evolution of Disease Response Genes in Loblolly Pine: Insights from Candidate Genes

    BACKGROUND: Host-pathogen interactions that may lead to a competitive co-evolution of virulence and resistance mechanisms present an attractive system to study molecular evolution because strong, recent (or even current) selective pressure is expected at many genomic loci. However, it is unclear whether these selective forces would act to preserve existing diversity, promote novel diversity, or reduce linked neutral diversity during rapid fixation of advantageous alleles. In plants, the lack of adaptive immunity places a larger burden on genetic diversity to ensure survival of plant populations. This burden is even greater if the generation time of the plant is much longer than the generation time of the pathogen. METHODOLOGY/PRINCIPAL FINDINGS: Here, we present nucleotide polymorphism and substitution data for 41 candidate genes from the long-lived forest tree loblolly pine, selected primarily for their prospective influences on host-pathogen interactions. This dataset is analyzed together with 15 drought-tolerance and 13 wood-quality genes from previous studies. A wide range of neutrality tests were performed and tested against expectations from realistic demographic models. CONCLUSIONS/SIGNIFICANCE: Collectively, our analyses found that axr (auxin response factor), caf1 (chromatin assembly factor) and gatabp1 (gata binding protein 1) candidate genes carry patterns consistent with directional selection and erd3 (early response to drought 3) displays patterns suggestive of a selective sweep, both of which are consistent with the arms-race model of disease response evolution. Furthermore, we have identified patterns consistent with diversifying selection at erf1-like (ethylene responsive factor 1), ccoaoemt (caffeoyl-CoA-O-methyltransferase), cyp450-like (cytochrome p450-like) and pr4.3 (pathogen response 4.3), expected under the trench-warfare evolution model. 
Finally, a drought-tolerance candidate related to the plant cell wall, lp5, displayed patterns consistent with balancing selection. In conclusion, both arms-race and trench-warfare models seem compatible with patterns of polymorphism found in different disease-response candidate genes, indicating a mixed strategy of disease tolerance evolution for loblolly pine, a major tree crop in the southeastern United States
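Tajima's D is one widely used neutrality test of the kind mentioned above: it contrasts two estimates of diversity, the mean number of pairwise differences (π) and the number of segregating sites (S), which agree in expectation under neutrality and constant population size. A sketch of the standard formula, assuming per-locus counts are already available; it is not claimed to be the exact test set or implementation used in the study, and demographic history can shift D just as selection can.

```python
import math

def tajimas_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences pi."""
    if n < 4:
        raise ValueError("need at least 4 sequences; the variance terms vanish for n < 4")
    if S == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, scaled by its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

print(tajimas_d(10, 10, 5.0), tajimas_d(10, 10, 1.0))
```

Positive D (an excess of intermediate-frequency variants) is the pattern often associated with balancing selection, while negative D (an excess of rare variants) fits a recent sweep or population expansion.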

    DNA Sequence Variation and Selection of Tag Single-Nucleotide Polymorphisms at Candidate Genes for Drought-Stress Response in Pinus taeda L.

    Genetic association studies are rapidly becoming the experimental approach of choice to dissect complex traits, including tolerance to drought stress, which is the most common cause of mortality and yield losses in forest trees. Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium and the selection of suitable polymorphisms for genotyping. Moreover, standard neutrality tests applied to DNA sequence variation data can be used to select candidate genes or amino acid sites that are putatively under selection for association mapping. In this article, we study the pattern of polymorphism of 18 candidate genes for drought-stress response in Pinus taeda L., an important tree crop. Data analyses based on a set of 21 putatively neutral nuclear microsatellites did not show population genetic structure or genomewide departures from neutrality. Candidate genes had moderate average nucleotide diversity at silent sites (π(sil) = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r(2) of 0.30, decaying rapidly from ∼0.50 to ∼0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs can reduce genotyping effort by ∼30–40% while still sampling most common allelic variants in our ongoing association studies for drought tolerance in pine
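The two summary statistics quoted above, nucleotide diversity π and pairwise LD r², can be sketched as follows. This assumes aligned, gap-free sequences of equal length and phased haplotypes with biallelic sites coded 0/1; unlike π(sil) in the abstract, it does not separate silent from replacement sites. Function names are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by sequence length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def r_squared(hap_a, hap_b):
    """LD r^2 between two biallelic sites from phased haplotypes coded 0/1.

    Undefined (division by zero) if either site is monomorphic.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]), r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

Tagging approaches exploit high r² between SNPs: when two sites are in near-perfect LD, genotyping one largely captures the other, which is where the reduction in genotyping effort comes from.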

    Association Genetics in Pinus taeda L. I. Wood Property Traits

    Genetic association is a powerful method for dissecting complex adaptive traits due to (i) fine-scale mapping resulting from historical recombination, (ii) wide coverage of phenotypic and genotypic variation within a single experiment, and (iii) the simultaneous discovery of loci and alleles. In this article, associations between 58 single nucleotide polymorphisms (SNPs) from 20 wood- and drought-related candidate genes and an array of wood property traits with evolutionary and commercial importance, namely, earlywood and latewood specific gravity, percentage of latewood, earlywood microfibril angle, and wood chemistry (lignin and cellulose content), were tested using mixed linear models (MLMs) that account for relatedness among individuals by using a pairwise kinship matrix. Population structure, a common systematic bias in association studies, was assessed using 22 nuclear microsatellites. Different phenotype:genotype associations were found, some of them confirming previous evidence from collocation of QTL and genes in linkage maps (for example, 4cl and percentage of latewood) and two that involve nonsynonymous polymorphisms (cad SNP M28 with earlywood specific gravity and 4cl SNP M7 with percentage of latewood). The strongest genetic association found in this study was between allelic variation in α-tubulin, a gene involved in the formation of cortical microtubules, and earlywood microfibril angle. Intragenic LD decays rapidly in conifers; thus SNPs showing genetic association are likely to be located in close proximity to the causative polymorphisms. This first multigene association genetic study in forest trees has shown the feasibility of candidate gene strategies for dissecting complex adaptive traits, provided that genes belonging to key pathways and appropriate statistical tools are used. This approach is of particular utility in species such as conifers, where genomewide strategies are limited by their large genomes
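For reference, mixed linear models of this kind are commonly written in the general "Q + K" form below, with the tested SNP, population structure and other covariates as fixed effects and a polygenic background effect whose covariance is given by the kinship matrix. This is the widely used textbook formulation, stated here as an assumption; the study may have used a reduced version (for example without the Q term if no structure was detected). Here S codes the SNP genotypes with effect α, Q holds population-structure covariates with effects v, X holds other fixed effects β, u is the random polygenic effect and e the residual.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},\\
\operatorname{Var}(\mathbf{u}) &= 2\mathbf{K}\,\sigma_g^{2}, \qquad \operatorname{Var}(\mathbf{e}) = \mathbf{I}\,\sigma_e^{2},\\
\operatorname{Var}(\mathbf{y}) &= \mathbf{Z}\,\bigl(2\mathbf{K}\,\sigma_g^{2}\bigr)\,\mathbf{Z}^{\top} + \mathbf{I}\,\sigma_e^{2}.
\end{aligned}
```

The test of association is a test of α = 0, with the kinship term absorbing phenotypic similarity due to relatedness that would otherwise inflate false positives.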

    Association Mapping: Critical Considerations Shift from Genotyping to Experimental Design

    The goal of many plant scientists' research is to explain natural phenotypic variation in terms of simple changes in DNA sequence. Traditionally, linkage mapping has been the most commonly employed method to reach this goal: experimental crosses are made to generate a family with known relatedness, and attempts are made to identify cosegregation of genetic markers and phenotypes within this family. In vertebrate systems, association mapping (also known as linkage disequilibrium mapping) is increasingly being adopted as the mapping method of choice. Association mapping involves searching for genotype-phenotype correlations in unrelated individuals and often is more rapid and cost-effective than traditional linkage mapping. We emphasize here that linkage and association mapping are complementary approaches and are more similar than is often assumed. Unlike in vertebrates, where controlled crosses can be expensive or impossible (e.g., in humans), the plant scientific community can exploit the advantages of both controlled crosses and association mapping to increase statistical power and mapping resolution. While the time and money required for the collection of genotype data were critical considerations in the past, the increasing availability of inexpensive DNA sequencing and genotyping methods should prompt researchers to shift their attention to experimental design. This review provides thoughts on finding the optimal experimental mix of association mapping using unrelated individuals and controlled crosses to identify the genes underlying phenotypic variation

    SNP discovery with EST and NextGen sequencing in switchgrass (Panicum virgatum L.).

    Although yield trials for switchgrass (Panicum virgatum L.), a potentially high value biofuel feedstock crop, are currently underway throughout North America, the genetic tools for crop improvement in this species are still in the early stages of development. Identifying high-density molecular markers, such as single nucleotide polymorphisms (SNPs), that are amenable to high-throughput genotyping is the first step in a quantitative genetics study of this model biofuel crop species. We generated and sequenced expressed sequence tag (EST) libraries from thirteen diverse switchgrass cultivars representing both upland and lowland ecotypes, as well as tetraploid and octoploid genomes. We followed this with reduced genomic library preparation and massively parallel sequencing of the same samples using the Illumina Genome Analyzer technology platform. EST libraries were used to generate unigene clusters and establish a gene-space reference sequence, thus providing a framework for assembly of the short sequence reads. SNPs were identified utilizing these scaffolds. We used a custom software program for alignment and SNP detection and identified over 149,000 SNPs across the 13 short-read sequencing libraries (SRSLs). Approximately 25,000 additional SNPs were identified from the entire EST collection available for the species. This sequencing effort generated data that are suitable for marker development and for estimation of population genetic parameters, such as nucleotide diversity and linkage disequilibrium. Based on these data, we assessed the feasibility of genome-wide association mapping and genomic selection applications in switchgrass. Overall, the SNP markers discovered in this study will help facilitate quantitative genetics experiments and greatly enhance breeding efforts that target improvement of key biofuel traits and development of new switchgrass cultivars
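The core of SNP detection against a reference scaffold is deciding, at each aligned position, whether the reads support more than one allele. A toy sketch of that decision, assuming a pileup string of base calls per position and illustrative depth and allele-count thresholds; it is not the custom program used in the study. It also ignores the polyploid complication that differences between homeologous copies in tetraploid and octoploid genomes can look like SNPs.

```python
from collections import Counter

def call_snp(bases, min_depth=8, min_allele_count=2):
    """Return the sorted alleles if a pileup column looks polymorphic, else None.

    bases : base calls (e.g. "AAGAG") from reads aligned to one reference position
    The default thresholds are illustrative assumptions, not values from the study.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    if sum(counts.values()) < min_depth:
        return None  # too few reads to judge this position
    # Keep alleles seen often enough to be unlikely sequencing errors.
    alleles = sorted(b for b, c in counts.items() if c >= min_allele_count)
    return alleles if len(alleles) >= 2 else None

print(call_snp("AAAAGGGG"), call_snp("AAAAAAAG"), call_snp("AAGG"))
```

Requiring each allele to appear in several reads trades some sensitivity for protection against single-read sequencing errors.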