
    Estimating Effects and Making Predictions from Genome-Wide Marker Data

    In genome-wide association studies (GWAS), hundreds of thousands of genetic markers (SNPs) are tested for association with a trait or phenotype. Reported effects tend to be larger in magnitude than the true effects of these markers, the so-called "winner's curse." We argue that the classical definition of unbiasedness is not useful in this context and propose a different definition of unbiasedness, one that the estimator we advocate satisfies. We suggest an integrated approach to estimating SNP effects and predicting trait values, treating SNP effects as random rather than fixed effects. Statistical methods traditionally used to predict trait values in livestock genetics, which predate the availability of SNP data, can be applied to the analysis of GWAS, giving better estimates of SNP effects and better predictions of phenotypic and genetic values in individuals.
    Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/09-STS306
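    To make the contrast between fixed and random SNP effects concrete, here is a minimal Python sketch; it is not the authors' estimator. It assumes centred genotypes and phenotypes, no other fixed effects, and a known variance ratio `lam` = sigma_e^2 / sigma_b^2, and all function names are hypothetical. `marginal_effects` mimics a one-SNP-at-a-time GWAS regression, while `snp_blup` fits all SNP jointly as random effects (equivalent to ridge regression), which shrinks the estimates toward zero.

```python
"""Sketch: SNP effects treated as random (ridge / SNP-BLUP) versus
one-at-a-time fixed-effect estimates. Assumes centred genotypes,
a centred phenotype, no other fixed effects and a known variance ratio."""


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def marginal_effects(X, y):
    """Classical GWAS-style estimate: regress y on each SNP separately."""
    out = []
    for j in range(len(X[0])):
        xj = [row[j] for row in X]
        out.append(sum(a * b for a, b in zip(xj, y)) / sum(a * a for a in xj))
    return out


def snp_blup(X, y, lam):
    """Random SNP effects: solve (X'X + lam*I) b = X'y, lam = sigma_e^2 / sigma_b^2."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)
```

    On the toy data `X = [[1, 0], [-1, 1], [0, -1], [1, 1], [-1, -1]]` and `y = [2.0, -1.0, -0.5, 2.5, -3.0]`, the marginal estimates are `[2.125, 1.25]`, joint least squares (`lam=0`) gives about `[1.933, 0.767]`, and `snp_blup(X, y, 4.0)` returns about `[1.0, 0.5]`.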

    Sensitivity of genomic selection to using different prior distributions

    Genomic selection describes a selection strategy based on genomic estimated breeding values (GEBV) predicted from dense genetic markers such as single nucleotide polymorphism (SNP) data. Different Bayesian models have been suggested to derive the prediction equation, with the main difference centred on the specification of the prior distributions.
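    To illustrate why the prior matters, the toy sketch below (built on assumptions, not on any specific published model) compares two priors for a single SNP effect with known residual variance: a normal prior, whose posterior mean shrinks the least-squares estimate proportionally, and a double-exponential (Laplace) prior, whose posterior mode soft-thresholds it and sets small effects exactly to zero. `xtx` and `xty` stand for x'x and x'y computed from centred data; the function names are hypothetical.

```python
"""Sketch: how the prior on a single SNP effect changes its estimate.
Assumes one SNP, known residual variance, and centred x and y."""
import math


def normal_prior_mean(xtx, xty, sigma_e2, sigma_b2):
    """Posterior mean under b ~ N(0, sigma_b2): proportional shrinkage."""
    return xty / (xtx + sigma_e2 / sigma_b2)


def laplace_prior_mode(xtx, xty, sigma_e2, rate):
    """Posterior mode under a double-exponential prior with the given rate:
    the least-squares estimate is soft-thresholded by rate * sigma_e2 / xtx."""
    b_ls = xty / xtx
    shift = rate * sigma_e2 / xtx
    return math.copysign(max(abs(b_ls) - shift, 0.0), b_ls)
```

    With `xtx=10` and `sigma_e2=1`, a least-squares estimate of 0.5 (`xty=5`) becomes 0.25 under a normal prior with `sigma_b2=0.1` and 0.3 under a Laplace prior with `rate=2`; an estimate of 0.1 (`xty=1`) becomes 0.05 and 0.0 respectively.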

    Simultaneous QTL detection and genomic breeding value estimation using high density SNP chips

    Background: The simulated dataset of the 13th QTL-MAS workshop was analysed to (i) detect QTL and (ii) predict breeding values for animals without phenotypic information. Several parameterisations considering all SNP simultaneously were fitted using Gibbs sampling. Results: Fourteen QTL were detected at the different time points. Correlations between estimated breeding values were high between models, except for the model that assumed all SNP effects came from one distribution. The model that used only the 14 SNP found to be associated with QTL gave correlations close to unity with the full parameterisations. Conclusions: Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection appeared fairly robust, e.g. with respect to the accuracy of estimated breeding values. Still, it is worthwhile to investigate the number of QTL underlying a quantitative trait before choosing the model used for genomic selection.
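    Fitting all SNP simultaneously by Gibbs sampling can be sketched as a single-site sampler. This minimal Python version assumes a normal prior with known residual and SNP-effect variances (Bayesian ridge regression), so it corresponds to the "one distribution" model rather than the richer parameterisations in the study; the function and its parameters are hypothetical. Each SNP effect is drawn from its full conditional given all the others, and the residual vector is kept up to date after every draw.

```python
"""Sketch: single-site Gibbs sampler fitting all SNP effects jointly.
Assumes a normal prior with known variances (Bayesian ridge regression),
centred data and no intercept."""
import random


def gibbs_snp_effects(X, y, sigma_e2, sigma_b2, n_iter=2000, burn_in=500, seed=1):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xtx = [sum(v * v for v in c) for c in cols]
    lam = sigma_e2 / sigma_b2
    b = [0.0] * p
    resid = list(y)  # y - X b, starting from b = 0
    sums = [0.0] * p
    for it in range(n_iter):
        for j in range(p):
            # x_j' (resid + x_j b_j): right-hand side with SNP j added back
            rhs = sum(c * r for c, r in zip(cols[j], resid)) + xtx[j] * b[j]
            denom = xtx[j] + lam
            new = rng.gauss(rhs / denom, (sigma_e2 / denom) ** 0.5)
            # keep the residual consistent with the new value of b_j
            delta = new - b[j]
            resid = [r - c * delta for r, c in zip(resid, cols[j])]
            b[j] = new
        if it >= burn_in:
            for j in range(p):
                sums[j] += b[j]
    kept = n_iter - burn_in
    return [s / kept for s in sums]  # posterior means of the SNP effects
```

    Because the variances are fixed, the posterior means converge to the ridge solution: on toy data where that solution is `[1.0, 0.5]`, `gibbs_snp_effects(X, y, 1.0, 0.25, n_iter=5000, burn_in=1000)` returns values close to it. A practical sampler would also draw the variances and fit a mean.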

    Accuracy of genomic breeding values in multi-breed dairy cattle populations

    Background: Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein and Jersey dairy cattle when the reference population is single-breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient matrix and from the realised accuracies of GEBV. Methods: Best linear unbiased prediction with a multi-breed genomic relationship matrix (GBLUP) and two Bayesian methods that estimate individual SNP effects (BAYESA and BAYES_SSVS) were used to predict GEBV for 400 young Holstein and 77 young Jersey bulls from a reference population of 781 Holstein and 287 Jersey bulls. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of the coefficient matrix were compared to realised accuracies. Results: When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient matrix agreed reasonably well with realised accuracies, calculated from the correlation between GEBV and EBV, in single-breed populations but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher with the multi-breed reference population than with a pure-breed reference population. However, no consistent increase in accuracy across traits was obtained. Conclusion: Predicting genomic breeding values using a genomic relationship matrix is an attractive approach to implementing genomic selection, as expected accuracies of GEBV can be readily derived. However, in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource for fine-mapping QTL.
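    The GBLUP side of this comparison can be sketched in Python as follows. The sketch assumes genotypes coded 0/1/2, a VanRaden-style genomic relationship matrix with allele frequencies estimated from the sample, and a simplified model in which every animal has a phenotype, there are no fixed effects, and `lam` = sigma_e^2 / sigma_g^2 is known. Under those assumptions the coefficient matrix is C = I + lam * G^-1, the prediction error variance is PEV_i = sigma_e^2 * (C^-1)_ii, and the expected accuracy is r_i = sqrt(1 - PEV_i / (G_ii * sigma_g^2)). Using the identity C^-1 = I - lam * (G + lam * I)^-1 avoids inverting G, which is singular when genotypes are centred. The paper's model, with fixed effects and unphenotyped young bulls, will differ, and the function names are hypothetical.

```python
"""Sketch: VanRaden-style genomic relationship matrix and expected GBLUP
accuracies. Assumes 0/1/2 genotypes, every animal phenotyped, no fixed
effects, and a known variance ratio lam = sigma_e^2 / sigma_g^2."""


def genomic_relationship(M):
    """G = Z Z' / (2 * sum p_j (1 - p_j)), with Z = M - 2p (allele freqs from M)."""
    n, m = len(M), len(M[0])
    p = [sum(M[i][j] for i in range(n)) / (2 * n) for j in range(m)]
    Z = [[M[i][j] - 2 * p[j] for j in range(m)] for i in range(n)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def expected_accuracy(G, lam):
    """r_i^2 = 1 - lam * (C^-1)_ii / G_ii, with C^-1 = I - lam * (G + lam I)^-1."""
    n = len(G)
    inv = invert([[G[i][j] + (lam if i == j else 0.0) for j in range(n)]
                  for i in range(n)])
    acc = []
    for i in range(n):
        cii = 1.0 - lam * inv[i][i]  # diagonal element of C^-1
        acc.append(max(0.0, 1.0 - lam * cii / G[i][i]) ** 0.5)
    return acc
```

    For example, `expected_accuracy(genomic_relationship(M), 1.0)` returns one accuracy per animal in [0, 1], and increasing `lam` (a lower heritability) lowers every accuracy.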

    Estimated breeding values and association mapping for persistency and total milk yield using natural cubic smoothing splines

    Background: For dairy producers, a reliable description of lactation curves is a valuable tool for management and selection. From a breeding and production viewpoint, milk yield persistency and total milk yield are important traits. Understanding the genetic drivers of the phenotypic variation in both traits could provide a means of improving them in commercial production. Methods: Natural cubic smoothing splines (NCSS) have been shown to model the features of lactation curves with greater flexibility than traditional parametric methods. NCSS were used to model the sire effect on the lactation curves of cows. Sire solutions for persistency and total milk yield were derived using NCSS, and a whole-genome approach based on a hierarchical model was developed for a large association study using single nucleotide polymorphisms (SNP). Results: Estimated sire breeding values (EBV) for persistency and milk yield were calculated using NCSS. Persistency EBV were correlated with peak yield but not with total milk yield. Several SNP were found to be associated with both traits, and these were used to identify candidate genes for further investigation. Conclusion: NCSS can be used to estimate EBV for lactation persistency and total milk yield, which in turn can be used in whole-genome association studies.
    Klara L. Verbyla and Arunas P. Verbyl

    Whole-genome analysis of multienvironment or multitrait QTL in MAGIC

    Multiparent Advanced Generation Inter-Cross (MAGIC) populations are now being used to identify the genetic basis of quantitative traits more accurately through quantitative trait loci (QTL) analyses and subsequent gene discovery. The expanded genetic diversity present in such populations and the increased number of recombination events mean that QTL can be identified at higher resolution. Most QTL analyses are conducted separately for each trait within a single environment. Separate analysis does not take advantage of the underlying correlation structure found in multienvironment or multitrait data. By using this information in a joint analysis, be it multienvironment or multitrait, it is possible to gain a greater understanding of genotype- or QTL-by-environment interactions or of pleiotropic effects across traits. Furthermore, this can improve accuracy for a range of traits or in a specific target environment and can influence selection decisions. Data derived from MAGIC populations allow the probabilities of all founder alleles to be calculated for each individual within the population. This presents an additional layer of complexity and information that can be used to identify QTL. A whole-genome approach is proposed for multienvironment and multitrait QTL analysis in MAGIC. The whole-genome approach simultaneously incorporates all founder probabilities at each marker for all individuals in the analysis, rather than using a genome scan. A dimension reduction technique is implemented to handle high-dimensional genetic data. For each QTL identified, the size of the effect of each founder allele, the percentage of genetic variance explained, and a score reflecting the strength of the QTL are reported. The approach was shown to perform well in a small simulation study and in two experiments using a wheat MAGIC population.
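    The idea of fitting every founder probability at every marker at once can be made concrete with a small sketch. Assuming each marker-by-founder effect is an independent random effect with variance sigma_a^2 (our assumption; the paper's exact parameterisation and reduction may differ), the variance of the total genetic effect is sigma_a^2 * Z Z', where Z holds all founder-probability columns. That n x n matrix has the same size however many markers and founders there are, which is one way the dimension of the problem can be reduced. `probs` and `founder_kernel` are hypothetical names.

```python
"""Sketch: the n x n covariance implied by founder probabilities when every
marker-by-founder effect is i.i.d. random. Assumes probs[i][k][f] is the
probability that line i carries founder f at marker k."""


def founder_kernel(probs):
    """Return K = Z Z', where Z stacks every marker's founder-probability columns.

    Var(Z a) = sigma_a^2 * K when a ~ N(0, sigma_a^2 I). K is n x n regardless
    of the number of markers m and founders f, so the m * f columns never
    need to be fitted one at a time."""
    n = len(probs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(pi * pj
                    for mi, mj in zip(probs[i], probs[j])  # loop over markers
                    for pi, pj in zip(mi, mj))             # loop over founders
            K[i][j] = K[j][i] = s
    return K
```

    For two lines scored at two markers with four founders, `founder_kernel([[[1, 0, 0, 0], [0.5, 0.5, 0, 0]], [[0, 1, 0, 0], [0.5, 0, 0.5, 0]]])` gives `[[1.5, 0.25], [0.25, 1.5]]`.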

    Aspects of statistical modelling for genomic selection

    © 2010 Dr. Klara Verbyla
    The research reported in this thesis investigated aspects of statistical models used for genomic selection. The importance of, and interest in, genomic selection is driven by the desire to increase the rate of genetic gain for commercially important traits. Genomic selection could increase the rate of genetic gain by increasing the accuracy of selection through the inclusion of DNA markers. Multiple methods and models have been proposed for implementing genomic selection. All of them have to overcome the problem that the number of DNA markers (p) is typically much larger than the number of phenotypic records (n), i.e. the p > n problem. One approach to this problem is Bayesian inference, which allows for an oversaturated model. Two simulation studies and a large data study were undertaken to gain a comprehensive understanding of what makes a robust and accurate Bayesian prediction model. Results from the simulation studies indicated that the match between the assumed and the true QTL distribution affected the accuracy of the direct genomic values (DGV) produced by the different Bayesian models. Some of the models producing accurate DGV were computationally demanding. Subsequently, a novel Bayesian model using stochastic search variable selection (SSVS) for genomic selection was developed (Bayes SSVS). This model was shown to produce accurate DGV and to be computationally efficient. In contrast to the results from the simulation studies, the results from a real dairy cattle data study showed broadly equal prediction accuracy across the various Bayesian models, including Bayes SSVS. The exception was traits with atypical genetic architectures, such as fat percentage in milk, where Bayes SSVS and other model selection approaches performed better than approaches assuming that all markers contributed equally to the total genetic variation.
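    In SSVS, each SNP effect carries a latent indicator that switches between a narrow "spike" and a wide "slab" normal prior, and a Gibbs sampler redraws each indicator from its full conditional given the current effect. Below is a minimal Python sketch of that conditional, assuming the classic two-component normal SSVS prior with mixing proportion `pi`, spike variance `tau2` and slab variance `c2 * tau2`; Bayes SSVS may parameterise it differently, and the function names are hypothetical.

```python
"""Sketch: the indicator update in stochastic search variable selection
(SSVS), assuming a two-component normal prior on each SNP effect."""
import math


def normal_logpdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)


def inclusion_probability(b, pi, tau2, c2):
    """P(delta = 1 | b) for b ~ (1 - delta) N(0, tau2) + delta N(0, c2 * tau2)."""
    log_slab = math.log(pi) + normal_logpdf(b, c2 * tau2)
    log_spike = math.log(1 - pi) + normal_logpdf(b, tau2)
    top = max(log_slab, log_spike)  # subtract the max to avoid underflow
    w_slab = math.exp(log_slab - top)
    w_spike = math.exp(log_spike - top)
    return w_slab / (w_slab + w_spike)
```

    With `pi=0.5`, `tau2=0.01` and `c2=100`, an effect of exactly 0 gets an inclusion probability of 1/11 (about 0.09), because the slab density at zero is one tenth of the spike density, while an effect of 0.5 gets a probability above 0.99. In a sampler, the indicator would then be drawn as a Bernoulli variable with this probability.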
    The thesis also sought to explore the potential of genomic selection for improving novel traits that have traditionally been very difficult to select for. Energy balance (EB) is a minimally recorded trait, as the cost and logistics of measurement mean it can only be recorded on experimental farms. Using EB as a case study, it was demonstrated that genomic selection could provide the opportunity to select for EB and other minimally recorded traits through the accurate prediction of DGV. Additionally, selection for EB could be a valuable tool for finding a balance between production and non-production traits. Another attractive feature of some of the Bayesian models for genomic selection is that they can be used to map QTL. Consequently, establishing significance when using multi-locus models for genome-wide association studies was explored using a permutation testing approach. Three examples demonstrated that the permutation testing approach could correctly identify QTL. Two specialised approaches, permuting within strata, are presented. One approach accounts for a structured pedigree while satisfying the condition of exchangeability. The second enables the identification of secondary moderate QTL in the presence of a major QTL. The effect of the number of permutations needed was also examined, confirming previous results. This method was shown to provide accurate identification of QTL when compared with current approaches.
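    The within-strata permutation idea can be sketched as follows. Phenotypes are shuffled only among individuals in the same stratum (for example, the same family), the maximum test statistic over all SNP is recorded for each permutation, and the (1 - alpha) quantile of those maxima serves as a genome-wide threshold. The statistic used here, the largest absolute SNP-phenotype correlation, is a stand-in chosen for brevity (the thesis applies permutation to multi-locus models), and all names are hypothetical.

```python
"""Sketch: genome-wide significance threshold by permutation, shuffling
phenotypes only within strata. The max |correlation| statistic is an
illustrative stand-in for a multi-locus model statistic."""
import random


def corr(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def max_stat(cols, y):
    """Largest absolute correlation over all SNP columns."""
    return max(abs(corr(c, y)) for c in cols)


def permute_within_strata(y, strata, rng):
    """Shuffle y separately inside each stratum, leaving strata intact."""
    out = list(y)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    for idx in groups.values():
        vals = [y[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out


def permutation_threshold(cols, y, strata, n_perm=1000, alpha=0.05, seed=1):
    """Approximate (1 - alpha) quantile of the permutation null of max_stat."""
    rng = random.Random(seed)
    null = sorted(max_stat(cols, permute_within_strata(y, strata, rng))
                  for _ in range(n_perm))
    k = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null[k]
```

    An observed `max_stat(cols, y)` above `permutation_threshold(cols, y, strata)` would be declared significant at genome-wide level alpha. Taking the maximum over all SNP within each permutation is what controls the genome-wide, rather than per-SNP, error rate.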