Estimating Effects and Making Predictions from Genome-Wide Marker Data
In genome-wide association studies (GWAS), hundreds of thousands of genetic
markers (SNPs) are tested for association with a trait or phenotype. Reported
effects tend to be larger in magnitude than the true effects of these markers,
the so-called "winner's curse." We argue that the classical definition of
unbiasedness is not useful in this context and propose to use a different
definition of unbiasedness that is a property of the estimator we advocate. We
suggest an integrated approach to the estimation of the SNP effects and to the
prediction of trait values, treating SNP effects as random instead of fixed
effects. Statistical methods traditionally used in the prediction of trait
values in the genetics of livestock, which predate the availability of SNP
data, can be applied to analysis of GWAS, giving better estimates of the SNP
effects and predictions of phenotypic and genetic values in individuals.
Comment: Published at http://dx.doi.org/10.1214/09-STS306 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
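The shrinkage that results from treating a SNP effect as random rather than fixed can be illustrated with a minimal sketch. It assumes a single marker whose reported estimate has a known sampling variance and whose true effect is drawn from a zero-mean normal distribution with known variance. The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def shrink_effect(beta_hat, sampling_var, effect_var):
    """Shrink one marker estimate toward zero (BLUP-style posterior mean).

    Assumes beta_hat ~ N(beta, sampling_var) and beta ~ N(0, effect_var).
    Under those assumptions the posterior mean of beta is beta_hat
    multiplied by a weight in [0, 1).
    """
    if sampling_var <= 0 or effect_var < 0:
        raise ValueError("sampling_var must be > 0 and effect_var >= 0")
    weight = effect_var / (effect_var + sampling_var)
    return weight * beta_hat


# Hypothetical numbers: a large reported effect that was estimated noisily.
reported = 0.50
shrunk = shrink_effect(reported, sampling_var=0.04, effect_var=0.01)
print(f"reported={reported:.2f}, shrunk={shrunk:.2f}")
```

The noisier the estimate is relative to the assumed spread of true effects, the more it is pulled toward zero, which is the direction needed to counteract the winner's curse.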
WGNAM: whole-genome nested association mapping
A powerful QTL analysis method for nested association mapping populations is presented. Based on a one-stage multi-locus model, it provides accurate predictions of founder-specific QTL effects.
Transcriptomic analysis of wheat near-isogenic lines identifies PM19-A1 and A2 as candidates for a major dormancy QTL
BACKGROUND: Next-generation sequencing technologies provide new opportunities to identify the genetic components responsible for trait variation. However, in species with large polyploid genomes, such as bread wheat, the ability to rapidly identify genes underlying quantitative trait loci (QTL) remains non-trivial. To overcome this, we introduce a novel pipeline that analyses, by RNA-sequencing, multiple near-isogenic lines segregating for a targeted QTL. RESULTS: We use this approach to characterize a major and widely utilized seed dormancy QTL located on chromosome 4AL. It exploits the power and mapping resolution afforded by large multi-parent mapping populations, whilst reducing complexity by using multi-allelic contrasts at the targeted QTL region. Our approach identifies two adjacent candidate genes within the QTL region belonging to the ABA-induced Wheat Plasma Membrane 19 family. One of them, PM19-A1, is highly expressed during grain maturation in dormant genotypes. The second, PM19-A2, shows changes in sequence causing several amino acid alterations between dormant and non-dormant genotypes. We confirm that PM19 genes are positive regulators of seed dormancy. CONCLUSIONS: The efficient identification of these strong candidates demonstrates the utility of our transcriptomic pipeline for rapid QTL to gene mapping. By using this approach we are able to provide a comprehensive genetic analysis of the major source of grain dormancy in wheat. Further analysis across a diverse panel of bread and durum wheats indicates that this important dormancy QTL predates hexaploid wheat. The use of these genes by wheat breeders could assist in the elimination of pre-harvest sprouting in wheat.
Jose M. Barrero, Colin Cavanagh, Klara L. Verbyla, Josquin F.G. Tibbits, Arunas P. Verbyla, B. Emma Huang, Garry M. Rosewarne, Stuart Stephen, Penghao Wang, Alex Whan, Philippe Rigault, Matthew J. Hayden, and Frank Guble
Covariance Clustering: Modelling Covariance in Designed Experiments When the Number of Variables is Greater than Experimental Units
The size and complexity of datasets resulting from comparative research experiments in the agricultural domain are constantly increasing. Often the number of variables measured in an experiment exceeds the number of experimental units composing the experiment. When the covariance relationships between variables in these experiments need to be modelled, estimation difficulties can arise because the resulting covariance structure is of reduced rank. A statistical method, based on a linear mixed model framework, is presented for the analysis of designed experiments where datasets are characterised by a greater number of variables than experimental units, and for which the modelling of complex covariance structures between variables is desired. Aided by a clustering algorithm, the method enables the estimation of covariance through the introduction of covariance clusters as random effects into the modelling framework, providing an extension of the traditional variance components model for building covariance structures. The method was applied to a multi-phase mass spectrometry-based proteomics experiment, with the aim of exploring changes in the proteome of barley grain over time during the malting process. The modelling approach provides a new linear mixed model-based method for the estimation of covariance structures between variables measured from designed experiments, when there are a small number of experimental units, or observations, informing covariance parameter estimates.
Accuracy of genomic breeding values in multi-breed dairy cattle populations
Background: Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein dairy cattle and Jersey dairy cattle when the reference population is single breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient matrix and from the realised accuracies of GEBV. Methods: Best linear unbiased prediction with a multi-breed genomic relationship matrix (GBLUP) and two Bayesian methods (BAYESA and BAYES_SSVS) which estimate individual SNP effects were used to predict GEBV for 400 and 77 young Holstein and Jersey bulls, respectively, from a reference population of 781 and 287 Holstein and Jersey bulls, respectively. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of the coefficient matrix were compared to realised accuracies. Results: When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient matrix agreed reasonably well with realised accuracies calculated from the correlation between GEBV and EBV in single breed populations, but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher when the multi-breed reference population was used than when a pure breed reference was used. However, no consistent increase in accuracy across traits was obtained. Conclusion: Predicting genomic breeding values using a genomic relationship matrix is an attractive approach to implement genomic selection as expected accuracies of GEBV can be readily derived. However, in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource to fine map QTL.
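The abstract describes realised accuracy as the correlation between GEBV and EBV. A minimal sketch of that calculation follows, assuming a plain Pearson correlation over validation animals. The function name and the toy values are hypothetical, and any further adjustment the authors may have applied is not represented.

```python
import math


def realised_accuracy(gebv, ebv):
    """Pearson correlation between predicted (GEBV) and reference (EBV) values."""
    if len(gebv) != len(ebv) or len(gebv) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(gebv)
    mean_g = sum(gebv) / n
    mean_e = sum(ebv) / n
    # Sums of cross-products and squares around the means.
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(gebv, ebv))
    var_g = sum((g - mean_g) ** 2 for g in gebv)
    var_e = sum((e - mean_e) ** 2 for e in ebv)
    if var_g == 0 or var_e == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_g * var_e)


# Hypothetical values for five validation bulls.
gebv = [0.8, 1.1, -0.3, 0.4, 1.9]
ebv = [1.0, 0.9, -0.5, 0.7, 1.6]
print(round(realised_accuracy(gebv, ebv), 3))
```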
Identification of Mendelian inconsistencies between SNP and pedigree information of sibs
Background: Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement. Methods: Straightforward tests to detect Mendelian inconsistencies exist that count the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals based on decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for distributions of test statistics of all three tests were calculated and compared to empirically derived values. Type I and II error rates were calculated after applying the tests to the edited data, while Mendelian inconsistencies were introduced by permuting pedigree against genotype data for various proportions of animals. Results: Both SIB tests identified animal pairs for which pedigree and genomic relationships could be considered as inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error) were equally low for both methods, while the numbers of incorrectly non-deleted animals (Type II error) were considerably higher for SIBREL compared to SIBCOUNT. Conclusions: Tests to remove Mendelian inconsistencies between sibs should be preceded by a test for parent-offspring inconsistencies. This parent-offspring test should not only consider parent-offspring pairs based on pedigree data, but also those based on SNP information. Both SIB tests could identify pairs of sibs with Mendelian inconsistencies. Based on type I and II error rates, counting opposing homozygotes between sibs (SIBCOUNT) appears slightly more precise than comparing genomic and pedigree relationships (SIBREL) to detect Mendelian inconsistencies between sib
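The counting step shared by the PAR-OFF and SIBCOUNT tests can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded as 0, 1, or 2 copies of one allele, missing calls are `None`, and the function name is hypothetical. The thresholds and the iterative removal procedure described in the abstract are not reproduced here.

```python
def count_opposing_homozygotes(geno_a, geno_b):
    """Count markers where two animals carry opposite homozygous genotypes.

    Assumed coding: 0 and 2 are the two homozygotes, 1 is the heterozygote,
    None marks a missing call (skipped).
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("both animals must be genotyped for the same markers")
    count = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # a missing call carries no information
        if {a, b} == {0, 2}:
            count += 1
    return count


# Hypothetical genotypes for two animals at six markers.
animal_1 = [0, 2, 1, 2, None, 0]
animal_2 = [2, 0, 1, 2, 0, 1]
print(count_opposing_homozygotes(animal_1, animal_2))  # prints 2
```

For a parent-offspring pair any opposing homozygote conflicts with Mendelian inheritance, whereas true full sibs can show some by chance, so the sib count has to be judged against an expected distribution rather than against zero.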
Estimated breeding values and association mapping for persistency and total milk yield using natural cubic smoothing splines
Background: For dairy producers, a reliable description of lactation curves is a valuable tool for management and selection. From a breeding and production viewpoint, milk yield persistency and total milk yield are important traits. Understanding the genetic drivers for the phenotypic variation of both these traits could provide a means for improving these traits in commercial production. Methods: It has been shown that Natural Cubic Smoothing Splines (NCSS) can model the features of lactation curves with greater flexibility than the traditional parametric methods. NCSS were used to model the sire effect on the lactation curves of cows. The sire solutions for persistency and total milk yield were derived using NCSS and a whole-genome approach based on a hierarchical model was developed for a large association study using single nucleotide polymorphisms (SNP). Results: Estimated sire breeding values (EBV) for persistency and milk yield were calculated using NCSS. Persistency EBV were correlated with peak yield but not with total milk yield. Several SNP were found to be associated with both traits and these were used to identify candidate genes for further investigation. Conclusion: NCSS can be used to estimate EBV for lactation persistency and total milk yield, which in turn can be used in whole-genome association studies.
Klara L. Verbyla and Arunas P. Verbyl