Effective Genetic Risk Prediction Using Mixed Models
To date, efforts to produce high-quality polygenic risk scores from
genome-wide studies of common disease have focused on estimating and
aggregating the effects of multiple SNPs. Here we propose a novel statistical
approach for genetic risk prediction, based on random and mixed effects models.
Our approach (termed GeRSI) circumvents the need to estimate the effect sizes
of numerous SNPs by treating these effects as random, producing predictions
that are consistently superior to the current state of the art, as we demonstrate
in extensive simulations. When applying GeRSI to seven phenotypes from the WTCCC
study, we confirm that the use of random effects is most beneficial for
diseases that are known to be highly polygenic: hypertension (HT) and bipolar
disorder (BD). For HT, there are no significant associations in the WTCCC data.
The best existing model yields an AUC of 54%, while GeRSI improves it to 59%.
For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked in
the top 10% of BD risk predictions, using GeRSI substantially increases the BD
relative risk from 1.4 to 2.5.
Comment: main text: 14 pages, 3 figures. Supplementary text: 16 pages, 21
figures
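The AUC and top-10% relative-risk figures quoted above can be computed from any vector of predicted risk scores. The sketch below is a minimal Python illustration, not the GeRSI code: AUC is computed by the rank-based (Mann-Whitney) definition, and defining relative risk as the case rate among the top-ranked fraction divided by the overall case rate is an assumption, since the abstract does not state the comparison group.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC: the probability that a random case
    scores higher than a random control; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_relative_risk(
    scores: Sequence[float], labels: Sequence[int], frac: float = 0.10
) -> float:
    """ASSUMED definition: case rate among the top `frac` of predicted risk
    divided by the case rate in the whole sample."""
    n = len(scores)
    k = max(1, round(frac * n))  # size of the top-ranked group
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(labels[i] for i in order[:k]) / k
    overall_rate = sum(labels) / n
    return top_rate / overall_rate
```

The pairwise loop is quadratic and only meant to make the AUC definition explicit; for large cohorts a sort-based rank computation gives the same value.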
An Upper Bound for Accuracy of Prediction Using GBLUP.
This study aims to characterize, through simulations, the asymptotic behavior of genomic prediction R² as the size of the reference population increases, for common or rare QTL alleles. Haplotypes derived from the whole-genome sequences of 85 Caucasian individuals from the 1000 Genomes Project were used to simulate random mating in a population of 10,000 individuals for at least 100 generations, creating a human-like LD structure for a large number of individuals. To reduce computational demands, only SNPs within a 0.1M region of each of the first 5 chromosomes were used in the simulations; the total simulated genome length was therefore 0.5M. For a 30M genome, achieving the same genomic prediction R² as with a 0.5M genome would require a reference population 60-fold larger. Three scenarios, differing in the minor allele frequency distributions of markers and QTL, were considered for h² = 0.8, resembling height in humans. Each scenario had 4,200 markers and 70 QTL. In this study, we treated prediction accuracy as an estimability problem and thereby provided an upper bound for the reliability of prediction, and thus for prediction R². The genomic prediction methods GBLUP, BayesB and BayesC were compared. Our results imply that, for human height, the variable selection methods BayesB and BayesC applied to a 30M genome have no advantage over GBLUP when the reference population is small (<6,000 individuals) but are superior as more individuals are included in the reference population. All methods become asymptotically equivalent in terms of prediction R², which approaches genomic heritability when the reference population reaches 480,000 individuals.
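The scaling argument behind the 60-fold figure can be illustrated with the widely used deterministic approximation of Daetwyler and colleagues, r² = N·h² / (N·h² + M_e), where N is the reference population size and M_e the number of independent chromosome segments. This is a swapped-in technique, not the estimability-based bound derived in the study, and the assumption that M_e grows in proportion to genome length (so that 30M / 0.5M = 60) is ours. The M_e values used in any example call are hypothetical.

```python
def expected_reliability(n_ref: float, h2: float, m_e: float) -> float:
    """Deterministic approximation (assumed, Daetwyler-style) of the squared
    correlation between genomic predictions and true genetic values:
    r^2 = N h^2 / (N h^2 + M_e)."""
    nh2 = n_ref * h2
    return nh2 / (nh2 + m_e)


def expected_prediction_r2(n_ref: float, h2: float, m_e: float) -> float:
    """Prediction R^2 against phenotypes = h^2 * reliability; here h2 stands
    for the heritability captured by markers, so R^2 tends to h2 as N grows."""
    return h2 * expected_reliability(n_ref, h2, m_e)


if __name__ == "__main__":
    # Hypothetical M_e for a 0.5M genome; a 30M genome is assumed to have 60x as many segments.
    me_small, me_large = 1_000, 60_000
    print(expected_reliability(6_000, 0.8, me_small))
    print(expected_reliability(6_000 * 60, 0.8, me_large))  # same value: N scaled with M_e
```

Under this approximation reliability depends on N only through the ratio N·h²/M_e, which is why multiplying M_e by 60 requires a 60-fold larger reference population for the same accuracy.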
Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum.
The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework for constructing genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH) with biweekly measurements from 30 to 120 days after planting (DAP) and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP). In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved the prediction accuracies of the DBN, MTi-GBLUP and MTr-GBLUP models for PH (training slice: 30-45 DAP) by 36.4-52.4% relative to the BN and PBN models. Coincidence indices (target: biomass, secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest that a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season, based on the ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
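Forward-chaining cross-validation differs from k-fold cross-validation in that the model is trained only on earlier measurements and evaluated on a later one, so no information from the future leaks into training. The generator below sketches one common form of this scheme over measurement dates; it is an illustration, not the study's pipeline, and the 15-day DAP schedule in the usage note is a hypothetical reading of "biweekly from 30 to 120 DAP".

```python
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_splits(
    timepoints: Sequence[int], min_train: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training slice, next time point) pairs.

    Each split trains on all time points up to some date and predicts the
    next one; the training window then grows by one step (roll forward).
    """
    ordered = sorted(timepoints)
    for k in range(min_train, len(ordered)):
        yield ordered[:k], ordered[k]
```

For example, with a hypothetical schedule `dap = [30, 45, 60, 75, 90, 105, 120]`, calling `forward_chaining_splits(dap, min_train=2)` first yields the training slice `[30, 45]` with 60 DAP as the prediction target, matching the 30-45 DAP training slice mentioned above.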
Multiple Quantitative Trait Analysis Using Bayesian Networks
Models for genome-wide prediction and association studies usually target a
single phenotypic trait. However, in animal and plant genetics it is common to
record information on multiple phenotypes for each individual that will be
genotyped. Modeling traits individually disregards the fact that they are most
likely associated due to pleiotropy and shared biological basis, thus providing
only a partial, confounded view of genetic effects and phenotypic interactions.
In this paper we use data from a Multiparent Advanced Generation Inter-Cross
(MAGIC) winter wheat population to explore Bayesian networks as a convenient
and interpretable framework for the simultaneous modeling of multiple
quantitative traits. We show that they are equivalent to multivariate genetic
best linear unbiased prediction (GBLUP), and that they are competitive with
single-trait elastic net and single-trait GBLUP in predictive performance.
Finally, we discuss their relationship with other additive-effects models and
their advantages in inference and interpretation. MAGIC populations provide an
ideal setting for this kind of investigation because the very low population
structure and large sample size result in predictive models with good power and
limited confounding due to relatedness.
Comment: 28 pages, 1 figure, code at
http://www.bnlearn.com/research/genetics1
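Once the structure of a Gaussian Bayesian network is known, each trait (or marker) node is modelled as a linear regression on its parent nodes, which is what links these networks to additive-effects models such as GBLUP. The sketch below shows only that parameter-fitting step in Python with NumPy; it does not use the bnlearn code referenced above, structure learning is not shown, and the node names in the usage note are hypothetical.

```python
import numpy as np


def fit_gaussian_bn(data: dict, parents: dict) -> dict:
    """Fit the local distributions of a Gaussian Bayesian network.

    data: node name -> 1-D array of observations (all the same length)
    parents: node name -> list of parent node names (assumed to form a DAG)
    Returns node -> (intercept, {parent: coefficient}, residual variance),
    where each node is regressed by least squares on its parents.
    """
    fitted = {}
    for node, pa in parents.items():
        y = np.asarray(data[node], dtype=float)
        # Design matrix: intercept column plus one column per parent.
        X = np.column_stack(
            [np.ones_like(y)] + [np.asarray(data[p], dtype=float) for p in pa]
        )
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        fitted[node] = (float(beta[0]), dict(zip(pa, beta[1:])), float(resid.var()))
    return fitted
```

As a toy example, a hypothetical DAG `marker -> height -> yield` is fitted with `parents = {"marker": [], "height": ["marker"], "yield": ["height"]}`; a node with no parents reduces to its mean and variance.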
Improving the Efficiency of Genomic Selection
We investigate two approaches to increase the efficiency of phenotypic
prediction from genome-wide markers, which is a key step for genomic selection
(GS) in plant and animal breeding. The first approach is feature selection
based on Markov blankets, which provide a theoretically-sound framework for
identifying non-informative markers. Fitting GS models using only the
informative markers results in simpler models, which may allow cost savings
from reduced genotyping. We show that this is accompanied by no loss, and
possibly a small gain, in predictive power for four GS models: partial least
squares (PLS), ridge regression, LASSO and elastic net. The second approach is
the choice of kinship coefficients for genomic best linear unbiased prediction
(GBLUP). We compare kinships based on different combinations of centring and
scaling of marker genotypes, and a newly proposed kinship measure that adjusts
for linkage disequilibrium (LD).
We illustrate both approaches and examine their performance using
three real-world data sets from plant and animal genetics. We find that elastic
net with feature selection and GBLUP using LD-adjusted kinships performed
similarly well and were the best-performing methods in our study.
Comment: 17 pages, 5 figures
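Kinships for GBLUP are built from a marker matrix after some choice of centring and scaling. As a point of reference, the sketch below implements one standard combination (VanRaden's first method): each marker column is centred by twice its allele frequency and the cross-product is scaled by 2·Σp(1 − p). This is not the LD-adjusted kinship proposed in the paper, and coding genotypes as 0/1/2 allele counts is an assumption.

```python
import numpy as np


def vanraden_kinship(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (individuals x markers) matrix of
    0/1/2 allele counts: centre each marker by 2p, scale by 2 * sum p(1-p).

    Monomorphic markers contribute nothing; if every marker is monomorphic
    the scaling factor is zero and the result is undefined.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0               # frequency of the counted allele per marker
    Z = X - 2.0 * p                        # column centring
    denom = 2.0 * np.sum(p * (1.0 - p))    # single overall scaling factor
    return (Z @ Z.T) / denom
```

Other combinations, such as standardizing each marker to unit variance before taking the cross-product, change how much weight rare variants receive, which is the kind of choice the comparison above examines.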
Dominance and G×E interaction effects improve genomic prediction and genetic gain in intermediate wheatgrass (Thinopyrum intermedium)
Genomic selection (GS)-based recurrent selection methods were developed to accelerate the domestication of intermediate wheatgrass [IWG, Thinopyrum intermedium (Host) Barkworth & D.R. Dewey]. A subset of the breeding population phenotyped in multiple environments is used to train GS models, which then predict trait values for the breeding population. In this study, we implemented several GS models incorporating additive, dominance, and G×E interaction effects to understand how each affected trait predictions in intermediate wheatgrass. We evaluated 451 genotypes from the University of Minnesota IWG breeding program for nine agronomic and domestication traits at two Minnesota locations during 2017–2018. Genet-mean-based heritabilities for these traits ranged from 0.34 to 0.77. Using fourfold cross-validation, we observed the highest predictive abilities (correlation of 0.67) in models that considered G×E effects. When G×E effects were fitted in GS models, trait predictions improved by 18%, 15%, 20%, and 23% for yield, spike weight, spike length, and free threshing, respectively. GS models with dominance effects showed only modest, trait-dependent increases of up to 3%. Cross-environment predictions were better for high-heritability traits such as spike length, shatter resistance, free threshing, grain weight, and seed length than for traits with low heritability and large environmental variance such as spike weight, grain yield, and seed width. Our results confirm that GS can accelerate IWG domestication by increasing genetic gain per breeding cycle and can assist in selecting genotypes that show promise of better performance in diverse environments.
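One common way to let a GBLUP-type model capture G×E is to add a random G×E deviation whose covariance between two plot records equals their genomic relationship when both records come from the same environment and zero otherwise, i.e. the Hadamard product of Z_g G Z_gᵀ and Z_e Z_eᵀ. The abstract does not say exactly how the authors specified their G×E term, so this Python sketch is an assumed, generic construction; the record layout in the test is hypothetical.

```python
import numpy as np


def gxe_kernel(G: np.ndarray, genotype_idx, env_idx) -> np.ndarray:
    """Covariance kernel for G x E deviations over a set of plot records.

    G: genomic relationship matrix among genotypes.
    genotype_idx[i], env_idx[i]: genotype and environment of record i.
    Entry (i, j) equals G[g_i, g_j] when records i and j share an
    environment and 0 otherwise: (Z_g G Z_g') * (Z_e Z_e') elementwise.
    """
    G = np.asarray(G, dtype=float)
    g = np.asarray(genotype_idx)
    e = np.asarray(env_idx)
    K_g = G[np.ix_(g, g)]                            # Z_g G Z_g': genomic main-effect kernel
    K_e = (e[:, None] == e[None, :]).astype(float)   # Z_e Z_e': same-environment indicator
    return K_g * K_e                                 # Hadamard product
```

Because the kernel is zero across environments, the G×E term can only explain environment-specific deviations from the genomic main effect; allowing a separate variance per environment is the natural next refinement.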
Genomic selection in rubber tree breeding: A comparison of models and methods for managing G×E interactions
Several genomic prediction models incorporating genotype × environment (G×E) interactions have recently been developed and used for genomic selection (GS) in plant breeding programs. G×E interactions reduce selection accuracy and limit genetic gains in plant breeding. Two data sets were used to compare the prediction abilities of multienvironment G×E genomic models and two kernel methods. Specifically, a linear kernel, or GB (genomic best linear unbiased predictor [GBLUP]), and a nonlinear kernel, or Gaussian kernel (GK), were used to compare the prediction accuracies (PAs) of four genomic prediction models: 1) a single-environment, main genotypic effect model (SM); 2) a multienvironment, main genotypic effect model (MM); 3) a multienvironment, single-variance G×E deviation model (MDs); and 4) a multienvironment, environment-specific variance G×E deviation model (MDe). We evaluated the utility of GS for 435 individual rubber trees at two sites, genotyped for single-nucleotide polymorphisms (SNPs) via genotyping-by-sequencing (GBS). Prediction models were used to estimate stem circumference (SC), a trait with a broad-sense heritability (H²) of 0.60, during the first 4 years of tree development. Applying the model (SM, MM, MDs, and MDe) and kernel method (GB and GK) combinations to the rubber tree data revealed that the multienvironment models were superior to the single-environment genomic models regardless of the kernel (GB or GK) used, suggesting that introducing interactions between markers and environmental conditions increases the proportion of variance explained by the model and, more importantly, the PA. Compared with the classic breeding method (CBM), incorporating multienvironment GS (MM, MDe, or MDs) resulted in a 5-fold increase in response to selection for SC.
Furthermore, GS resulted in a more balanced selection response for SC and contributed to a reduction in selection time when used in conjunction with traditional genetic breeding programs. Given the rapid advances in genotyping methods and their declining costs, as well as the overall costs of large-scale progeny testing and shortened breeding cycles, we expect GS to be implemented in rubber tree breeding programs.
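The difference between the two kernel methods compared above lies in how similarity between trees is computed from markers: GB uses a linear cross-product, whereas GK maps squared Euclidean distances through an exponential. The Python sketch below shows common forms of both; the exact scaling used by the authors is not given in the abstract, so dividing GB by the number of markers and scaling GK distances by their median (with a unit bandwidth) are assumptions.

```python
import numpy as np


def linear_kernel(X: np.ndarray) -> np.ndarray:
    """GB-style linear kernel (assumed scaling): column-centred markers,
    cross-product divided by the number of markers."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    return (Z @ Z.T) / X.shape[1]


def gaussian_kernel(X: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """GK (assumed form): exp(-bandwidth * d_ij^2 / q), where d_ij is the
    Euclidean distance between marker profiles and q is the median of the
    off-diagonal squared distances. Needs at least two distinct rows."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)                      # remove rounding error on the diagonal
    off = d2[~np.eye(len(X), dtype=bool)]          # off-diagonal squared distances
    q = np.median(off)
    return np.exp(-bandwidth * d2 / q)
```

The Gaussian kernel always has ones on its diagonal and decays with genetic distance, which lets it capture non-additive similarity patterns that the linear kernel cannot.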