77 research outputs found

    Simultaneous QTL detection and genomic breeding value estimation using high density SNP chips

    Background: The simulated dataset of the 13th QTL-MAS workshop was analysed to i) detect QTL and ii) predict breeding values for animals without phenotypic information. Several parameterisations considering all SNP simultaneously were applied using Gibbs sampling. Results: Fourteen QTL were detected at the different time points. Correlations between estimated breeding values were high between models, except for the model that assumed that all SNP effects came from one distribution. The model that used the 14 selected SNP found to be associated with QTL gave close to unity correlations with the full parameterisations. Conclusions: Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection were indicated to be fairly robust, e.g. with respect to accuracy of estimated breeding values. Still, it is worthwhile to investigate the number of QTL underlying the quantitative traits before choosing the model used for genomic selection.

    Genomic selection on breeding time in a wild bird population

    Artificial selection experiments are a powerful tool in evolutionary biology. Selecting individuals based on multimarker genotypes (genomic selection) has several advantages over phenotype‐based selection but has, so far, seen very limited use outside animal and plant breeding. Genomic selection depends on the markers tagging the causal loci that underlie the selected trait. Because the number of necessary markers depends, among other factors, on effective population size, genomic selection may not be feasible in practice in wild populations, as most wild populations have much higher effective population sizes than domesticated populations. However, the current possibilities of cost‐effective high‐throughput genotyping could overcome this limitation and thereby make it possible to apply genomic selection in wild populations as well. Using a unique dataset of about 2000 wild great tits (Parus major), a small passerine bird, genotyped on a 650 k SNP chip, we calculated genomic breeding values for egg‐laying date using the so‐called GBLUP approach. In this approach, the pedigree‐based relatedness matrix of an “animal model,” a special form of the mixed model, is replaced by a marker‐based relatedness matrix. Using the marker‐based relatedness matrix, the model seemed better able to disentangle genetic and permanent environmental effects. We calculated the accuracy of genomic breeding values by correlating them to the phenotypes of individuals whose phenotypes were excluded from the analysis when estimating the genomic breeding values. The obtained accuracy was about 0.20, with very little effect of the genomic relatedness estimator used but a strong effect of the number of SNPs. The obtained accuracy is lower than typically seen in domesticated species but considerable for a trait with low heritability (∼0.2) such as avian breeding time. Our results show that genomic selection is also possible in wild populations, with potentially many applications, which we discuss here.
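
The marker-based relatedness matrix mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which estimator was used, so the code below assumes one common choice, VanRaden's first method, with genotypes coded as 0/1/2 allele counts. The function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Return a VanRaden-style (method 1) genomic relationship matrix.

    genotypes: list of rows, one per individual, each a list of allele
    counts (0, 1 or 2) per marker. This estimator is an assumption made
    for illustration; the abstract does not name the one it used.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Centre every genotype by its expected value 2p.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    # Scale by the sum of expected heterozygosities, 2 * sum(p * (1 - p)).
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy example: three individuals, three markers (made-up data).
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
G = genomic_relationship_matrix(toy_genotypes)
```

In a GBLUP analysis a matrix like `G` would take the place of the pedigree-based relatedness matrix in the mixed model; fitting that model is outside the scope of this sketch.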

    Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein–Friesian cattle

    Background: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Methods: Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. Results: The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Conclusions: Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.

    Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence.

    Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.

    Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

    We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. For some of the traits, using a subset of top markers selected from a gradient boosting machine model helped to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects.

    QTLMAS 2009: simulated dataset

    Background - The simulation of the data for the QTLMAS 2009 Workshop is described. The objective was to simulate observations from a growth curve which was influenced by a number of QTL. Results - The data consisted of markers, phenotypes and pedigree. Genotypes of 453 markers, distributed over 5 chromosomes of 1 Morgan each, were simulated for 2,025 individuals. Of those, 25 individuals were parents of the other 2,000 individuals. The 25 parents were genetically related. Phenotypes were simulated according to a logistic growth curve and were made available for 1,000 of the 2,000 offspring individuals. The logistic growth curve was specified by three parameters. Each parameter was influenced by six Quantitative Trait Loci (QTL), positioned on the five chromosomes. For each parameter, one QTL had a large effect and five QTL had small effects. The variance of the large QTL was five times the variance of a small QTL. Simulated data was made available at http://www.qtlmas2009.wur.nl/UK/Dataset
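
A three-parameter logistic growth curve of the kind described above can be sketched as follows. The abstract does not give the parameterisation, so the one used here (asymptote, inflection point, and a scale parameter controlling steepness) is an assumption, as are the function name and all numeric values.

```python
import math


def logistic_growth(t, asymptote, inflection, scale):
    """Phenotype at time t under an assumed three-parameter logistic curve.

    asymptote:  value approached as t grows large
    inflection: time at which half of the asymptote is reached
    scale:      positive time scale; smaller values give a steeper curve
    """
    return asymptote / (1.0 + math.exp(-(t - inflection) / scale))


# Illustrative trajectory with made-up parameter values.
time_points = [0.0, 5.0, 10.0, 15.0, 20.0]
trajectory = [logistic_growth(t, 100.0, 10.0, 2.0) for t in time_points]
```

In a simulation of this type, each of the three parameters would itself vary between individuals according to their QTL genotypes; that genetic layer is not modelled here.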

    Comparison of analyses of the QTLMAS XIII common dataset. I: genomic selection

    Background - Genomic selection, the use of markers across the whole genome, receives increasing amounts of attention and is having more and more impact on breeding programs. Development of statistical and computational methods to estimate breeding values based on markers is a very active area of research. A simulated dataset was analyzed by participants of the QTLMAS XIII workshop, allowing a comparison of the ability of different methods to estimate genomic breeding values. Methods - A best case scenario was analyzed by the organizers where QTL genotypes were known. Participants submitted estimated breeding values for 1000 unphenotyped individuals together with a description of the applied method(s). The submitted breeding values were evaluated for correlation with the simulated values (accuracy), rank correlation of the best 10% of individuals and error in predictions. Bias was tested by regression of simulated on estimated breeding values. Results - The accuracy obtained from the best case scenario was 0.94. Six research groups submitted 19 sets of estimated breeding values. Methods that assumed the same variance for markers showed accuracies, measured as correlations between estimated and simulated values, ranging from 0.75 to 0.89 and rank correlations between 0.58 and 0.70. Methods that allowed different marker variances showed accuracies ranging from 0.86 to 0.94 and rank correlations between 0.69 and 0.82. Methods assuming equal marker variances were generally more biased and showed larger prediction errors. Conclusions - The best performing methods achieved very high accuracies, close to accuracies achieved in a best case scenario where QTL genotypes were known without error. Methods that allowed different marker variances generally outperformed methods that assumed equal marker variances. Genomic selection methods performed well compared to traditional, pedigree only, methods; all methods showed higher accuracies than those obtained for breeding values estimated solely on pedigree relationship.
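
The evaluation criteria named in the abstract above (accuracy as a correlation, bias as the regression of simulated on estimated breeding values, and prediction error) can be sketched in a few lines. This is an illustration, not the workshop's evaluation code: the function names and toy values are hypothetical, and root mean squared error is assumed as the error measure because the abstract does not specify one.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def accuracy(simulated, estimated):
    """Pearson correlation between simulated and estimated breeding values."""
    ms, me = _mean(simulated), _mean(estimated)
    cov = sum((s - ms) * (e - me) for s, e in zip(simulated, estimated))
    var_s = sum((s - ms) ** 2 for s in simulated)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(var_s * var_e)


def bias_slope(simulated, estimated):
    """Slope of the regression of simulated on estimated breeding values.

    A slope of 1 indicates no dispersion bias; a slope above 1 means the
    estimates vary too little, a slope below 1 that they vary too much.
    """
    ms, me = _mean(simulated), _mean(estimated)
    cov = sum((s - ms) * (e - me) for s, e in zip(simulated, estimated))
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / var_e


def rmse(simulated, estimated):
    """Root mean squared prediction error (assumed error measure)."""
    squared = [(s - e) ** 2 for s, e in zip(simulated, estimated)]
    return math.sqrt(_mean(squared))


# Toy values: the estimates rank individuals perfectly but are shrunk
# towards zero by a factor of two.
simulated_bv = [1.0, 2.0, 3.0, 4.0]
estimated_bv = [0.5, 1.0, 1.5, 2.0]
```

With these toy values the correlation is perfect while the regression slope is 2, which shows why accuracy and bias are reported separately.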

    Genomic Evaluation for a Crossbreeding System Implementing Breed-of-Origin for Targeted Markers

    The genome in crossbred animals is a mosaic of genomic regions inherited from the different parental breeds. We previously showed that effects of haplotypes strongly associated with crossbred performance differ depending on the parental breed from which they are inherited; however, the majority of the genomic regions are not or only weakly associated with crossbred performance. Therefore, our objective was to develop a model that distinguishes between selected single nucleotide polymorphisms (SNP) strongly associated with crossbred performance and all remaining SNP. For the selected SNP, breed-specific allele effects were fitted, whereas for the remaining SNP it was assumed that effects are the same across breeds (SEL-BOA model). We used data from three purebred populations, S, LR, and LW, and the corresponding crossbred population. We selected SNP that together explained either 5 or 10% of the total crossbred genetic variance for average daily gain in each breed of origin. The model was compared to a model where all SNP-alleles were allowed to have different effects for crossbred performance depending upon the breed of origin (BOA model) and to a model where all SNP-alleles had the same effect for crossbred performance across breeds (G model). Across the models, the heritability for crossbred performance was very similar, with values of 0.29–0.30. With the SEL-BOA models, in general, the purebred-crossbred genetic correlation (rpc) for the selected SNP was larger than for the non-selected SNP. For breed LR, the rpc for selected SNP and non-selected SNP estimated with the SEL-BOA 5% and SEL-BOA 10% were very different compared to the rpc estimated with the G or BOA model. For breeds S and LW, there was no large discrepancy between the rpc estimated with the SEL-BOA models and with the G or BOA model. The BOA model calculates more accurate breeding values of purebred animals for crossbred performance than the G model when rpc differs (≈10%) between the G and the BOA model. Superiority of the SEL-BOA model compared to the BOA model was only observed for SEL-BOA 10% and when rpc for the selected and non-selected SNP both differed (≈20%) from the rpc estimated by the G or BOA model.

    International single-step SNPBLUP beef cattle evaluations for Limousin weaning weight

    Background: Compared to national evaluations, international collaboration projects further improve accuracies of estimated breeding values (EBV) by building larger reference populations or performing a joint evaluation using data (or proxies of them) from different countries. Genomic selection is increasingly adopted in beef cattle, but, to date, the benefits of including genomic information in international evaluations have not been explored. Our objective was to develop an international beef cattle single-step genomic evaluation and investigate its impact on the accuracy and bias of genomic evaluations compared to current pedigree-based evaluations. Methods: Weaning weight records were available for 331,593 animals from seven European countries. The pedigree included 519,740 animals. After imputation and quality control, 17,607 genotypes at a density of 57,899 single nucleotide polymorphisms (SNPs) from four countries were available. We implemented two international scenarios where countries were modelled as different correlated traits: an international genomic single-step SNP best linear unbiased prediction (SNPBLUP) evaluation (ssSNPBLUP(INT)) and an international pedigree-based BLUP evaluation (PBLUPINT). Two national scenarios were implemented for pedigree and genomic evaluations using only nationally submitted phenotypes and genotypes. Accuracies, level and dispersion bias of EBV of animals born from 2014 onwards, and increases in population accuracies were estimated using the linear regression method. Results: On average across countries, 39 and 17% of sires and maternal-grand-sires with recorded (grand-)offspring across two countries were genotyped. ssSNPBLUP(INT) showed the highest accuracies of EBV and, compared to PBLUPINT, led to increases in population accuracy of 13.7% for direct EBV, and 25.8% for maternal EBV, on average across countries. Increases in population accuracies when moving from national scenarios to ssSNPBLUP(INT) were observed for all countries. Overall, ssSNPBLUP(INT) level and dispersion bias remained similar or slightly reduced compared to PBLUPINT and national scenarios. Conclusions: International single-step SNPBLUP evaluations are feasible and lead to higher population accuracies for both large and small countries compared to current international pedigree-based evaluations and national evaluations. These results are likely related to the larger multi-country reference population and the inclusion of phenotypes from relatives recorded in other countries via single-step international evaluations. The proposed international single-step approach can be applied to other traits and breeds.