2,114 research outputs found

    Accuracy and responses of genomic selection on key traits in apple breeding

    The application of genomic selection in fruit tree crops is expected to enhance breeding efficiency by increasing prediction accuracy, increasing selection intensity and decreasing generation interval. The objectives of this study were to assess the accuracy of prediction and selection response in commercial apple breeding programmes for key traits. The training population comprised 977 individuals derived from 20 pedigreed full-sib families. Historic phenotypic data were available on 10 traits related to productivity and fruit external appearance and genotypic data for 7829 SNPs obtained with an Illumina 20K SNP array. From these data, a genome-wide prediction model was built and subsequently used to calculate genomic breeding values of five application full-sib families. The application families had genotypes at 364 SNPs from a dedicated 512 SNP array, and these genotypic data were extended to the high-density level by imputation. These five families were phenotyped for 1 year and their phenotypes were compared to the predicted breeding values. Accuracy of genomic prediction across the 10 traits reached a maximum value of 0.5 and had a median value of 0.19. The accuracies were strongly affected by the phenotypic distribution and heritability of traits. In the largest family, significant selection response was observed for traits with high heritability and symmetric phenotypic distribution. Traits that showed non-significant response often had reduced and skewed phenotypic variation or low heritability. Among the five application families the accuracies were uncorrelated with the degree of relatedness to the training population. 
The results underline the potential of genomic prediction to accelerate breeding progress in outbred fruit tree crops that still need to overcome long generation intervals and extensive phenotyping costs.
Muranty, H.; Troggio, M.; Sadok, I.B.; Mehdi, A.R.; Auwerkerken, A.; Banchi, E.; Velasco, R.; Stevanato, P.; Eric van de Weg, W.; Di Guardo, M.; Kumar, S.; Laurens, F.; Bink, M.C.A.M.
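
The apple abstract reports prediction accuracy by comparing observed phenotypes with predicted breeding values, but it does not state the exact formula. A minimal sketch, assuming accuracy is taken as the Pearson correlation between the two; the function name `pearson` and the sample values are hypothetical and are not data from the study:

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for one application family (illustration only).
predicted_gebv = [0.2, -0.1, 0.5, 0.0, -0.3]
observed_phenotype = [3.1, 2.8, 3.0, 3.3, 2.5]
accuracy = pearson(predicted_gebv, observed_phenotype)
```

Some studies additionally divide this correlation by the square root of the trait heritability; whether that was done here is not stated in the abstract.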

    Genomic prediction and quantitative trait locus discovery in a cassava training population constructed from multiple breeding stages

    Open Access Article; Published online: 11 Dec 2019. Assembly of a training population (TP) is an important component of effective genomic selection-based breeding programs. In this study, we examined the power of diverse germplasm assembled from two cassava (Manihot esculenta Crantz) breeding programs in Tanzania at different breeding stages to predict traits and discover quantitative trait loci (QTL). This is the first genomic selection and genome-wide association study (GWAS) on Tanzanian cassava data. We detected QTL associated with cassava mosaic disease (CMD) resistance on chromosomes 12 and 16; QTL conferring resistance to cassava brown streak disease (CBSD) on chromosomes 9 and 11; and QTL on chromosomes 2, 3, 8, and 10 associated with resistance to CBSD for root necrosis. We detected a QTL on chromosome 4 and two QTL on chromosome 12 conferring dual resistance to CMD and CBSD. The use of clones in the same stage to construct TPs provided higher trait prediction accuracy than TPs with a mixture of clones from multiple breeding stages. Moreover, clones in the early breeding stage provided more reliable trait prediction accuracy and are better candidates for constructing a TP. Although larger TP sizes have been associated with improved accuracy, in this study, adding clones from Kibaha to those from Ukiriguru and vice versa did not improve the prediction accuracy of either population. Including the Ugandan TP in either population did not improve trait prediction accuracy. This study applied genomic prediction to understand the implications of constructing a TP from clones at different breeding stages pooled from different locations on trait prediction accuracy.

    Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum.

    The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework to construct genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH) with biweekly measurements from 30 to 120 days after planting (DAP) and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP) models. In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved prediction accuracies of the DBN, MTi-GBLUP and MTr-GBLUP models for PH (training slice: 30-45 DAP) by 36.4-52.4% relative to the BN and PBN models. Coincidence indices (target: biomass, secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season based on ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
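
The forward-chaining cross-validation mentioned in the sorghum abstract trains only on earlier time points and predicts a later one. The abstract does not describe the exact splitting scheme, so the following is a generic expanding-window sketch; the helper `forward_chaining_splits` and the list of measurement days are assumptions made for illustration:

```python
def forward_chaining_splits(time_points, min_train=2):
    """Yield (train, test) pairs where training uses only earlier time points.

    The training window starts with the first `min_train` time points and
    grows by one time point at each step (expanding window).
    """
    ordered = sorted(time_points)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Hypothetical measurement days after planting (DAP), for illustration only.
dap = [30, 45, 60, 75, 90, 105, 120]
splits = list(forward_chaining_splits(dap))

for train, test in splits:
    print(f"train on DAP {train} -> predict DAP {test}")
```

With `min_train=2`, the first split trains on 30 and 45 DAP, which mirrors the "training slice: 30-45 DAP" wording in the abstract; later splits add one time point at a time.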

    Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

    Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see if it is better to tolerate missing data or to try to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that the k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
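
To make the k-NN imputation step concrete, here is a minimal sketch for numeric data. The abstract does not specify the distance measure, the value of k, or how categorical attributes are handled, so the choices below (root-mean-square distance over columns observed in both rows, mean of the k nearest donors) and the function name `knn_impute` are assumptions, and the sample rows are made up:

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of `rows` with None entries filled by k-NN imputation.

    For each missing cell, donors are rows that have a value in that column.
    Distance is the root-mean-square difference over columns observed in
    both rows; the imputed value is the mean of the k nearest donors.
    Distances always use the original (unimputed) values.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common observed columns, cannot compare
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical project records: the second row lacks its third attribute.
projects = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, None],
    [1.5, 15.0, 150.0],
    [9.0, 90.0, 900.0],
]
filled = knn_impute(projects, k=2)
```

A cell stays `None` if no donor row shares an observed column with it. In practice the columns would usually be scaled first so that no single attribute dominates the distance.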

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms the others, particularly as missing data increases.

    The value of expanding the training population to improve genomic selection models in tetraploid potato

    Genomic selection (GS) is becoming increasingly applicable to crops as the genotyping costs continue to decrease, which makes it an attractive alternative to traditional selective breeding based on observed phenotypes. With genome-wide molecular markers, selection based on predictions from genotypes can be made in the absence of direct phenotyping. The reliability of predictions depends strongly on the number of individuals used for training the predictive algorithms, particularly in a highly genetically diverse organism such as potatoes; however, the relationship between the individuals also has an enormous impact on prediction accuracy. Here we have studied genomic prediction in three different panels of potato cultivars, varying in size, design, and phenotypic profile. We have developed genomic prediction models for two important agronomic traits of potato, dry matter content and chipping quality. We used genotyping-by-sequencing to genotype 1,146 individuals and generated genomic prediction models from 167,637 markers to calculate genomic estimated breeding values with genomic best linear unbiased prediction. Cross-validated prediction correlations of 0.75–0.83 and 0.39–0.79 were obtained for dry matter content and chipping quality, respectively, when combining the three populations. These prediction accuracies were similar to those obtained when predicting performance within each panel. In contrast, but not unexpectedly, predictions across populations were generally lower, 0.37–0.71 and 0.28–0.48 for dry matter content and chipping quality, respectively. These predictions are not limited by the number of markers included, since similar prediction accuracies could be obtained when using merely 7,800 markers (<5%). 
Our results suggest that predictions across breeding populations in tetraploid potato are presently unreliable, but that individual prediction models within populations can be combined in an additive fashion to obtain high quality prediction models relevant for several breeding populations.

    Improving end-use quality in hard winter wheat through glutenin allele combinations and genomic selection

    2014 Fall. Wheat (Triticum aestivum L.) has unique properties that allow for a variety of end products, such as pan bread, steamed bread, cookies, cakes, and tortillas. Most wheat-breeding programs focus on increasing yield and yield-related traits as primary objectives. However, end-use quality is also crucial as quality characteristics influence grain sale price and market success of a variety. Large-effect quantitative trait loci (QTL) have been identified for quality-related traits. The Glu-1 loci encoding high molecular weight glutenin subunits (HMW-GS) have a major effect on dough mixing properties. However, many quality traits are too complex to be controlled by only a small number of loci. These traits may benefit from genomic selection (GS), which utilizes all effective loci regardless of effect size. Genomic selection can accelerate genetic progress especially for traits that are costly or time consuming to phenotype, like quality-related traits. This research focused on the genetic improvement of end-use quality in hard winter wheat by targeting specific loci with known effects or by using all loci in a GS approach. The objectives of this study were to: i) evaluate agronomic and quality effects associated with different combinations of HMW-GS at the Glu-B1 and Glu-D1 loci among a set of near isogenic lines (NILs); ii) use a genome-wide association approach to identify QTL and develop predictive models for pre-harvest sprouting tolerance (PHST); and iii) assess GS models for milling and baking traits in hard winter wheat lines representative of west-central U.S. Great Plains germplasm. A set of NILs that varied for alleles at the Glu-B1 and Glu-D1 loci were evaluated for dough mixing properties, kernel characteristics, and agronomic effects. 
Results confirmed the Bx7OE + By8 HMW-GS (Glu-B1a1 allele) at Glu-B1 contributed to greater dough strength compared to the common Bx7 + By8 HMW-GS (Glu-B1b allele); however, the effect was not as significant as that conferred by Dx5 + Dy10 subunits (Glu-D1d allele). Near isogenic lines with the combination of both favorable alleles at Glu-B1 and Glu-D1 had the largest mixograph mixing time. However, a decrease in yield was observed for groups containing the Bx7OE + By8 subunits. These results suggest glutenin allele combinations are useful for improving bread-making characteristics in winter wheat but some combinations may be associated with negative effects on yield. Pre-harvest sprouting (PHS) is a major problem in wheat that results in decreased yield and quality. Genomic selection was evaluated as a potential breeding method for PHST given the complex inheritance and phenotyping difficulty of this trait. In this study, genotyping-by-sequencing (GBS) markers were used to identify QTL associated with PHST among a panel of hard red and white winter wheat lines. Genomic selection models were developed with the GBS data and phenotype data collected across seven growing seasons. The effect of including identified QTL and kernel color as fixed effects in the model was assessed, as kernel color has been generally associated with sprouting tolerance. Optimum marker number was also determined as accuracy can vary with different numbers of markers. Results showed model accuracy did not improve with kernel color information but weighting major QTL increased predictive performance. Optimum marker number was 4,000 with no improvement in accuracy above this threshold. Overall, model accuracies were promising and confirmed wheat breeding programs would benefit from incorporating GS models for PHST. 
Lastly, the accuracy of GS models for 11 end-use quality traits in a panel of hard red and white winter wheat breeding lines phenotyped across multiple years and locations was assessed. Trait heritability, marker number, and marker imputation method were evaluated for their effect on model accuracy. Traits measured included flour yield, single kernel characteristics, protein concentration, mixograph mixing time and tolerance, bake absorption, bake mixing time, crumb grain score, and loaf volume. Genotyping-by-sequencing marker data varied for marker density and imputation method used for missing data. Across traits, model accuracies ranged from 0.30 to 0.63 and trait heritability ranged from 0.03 to 0.61. Imputation method and marker density had little to no effect on model accuracy. Heritability appeared to have the greatest effect on accuracy as GS models for traits with higher heritability had higher accuracies. Additionally, GS models for moderate to high heritability traits performed better than expected when predicting a set of genotypes separate from the training panel. Results showed model accuracies for end-use quality traits were sufficient for increasing genetic gain in a wheat breeding program. In summary, genetic improvement in end-use quality can be made by utilizing both large effect and small effect loci in the wheat genome for such traits and will reduce phenotyping costs while increasing efficiency in a breeding program. In many winter wheat breeding programs, particularly those at higher latitudes, phenotypic quality evaluations from one season cannot be used for planting decisions of the next season due to the short turn-around time from harvest to planting. Genomic selection potentially solves this problem as selection decisions based on genotypic data can be implemented before the next season of planting. 
Thus, results from this study support the implementation of GS to reduce phenotyping costs and increase the rate of genetic gain for end-use quality in wheat.

    Genomic prediction using high-density and whole-genome sequence genotypes

    The main objective of this thesis was to investigate genomic prediction methods for high-density and whole-genome sequence genotypes, with emphasis on traits that may have difficulties achieving a high prediction accuracy with pedigree-based predictions, such as disease resistance and maternal traits. A Bayesian variable selection method that combines a polygenic term through a G-matrix and a BayesC term (BayesGC) was compared with Genomic Best Linear Unbiased Prediction (GBLUP), and for Papers I and II, it was also compared to BayesC. Paper I aimed to investigate genomic prediction accuracy for the trait host resistance to salmon lice in Atlantic salmon (Salmo salar). Three genomic prediction methods (GBLUP, BayesC and BayesGC) were compared using 215K and 750K SNP genotypes through both within-family and across-family prediction scenarios. The data consisted of 1385 fish with both phenotypes and genotypes, and the prediction accuracy was determined through five-fold cross-validation. The results showed an accuracy of ~0.6 and ~0.61 for across-family prediction with 215K and 750K genotypes, respectively, and ~0.67 for within-family prediction for both genotypes. BayesGC showed a slightly higher prediction accuracy than GBLUP and BayesC, especially for the across-family predictions, but the differences were insignificant. Paper II investigated the prediction accuracy of GBLUP, BayesC and BayesGC for six maternal traits in Landrace sows. The data consisted of between 10,000 and 15,000 sows, all genotyped and imputed to a genotype density of 660K SNPs. The effects of different priors for the Bayesian variable selection methods were also investigated. The ~1,000 youngest sows were used as validation animals to validate the prediction accuracy. Results showed a variation in genomic prediction accuracy from 0.31 to 0.61 for the different traits. The accuracy did not vary much between the different methods and priors within traits. 
BayesGC had 9.8% and 3% higher accuracy than GBLUP for traits M3W and BCS, respectively. However, for the other traits, there were minor differences. For within-breed prediction, marker density and sizes of reference populations are often sufficient. However, when predicting across breeds, one might need a higher density, such as Whole Genome Sequence (WGS), or one could benefit from functional markers derived from WGS. Paper III investigates prediction accuracy for four maternal traits in two pig populations, a pure-bred Landrace (L) and a Synthetic (S) Yorkshire/Large White line. Prediction accuracy was tested with three different marker data sets: High-Density (HD), Whole Genome Sequence (WGS) and markers derived from WGS based on their pig Combined Annotation Dependent Depletion (pCADD) score. Two genomic prediction methods (GBLUP and BayesGC) were investigated for across-, within-, and multi-line predictions. For across- and within-line prediction, reference population sizes between 1K and 30K animals were analysed for prediction accuracy. In addition, multi-line reference populations consisting of 1K, 3K or 6K animals for each line in different ratios were tested. The results showed that a reference population of 3K-6K animals for within-line prediction was usually sufficient to achieve a high prediction accuracy. However, increasing to 30K animals in the reference population further increased prediction accuracy for two of the traits. A reference population of 30K across-line animals achieved a similar accuracy to 1K within-line animals. For multi-line prediction, the accuracy was most dependent on the number of within-line animals in the reference data. The S-line provided a generally higher prediction accuracy than the L-line. Using pCADD scores to reduce the number of markers from WGS data in combination with the GBLUP method generally reduced prediction accuracies relative to GBLUP_HD analyses. 
When using BayesGC, prediction accuracies were generally similar when using HD, pCADD, or WGS marker data, suggesting that the Bayesian method selects a suitable set of markers irrespective of the markers provided (HD, pCADD, or WGS). Overall, these three studies showed that BayesGC seemed to have a slight advantage over GBLUP, especially with large datasets, high-density genotypes, and when relationships between the reference and validation animals were lower. They also showed that the relationship between the animals in the reference and validation population, and the size of the reference population, had a more significant impact on the prediction accuracy than the prediction method.
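
Both GBLUP and the polygenic term of BayesGC rely on a genomic relationship matrix (G-matrix). The thesis abstract does not say how G was constructed, so the sketch below assumes the widely used first method of VanRaden (2008) for diploid genotypes coded as 0/1/2 allele counts, with allele frequencies estimated from the sample itself; the function name and the toy genotypes are hypothetical:

```python
def genomic_relationship_matrix(genotypes):
    """Build a G-matrix from 0/1/2 allele counts (VanRaden method 1, assumed).

    genotypes: list of individuals, each a list of allele counts per marker.
    G = Z Z' / (2 * sum_j p_j * (1 - p_j)), where Z holds the genotypes
    centered by twice the allele frequency p_j of each marker.
    Missing genotypes are not handled in this sketch.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy genotypes: 4 individuals x 3 markers (illustration only).
toy = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
G = genomic_relationship_matrix(toy)
```

Because the genotypes are centered with sample allele frequencies, each row of G sums to zero, which is a quick sanity check on the construction. Real analyses with hundreds of thousands of SNPs would use a linear algebra library rather than nested lists.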