23 research outputs found

    A phasing and imputation method for pedigreed populations that results in a single-stage genomic evaluation

    Background. Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods. An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis. Results. Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored. Conclusions. The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations.

    Evaluation of methods and marker systems in genomic selection of oil palm (Elaeis guineensis Jacq.)

    Background. Genomic selection (GS) uses genome-wide markers in an attempt to accelerate genetic gain in breeding programs of both animals and plants. This approach is particularly useful for perennial crops such as oil palm, which have long breeding cycles, and for which the optimal method for GS is still under debate. In this study, we evaluated the effect of different marker systems and modeling methods for implementing GS in an introgressed dura family of 112 individuals derived from a Deli dura x Nigerian dura cross (Deli x Nigerian). This family is an important breeding source for developing new mother palms for superior oil yield and bunch characters. The traits of interest selected for this study were fruit-to-bunch (F/B), shell-to-fruit (S/F), kernel-to-fruit (K/F), mesocarp-to-fruit (M/F), oil per palm (O/P) and oil-to-dry mesocarp (O/DM). The marker systems evaluated were simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). RR-BLUP, Bayesian A, B, Cπ, LASSO, Ridge Regression and two machine learning methods (SVM and Random Forest) were used to evaluate GS accuracy of the traits. Results. The kinship coefficient between individuals in this family ranged from 0.35 to 0.62. S/F and O/DM had the highest genomic heritability, whereas F/B and O/P had the lowest. Accuracies using 135 SSRs were low, around 0.20 across traits. The average accuracy of machine learning methods was 0.24, compared to 0.20 achieved by other methods. The trait with the highest mean accuracy was F/B (0.28), while the lowest were both M/F and O/P (0.18). With whole-genome SNPs, the accuracies for all traits improved, especially for O/DM (0.43), S/F (0.39) and M/F (0.30). The average accuracy of machine learning methods was 0.32, compared to 0.31 achieved by other methods. Conclusion. Due to high genomic resolution, the use of whole-genome SNPs improved the efficiency of GS dramatically for oil palm and is recommended for dura breeding programs. Machine learning slightly outperformed other methods, but required parameter optimization for GS implementation.

    Inversion of a part of the numerator relationship matrix using pedigree information

    Background. In recent theoretical developments, the information available (e.g. genotypes) divides the original population into two groups: animals with this information (selected animals) and animals without this information (excluded animals). These developments require inversion of the part of the pedigree-based numerator relationship matrix that describes the genetic covariance between selected animals (A22). Our main objective was to propose and evaluate methodology that takes advantage of any potential sparsity in the inverse of A22 in order to reduce the computing time required for its inversion. This potential sparsity is brought out by searching the pedigree for dependencies between the selected animals. We also expected that distant ancestors provide relationship ties that increase the density of A22 but have only a minor effect on A22⁻¹. This hypothesis was also tested. Methods. The inverse of A22 can be computed from the inverse of the triangular factor (T⁻¹) obtained by root-free Cholesky decomposition of A22. We propose an algorithm that sets up the sparsity pattern of T⁻¹ using pedigree information. This algorithm provides the positions of the elements of T⁻¹ worth computing (i.e. different from zero). A recursive computation of A22⁻¹ is then carried out with or without information on the sparsity pattern, and the time required for each computation was recorded. For three numbers of selected animals (4000, 8000 and 12 000), A22 was computed using different pedigree extractions, and the closeness of the resulting A22⁻¹ to the inverse computed using the fully extracted pedigree was measured by an appropriate norm. Results. The use of prior information on the sparsity of T⁻¹ decreased the computing time for inversion by a factor of 1.73 on average. Computational issues and practical uses of the different algorithms were discussed. Cases involving more than 12 000 selected animals were considered. Inclusion of 10 generations was determined to be sufficient when computing A22. Conclusions. Depending on the size and structure of the selected sub-population, gains in time to compute A22 are possible and these gains may increase as the number of selected animals increases. Given the sequential nature of most computational steps, the proposed algorithm can benefit from optimization and may be convenient for genomic evaluations.
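
    The relation between the root-free Cholesky factor and the inverse can be illustrated with a minimal sketch. It assumes the decomposition A = T D T' (T unit lower triangular, D diagonal), so that the inverse is (T⁻¹)' D⁻¹ T⁻¹. The function names and the three-animal toy matrix (two unrelated parents and their offspring) are illustrative assumptions; the sketch does not implement the pedigree-based sparsity-pattern algorithm described in the abstract.

```python
def ldl_decompose(a):
    """Root-free Cholesky: return (T, d) with a = T * diag(d) * T'."""
    n = len(a)
    t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(t[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(t[i][k] * t[j][k] * d[k] for k in range(j))
            t[i][j] = (a[i][j] - acc) / d[j]
    return t, d


def invert_unit_lower(t):
    """Invert a unit lower-triangular matrix by forward substitution."""
    n = len(t)
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            inv[i][j] = -sum(t[i][k] * inv[k][j] for k in range(j, i))
    return inv


def inverse_from_factor(a):
    """Compute the inverse of a as (T^-1)' * D^-1 * T^-1."""
    t, d = ldl_decompose(a)
    t_inv = invert_unit_lower(t)
    n = len(a)
    # T^-1 is lower triangular, so only rows k >= max(i, j) contribute.
    return [
        [
            sum(t_inv[k][i] * t_inv[k][j] / d[k] for k in range(max(i, j), n))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy relationship matrix (assumption): sire, dam, and their offspring.
toy_a = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
toy_t, toy_d = ldl_decompose(toy_a)
toy_t_inv = invert_unit_lower(toy_t)
toy_inverse = inverse_from_factor(toy_a)
```

    In this toy case the entry of T⁻¹ linking the two unrelated parents is zero, which is the kind of structural zero that a precomputed sparsity pattern would allow a real implementation to skip.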

    Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation

    Background. Single-step methods provide a coherent and conceptually simple approach to incorporate genomic information into genetic evaluations. An issue with single-step methods is compatibility between the marker-based relationship matrix for genotyped animals and the pedigree-based relationship matrix. Therefore, it is necessary to adjust the marker-based relationship matrix to the pedigree-based relationship matrix. Moreover, with data from routine evaluations, this adjustment should in principle be based on both observed marker genotypes and observed phenotypes, but until now this has been overlooked. In this paper, I propose a new method to address this issue by 1) adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix instead of the reverse and 2) extending the single-step genetic evaluation using a joint likelihood of observed phenotypes and observed marker genotypes. The performance of this method is then evaluated using two simulated datasets. Results. The method derived here is a single-step method in which the marker-based relationship matrix is constructed assuming all allele frequencies equal to 0.5 and the pedigree-based relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbred with a relationship coefficient γ and an inbreeding coefficient γ/2. Taken together, this γ parameter and a parameter that scales the marker-based relationship matrix can handle the issue of compatibility between marker-based and pedigree-based relationship matrices. The full log-likelihood function used for parameter inference contains two terms. The first term is the REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, whereas the second term is the log-likelihood for the observed marker genotypes. Analyses of the two simulated datasets with this new method showed that 1) the parameters involved in adjusting marker-based and pedigree-based relationship matrices can depend on both observed phenotypes and observed marker genotypes and 2) a strong association between these two parameters exists. Finally, this method performed at least as well as a method based on adjusting the marker-based relationship matrix. Conclusions. Using the full log-likelihood and adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in single-step genetic evaluation.
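
    As a rough illustration of a marker-based relationship matrix built with all allele frequencies fixed at 0.5, the sketch below centers genotypes coded 0/1/2 by 2p and divides the cross-products by 2p(1-p) times the number of markers. This scaling is a commonly used convention assumed here for illustration; the abstract does not state the exact formula, and the function name and toy genotypes are hypothetical. The additional scaling parameter and the γ adjustment of the pedigree matrix are not shown.

```python
def marker_relationship_matrix(genotypes, freq=0.5):
    """Marker-based relationships with one fixed allele frequency.

    genotypes: list of rows, one per animal, with allele counts 0, 1 or 2.
    freq: allele frequency assumed for every marker (0.5 in this sketch).
    """
    n_markers = len(genotypes[0])
    # Center each allele count by its expectation 2 * freq.
    centered = [[g - 2.0 * freq for g in row] for row in genotypes]
    # Assumed scaling: sum over markers of 2 * p * (1 - p).
    scale = 2.0 * freq * (1.0 - freq) * n_markers
    n = len(centered)
    return [
        [
            sum(x * y for x, y in zip(centered[i], centered[j])) / scale
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical genotypes: three animals, four markers.
toy_genotypes = [
    [0, 1, 2, 1],
    [2, 1, 0, 1],
    [0, 1, 2, 2],
]
toy_g = marker_relationship_matrix(toy_genotypes)
```

    With a frequency of 0.5 the centering reduces to subtracting 1 from every allele count, so the matrix does not depend on allele frequencies estimated from the genotyped animals.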