31 research outputs found

    More animals than markers: a study into the application of the single step T-BLUP model in large-scale multi-trait Australian Angus beef cattle genetic evaluation

    Multi-trait single step genetic evaluation increasingly faces the situation of having more genotyped individuals than markers within each genotype. In this situation the genomic relationship matrix (G) is not of full rank and its inversion is algebraically impossible. Recently, the SS-T-BLUP method was proposed as a modified version of the single step equations, providing an elegant way to circumvent the inversion of G and therefore accommodate the situation described. SS-T-BLUP uses the Woodbury matrix identity and thus requires an add-on matrix, which is usually the covariance matrix of the residual polygenic effect. In this paper, we examine the application of SS-T-BLUP to a large-scale multi-trait Australian Angus beef cattle dataset using the full BREEDPLAN single step genetic evaluation model and compare the results to the application of two different methods of using G in a single step model. Results clearly show that SS-T-BLUP outperforms other single step formulations in terms of computational speed and avoids approximation of the inverse of G.
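
For orientation, the Woodbury matrix identity mentioned in this abstract can be written in its general form as below. The second line is only an illustrative special case under assumed notation that does not come from the abstract: Z stands for an n × m marker matrix with n > m, and E for an invertible n × n add-on matrix. How SS-T-BLUP actually builds and scales these matrices is not stated in the abstract.

```latex
% General Woodbury matrix identity
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}

% Illustrative special case (assumed notation, not the paper's):
% A = E, U = Z, C = I_m, V = Z^{\top}
(E + Z Z^{\top})^{-1} = E^{-1} - E^{-1} Z \left( I_m + Z^{\top} E^{-1} Z \right)^{-1} Z^{\top} E^{-1}
```

In this special case, apart from the inverse of E, only an m × m matrix has to be inverted, which suggests why such a formulation could be attractive when there are more genotyped animals than markers.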

    On detection of population stratification in genotype samples using spacial clustering and non-linear optimization

    Accounting for population stratification in genotype samples is important to avoid false inference from genome-wide association studies. It is usually quantified using model-based ancestry estimation (e.g. ADMIXTURE; Alexander et al. (2009)), which has disadvantages with regard to model assumptions and processing time. This article describes a two-step procedure for estimating population stratification. In the first step, a spatial clustering algorithm is used to detect clusters of genetically homogeneous animals. In the second step, genotypes are described as linear functions of within-cluster allele frequencies. The approach was tested on a cattle data set which consisted of 11,639 real genotypes from 11 breeds and 5,000 artificially generated cross-bred genotypes (F1 to F5). It outperformed ADMIXTURE in terms of speed and accuracy.

    On marker-based parentage verification via non-linear optimization

    Background: Parentage verification by molecular markers is mainly based on short tandem repeat markers. Single nucleotide polymorphisms (SNPs) as bi-allelic markers have become the markers of choice for genotyping projects. Thus, the subsequent step is to use SNP genotypes for parentage verification as well. Recent developments of algorithms such as evaluating opposing homozygous SNP genotypes have drawbacks, for example the inability to reject all animals in a sample of potential parents. This paper describes an algorithm for parentage verification by constrained regression which overcomes the latter limitation and proves to be very fast and accurate even when the number of SNPs is as low as 50. The algorithm was tested on a sample of 14,816 animals with 50, 100 and 500 SNP genotypes randomly selected from 40k genotypes. The samples of putative parents of these animals contained either five random animals, or four random animals and the true sire. Parentage assignment was performed by ranking of regression coefficients, or by setting a minimum threshold for regression coefficients. The assignment quality was evaluated by the power of assignment (Pa) and the power of exclusion (Pe). Results: If the sample of putative parents contained the true sire and parentage was assigned by coefficient ranking, Pa and Pe were both higher than 0.99 for the 500 and 100 SNP genotypes, and higher than 0.98 for the 50 SNP genotypes. When parentage was assigned by a coefficient threshold, Pe was higher than 0.99 regardless of the number of SNPs, but Pa decreased from 0.99 (500 SNPs) to 0.97 (100 SNPs) and 0.92 (50 SNPs). If the sample of putative parents did not contain the true sire and parentage was rejected using a coefficient threshold, the algorithm achieved a Pe of 1 (500 SNPs), 0.99 (100 SNPs) and 0.97 (50 SNPs). Conclusion: The algorithm described here is easy to implement, fast and accurate, and is able to assign parentage using genomic marker data with a size as low as 50 SNPs.

    On Breed Composition Estimation of Cross-Bred Animals Using Non-Linear Optimisation

    Genetically admixed animals are common in most quantitative genetic analyses, and usually are a result of intended crosses between two or more pure breed populations to enhance productivity. Disregarding the genetically heterogeneous architecture of admixed individuals may lead to poor or even wrong inference about the quality, quantity and genome location of genetic factors affecting phenotypes, and it could reduce the accuracy of estimates of genetic merit. In this article a nonlinear optimisation approach (constrained genomic regression, CGR) is presented to describe the marker genotype of a focus animal as a linear function of marker allele frequencies of possible populations of origin. The algorithm was tested on a beef cattle data set consisting of 11,639 animals from 11 different breeds with marker genotypes of 4,022 single nucleotide polymorphisms, which were used to generate 5,000 artificially cross-bred animals. For comparison the data set was also analysed with the ADMIXTURE software (ADM). CGR outperformed ADM, with a maximum difference between the true and estimated breed proportion of 0.25 and 0.28 for the 5 and 25 cross-over data sets, respectively. For ADM this parameter was 0.83 and 0.66. The mean squared estimation error was 15 and 5 times larger for ADM than for CGR for the 5 and 25 cross-over data sets, respectively. In addition, CGR always outperformed ADM in terms of speed, by a factor of 20.
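
The idea of describing a genotype as a linear function of breed allele frequencies can be sketched as a small constrained least-squares problem. The code below is a hypothetical illustration, not the CGR implementation: the abstract does not state the constraints or the optimiser, so the non-negativity and sum-to-one constraints, the 0/1/2 allele-count coding, the projected-gradient solver, and all function names are assumptions made for this sketch. It relies on NumPy.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0                  # cumulative sums minus target total
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_proportions(genotype, allele_freqs, n_iter=2000):
    """Estimate breed proportions b minimising ||g/2 - F b||^2 on the simplex.

    genotype     : allele counts (0/1/2) per marker (assumed coding)
    allele_freqs : markers x breeds matrix F of allele frequencies
    """
    g = np.asarray(genotype, dtype=float) / 2.0   # counts -> frequency scale
    F = np.asarray(allele_freqs, dtype=float)
    # Step size 1/L, where L = 2 * sigma_max(F)^2 bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.norm(F, 2) ** 2)
    b = np.full(F.shape[1], 1.0 / F.shape[1])     # start from equal proportions
    for _ in range(n_iter):
        grad = -2.0 * F.T @ (g - F @ b)
        b = project_to_simplex(b - step * grad)   # gradient step, then re-impose constraints
    return b
```

Under these assumptions, the returned vector is non-negative and sums to one, so each entry can be read as the estimated share of one candidate population of origin.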

    On Estimation of Genome Composition in Genetically Admixed Individuals Using Constrained Genomic Regression

    Quantifying the population stratification in genotype samples has become a standard procedure for data manipulation before conducting genome-wide association studies, as well as for tracing patterns of migration in humans and animals, and for inference about extinct founder populations. The most widely used approach capable of providing biologically interpretable results is a likelihood formulation which allows for estimation of founder genome proportions and founder allele frequency conditional on the observed genotypes. However, if founder allele frequencies are known and samples are dominated by admixed genotypes, this approach may lead to biased inference. In addition, processing time increases drastically with the number of genetic markers. This article describes a simplified approach for obtaining biologically meaningful measures of population stratification at the genotype level conditional on known founder allele frequencies. It was tested on cattle and human data sets with 4,022 and 150,000 genetic markers, respectively, and proved to be very accurate in situations where founder populations were correctly specified, or under-, over-, and misspecified. Moreover, processing time was only marginally affected by an increase in the number of markers.

    A fast method for evaluating opposing homozygosity in large SNP data sets

    Optimized algorithms are indispensable for analyzing large SNP data sets. To date, research has focused on the development of methods for calculating genomic relationship matrices. However, little attention has been given to algorithms for calculating the number of opposing homozygous SNP loci (OH) between genotyped individuals, where this parameter can be used to detect pedigree errors, genotyping errors, mixing of DNA samples, or for paternity tests. A recently proposed approach (LOOP) is sufficient for small data sets but not applicable to larger data sets in terms of number of SNPs and genotyped individuals. In this paper we propose a fast method for the calculation of OH in matrix format (OHM). This method is very fast and easy to implement. For example, it can create the OHM for 12,000 individuals genotyped for 40,000 SNPs with only 12% of the real time used by the LOOP approach. Thus, calculation of OHM from a sequence of matrix manipulations substantially increased the speed for determining the number of opposing homozygous SNP loci between all genotyped individuals of a data set. Given the availability of packages facilitating parallel processing this holds even when using R, and therefore allows inference from OHM even for large data sets
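
One way to count opposing homozygous loci with matrix operations instead of pairwise loops is sketched below. This is a minimal illustration and not necessarily the authors' OHM implementation: it assumes genotypes are coded as allele counts 0/1/2 with no missing calls, it uses NumPy rather than R, and the function name is hypothetical.

```python
import numpy as np


def opposing_homozygosity_matrix(genotypes):
    """Count opposing homozygous loci between all pairs of individuals.

    genotypes : individuals x SNPs array of allele counts (0, 1 or 2; assumed coding)
    Returns an individuals x individuals integer matrix.
    """
    g = np.asarray(genotypes)
    hom_ref = (g == 0).astype(np.int64)   # 1 where homozygous for the first allele
    hom_alt = (g == 2).astype(np.int64)   # 1 where homozygous for the second allele
    # A locus is opposing when one individual is 0 and the other is 2, in either order
    return hom_ref @ hom_alt.T + hom_alt @ hom_ref.T


genotypes = [
    [0, 2, 1, 0],
    [2, 2, 1, 0],
    [2, 0, 0, 2],
]
print(opposing_homozygosity_matrix(genotypes))
```

Each entry (i, j) of the result counts the loci at which individuals i and j carry opposite homozygous genotypes, so the whole matrix comes from two matrix products.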

    Optimising multistage dairy cattle breeding programs with regard to genomic selection

    Multistage dairy cattle breeding schemes consisting of 4 selection paths were optimised to maximise the genetic gain per year under genomic selection on 2 genomically estimated breeding values differing in cost and accuracy. Results clearly show that the selection of bull dams is the major field of application for low-density genotyping, but they also emphasise that the selection of sires remains of the highest importance for generating genetic gain, irrespective of increasing costs for high-density genotyping.

    SNP Based Parentage Verification via Constraint Non-Linear Optimisation

    Since the introduction of parentage verification by molecular markers, this technique has been based mainly on short tandem repeat (STR) markers. With the advent of single nucleotide polymorphisms (SNPs), advances in genotyping technologies and decreasing costs, SNPs have become the marker of choice for genotyping projects. This is because the genotypes have a wide range of applications, and imputation technologies provide a well-developed compatibility layer between different types of SNP genotypes. Thus, the subsequent step is to use SNP genotypes for parentage verification as well. However, algorithms for parentage verification mostly date back to the STR era, and recent developments of SNP-based algorithms such as evaluating opposing homozygosity have drawbacks, for example the inability to reject all animals in a sample of potential parents. This paper describes an algorithm for parentage verification via non-linear optimisation which overcomes the latter limitation and proves to be very fast and highly accurate even with as few as 100 SNPs. The algorithm was tested on a sample of 90 animals with 100, 500 and 40k SNP genotypes. These animals were evaluated against a pool of 12 putative parents containing random animals only, random animals and the true dam, or random animals, the true dam and the true sire. Assignment quality of the algorithm was evaluated by the power of assignment (Pa, probability of picking the true parent when the true parent was among the putative parents) and the power of exclusion (Pe, probability of rejecting all parents if the true parent was not among the putative parents). When used with 40k genotypes, the algorithm assigned parentage correctly for all 90 test animals. That is, if one or both parents were among the putative parents, they were correctly identified. If both were absent, parentage was ruled out for the whole set of putative parents. A similar result was achieved when shrinking the genotypes to 500 randomly selected SNPs, with Pe = 0.99 and Pa = 1. When only 100 SNPs were used, randomly selected but with the sample space narrowed to a minor allele frequency >0.3, Pe and Pa were still 0.99 and 0.96, respectively. The described method is an easy to implement, fast and accurate algorithm to assign parentage using genomic marker data of a size as low as 100 SNPs. It overcomes limitations of methods such as evaluation of opposing homozygosity by not relying on the presence of a true parent in the pool of putative parents.

    Gametic gene flow method accounts for genomic imprinting and inbreeding

    Findings within the last fifteen years emphasise the possible role of genomic imprinting for trait expression in livestock species. In genetic evaluation, genomically imprinted traits can be treated by models with two different breeding values per animal; one accounts for the paternal and the other for the maternal expression pattern. Relative weighting factors for these breeding values were derived by a generalised version of the discounted gene flow method, which was extended to a gametic level to account for parent-of-origin effects

    Accuracy of Igenity Direct Genomic Values in Australian Angus

    The quality of Igenity² direct genomic values (GEBVs) derived by two different prediction procedures for 12 traits of 1032 Angus bulls was estimated as the genetic correlation to their phenotypic target traits. In addition, the effect of a decreasing genetic relationship between validation and training population was inferred by subdividing the set of 1032 GEBVs accordingly. Genetic correlations estimated were medium to high even when all training individuals were excluded from the analysis, and well in line with those already published. Thus blending Australian Angus breeding values with Igenity GEBVs can be beneficial for breeders