2,521 research outputs found

    Restricted maximum likelihood estimation of covariances in sparse linear models

    This paper discusses the restricted maximum likelihood (REML) approach for the estimation of covariance matrices in linear stochastic models, as implemented in the current version of the VCE package for covariance component estimation in large animal breeding models. The main features are: 1) the representation of the equations in an augmented form that simplifies the implementation; 2) the parametrization of the covariance matrices by means of their Cholesky factors, thus automatically ensuring their positive definiteness; 3) explicit formulas for the gradients of the REML function for the case of large and sparse model equations with a large number of unknown covariance components and possibly incomplete data, using the sparse inverse to obtain the gradients cheaply; 4) use of model equations that make separate formation of the inverse of the numerator relationship matrix unnecessary. Many large-scale breeding problems were solved with the new implementation, among them an example with more than 250 000 normal equations and 55 covariance components, taking 41 h CPU time on a Hewlett Packard 755.
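
Feature 2 above (parametrizing a covariance matrix by its Cholesky factor) can be illustrated with a minimal sketch. This is not the VCE implementation: the function name `cholesky_to_covariance` and the row-wise ordering of the parameters are assumptions made for this example. The point is that any real parameter vector maps to a matrix Sigma = L L^T that is positive semidefinite by construction, and positive definite whenever the diagonal of L is nonzero, so an optimizer can search over the parameters without constraints.

```python
def cholesky_to_covariance(params, k):
    """Map k(k+1)/2 free parameters to a k x k covariance matrix.

    The parameters fill a lower-triangular factor L row by row
    (an ordering assumed for this sketch), and Sigma = L L^T.
    """
    if len(params) != k * (k + 1) // 2:
        raise ValueError("expected k(k+1)/2 parameters")
    # Fill the lower triangle of L; the upper triangle stays zero.
    L = [[0.0] * k for _ in range(k)]
    it = iter(params)
    for i in range(k):
        for j in range(i + 1):
            L[i][j] = next(it)
    # Sigma[i][j] is the inner product of rows i and j of L.
    return [[sum(L[i][t] * L[j][t] for t in range(k)) for j in range(k)]
            for i in range(k)]


# Hypothetical two-trait example: L = [[2, 0], [-1, 0.5]].
sigma = cholesky_to_covariance([2.0, -1.0, 0.5], 2)
```

Here `sigma` is symmetric with a positive determinant, even though the off-diagonal parameter is negative.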

    High performance computing for large-scale genomic prediction

    Genetics has been studied intensively over the past decades, leading to the knowledge that DNA is the molecule behind genetic inheritance, and since the start of the new millennium next-generation sequencing methods have made it possible to sample this DNA at an ever-decreasing cost. Animal and plant breeders have always made use of genetic information to predict the agronomic performance of new breeds. While this genetic information was previously gathered from the pedigree of the population under study, genomic information from the DNA also makes it possible to deduce correlations between individuals that do not share any known ancestors, leading to so-called genomic prediction of agronomic performance. Nowadays, the number of informative samples that can be taken from a genome ranges from one thousand to one million. Using all this information in a breeding context where agronomic performance is predicted and optimized for different environmental conditions is not a straightforward task. Moreover, the number of individuals for which this information is available keeps growing, and thus sophisticated computational methods are required for analyzing these large-scale genomic data sets. This thesis introduces some concepts of high performance computing in a genomic prediction context and shows that analyzing phenotypic records of large numbers of genotyped individuals leads to a better prediction accuracy of the agronomic performance in different environments. Finally, it is shown that the parts of the DNA that influence the agronomic performance under certain environmental conditions can be pinpointed, and that breeders can use this knowledge to select individuals that thrive better in the targeted environment.

    Genomic prediction when some animals are not genotyped

    Background: The use of genomic selection in breeding programs may increase the rate of genetic improvement, reduce the generation time, and provide higher accuracy of estimated breeding values (EBVs). A number of different methods have been developed for genomic prediction of breeding values, but many of them assume that all animals have been genotyped. In practice, not all animals are genotyped, and the methods have to be adapted to this situation. Results: In this paper we provide an extension of a linear mixed model method for genomic prediction to the situation with non-genotyped animals. The model specifies that a breeding value is the sum of a genomic and a polygenic genetic random effect, where genomic genetic random effects are correlated with a genomic relationship matrix constructed from markers and the polygenic genetic random effects are correlated with the usual relationship matrix. The extension of the model to non-genotyped animals is made by using the pedigree to derive an extension of the genomic relationship matrix to non-genotyped animals. As a result, in the extended model the estimated breeding values are obtained by blending the information used to compute traditional EBVs and the information used to compute purely genomic EBVs. Parameters in the model are estimated using average information REML and estimated breeding values are best linear unbiased predictions (BLUPs). The method is illustrated using a simulated data set. Conclusions: The extension of the method to non-genotyped animals presented in this paper makes it possible to integrate all the genomic, pedigree and phenotype information into a one-step procedure for genomic prediction. Such a one-step procedure results in more accurate estimated breeding values and has the potential to become the standard tool for genomic prediction of breeding values in future practical evaluations in pig and cattle breeding.
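
The model structure described in this abstract can be written out as a sketch. The abstract gives no notation, so every symbol below is an assumption chosen for illustration: $\mathbf{a}$ for breeding values, $\mathbf{g}$ and $\mathbf{u}$ for the genomic and polygenic effects, $\mathbf{G}$ for the marker-based genomic relationship matrix, $\mathbf{A}$ for the pedigree relationship matrix, and a generic mixed-model form for the records $\mathbf{y}$.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e},
  & \mathbf{a} &= \mathbf{g} + \mathbf{u}, \\
\operatorname{Var}(\mathbf{g}) &= \mathbf{G}\,\sigma_g^{2},
  & \operatorname{Var}(\mathbf{u}) &= \mathbf{A}\,\sigma_u^{2},
  \qquad \operatorname{Var}(\mathbf{e}) = \mathbf{I}\,\sigma_e^{2}.
\end{aligned}
```

In this notation, the extension to non-genotyped animals amounts to replacing $\mathbf{G}$, which is defined only for genotyped animals, with a pedigree-derived extension covering all animals. The abstract does not give the explicit form of that extended matrix, so it is not reproduced here.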

    Approximate genome-based kernel models for large data sets including main effects and interactions

    The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), computational difficulties when handling these large genomic kernel relationship matrices (inverting and decomposing) increase exponentially. This problem increases when genomic × environment interaction and multi-trait kernels are included in the model. In this research we propose selecting a small number of lines m (m < n) for constructing an approximate kernel of lower rank than the original and thus exponentially decreasing the required computing time. First, we describe the full genomic method for single environment (FGSE) with a covariance matrix (kernel) including all n lines. Second, we select m lines and approximate the original kernel for the single environment model (APSE). Similarly, but including main effects and G × E, we explain a full genomic method with genotype × environment model (FGGE), and including m lines, we approximate the kernel method with G × E (APGE). We applied the proposed method to two different wheat data sets of different sizes (n) using the standard linear kernel Genomic Best Linear Unbiased Predictor (GBLUP) and also using eigenvalue decomposition. In both data sets, we compared the prediction performance and computing time for FGSE versus APSE; we also compared FGGE versus APGE. Results showed a competitive prediction performance of the approximated methods with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues (amount of variance information loss) of the original kernel as well as on the size of the selected lines m.
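
The idea of building a lower-rank kernel from m selected lines can be sketched with the Nyström construction, K ≈ K_nm K_mm^{-1} K_mn. The abstract does not spell out its approximation, so treating it as Nyström is an assumption. The helper names (`linear_kernel`, `solve`, `nystrom`) and the toy marker matrix are likewise hypothetical, and the code is pure Python for illustration rather than performance.

```python
def linear_kernel(X):
    """Linear kernel K = X X^T / p for an n x p matrix of marker codes."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][t] * X[j][t] for t in range(p)) / p for j in range(n)]
            for i in range(n)]


def solve(A, B):
    """Solve A Z = B (A is m x m, B is m x n) by Gauss-Jordan elimination."""
    m = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("selected lines give a singular K_mm")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[m:] for row in M]


def nystrom(K, idx):
    """Approximate the n x n kernel K from the lines listed in idx."""
    n, m = len(K), len(idx)
    K_mm = [[K[i][j] for j in idx] for i in idx]
    K_mn = [[K[i][j] for j in range(n)] for i in idx]
    Z = solve(K_mm, K_mn)  # Z = K_mm^{-1} K_mn, an m x n matrix
    return [[sum(K_mn[t][i] * Z[t][j] for t in range(m)) for j in range(n)]
            for i in range(n)]


# Toy data: 4 lines, 2 markers, so the full kernel has rank 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
K = linear_kernel(X)
K_two = nystrom(K, [0, 1])  # m = 2 lines spanning the marker space
K_one = nystrom(K, [0])     # m = 1 line: a rank-1 approximation
```

In this toy case two lines that span the marker space reproduce K up to rounding, while a single line loses information (the diagonal entry for line 1 collapses to zero). This matches the abstract's remark that accuracy depends on the eigenvalue decay of the original kernel and on m.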

    Quantitative genetic modeling and inference in the presence of nonignorable missing data.

    Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can lead to incorrect inference about population parameters. When the missing data process relates to the trait of interest, valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls (Tyto alba), our model indicates that the missing individuals would display large black spots, and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.
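
One way to write the shared parameter model described above is sketched below. The notation is an assumption, since the abstract gives none: $y_i$ is the phenotype of individual $i$, $a_i$ its additive genetic effect with pedigree relationship matrix $\mathbf{A}$, $m_i$ an indicator that the phenotype is missing, and $\gamma$ the coefficient linking the two submodels.

```latex
\begin{aligned}
y_i &= \mu + a_i + e_i,
  & \mathbf{a} &\sim N(\mathbf{0}, \mathbf{A}\,\sigma_a^{2}),
  \quad e_i \sim N(0, \sigma_e^{2}), \\
\operatorname{logit} \Pr(m_i = 1) &= \beta_0 + \gamma\, a_i .
\end{aligned}
```

Under this sketch, $\gamma = 0$ corresponds to missingness that is unrelated to the genetic effects, while $\gamma \neq 0$ means that missingness depends on the genetic value and is therefore nonignorable.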

    Using mapped quantitative trait loci in improving genetic evaluation

    The benefit of using QTL information in dairy cattle breeding schemes is investigated by means of computer simulation. In addition, algorithms are proposed to overcome computational problems that arise when marker data are included in mixed linear models. Computer simulation was conducted with parameters pertaining to the Holstein population of the United States. Superiority of QTL-assisted selection (QAS) over QTL-free selection was studied in four pathways of selection, namely active sires, young bulls, bull dams, and cows, for cumulative genetic response, accuracy of evaluation, and selection pressure on the QTL. Breeding scheme was also studied as a factor, and it was the most effective factor in increasing the superiority of QAS. In agreement with many previous studies, nucleus breeding schemes were found to be promising systems in which to implement QTL information. On the other hand, benefits of QAS in conventional two-stage selection programs were limited. The interaction between the type of QTL information available and the breeding system was found to be important. Using a highly polymorphic QTL in nucleus schemes was found to be very effective. Effects of different numbers of alleles per locus and different numbers of loci on the superiority of QAS were studied. An algorithm to directly build the inverse of a conditional gametic relationship matrix, given marker data, was developed. The inverse algorithm is based on matrix decomposition instead of partitioned matrix theory. Numerical techniques that greatly improved computing performance were introduced. Appropriate modifications to the conventional breeding schemes that are currently in use are highly recommended. Further, attention should be paid to the characteristics of the QTL and how they may interact with the breeding system, e.g., number of loci and alleles. Finally, the study found that the use of marked or known QTL information in genetic evaluation is computationally possible and generally useful.

    Genomic analysis of dominance effects on milk production and conformation traits in Fleckvieh cattle

    Background: Estimates of dominance variance in dairy cattle based on pedigree data vary considerably across traits and amount to up to 50% of the total genetic variance for conformation traits and up to 43% for milk production traits. Using bovine SNP (single nucleotide polymorphism) genotypes, dominance variance can be estimated both at the marker level and at the animal level using genomic dominance effect relationship matrices. Yield deviations of high-density genotyped Fleckvieh cows were used to assess cross-validation accuracy of genomic predictions with additive and dominance models. The potential use of dominance variance in planned matings was also investigated. Results: Variance components of nine milk production and conformation traits were estimated with additive and dominance models using yield deviations of 1996 Fleckvieh cows and ranged from 3.3% to 50.5% of the total genetic variance. REML and Gibbs sampling estimates showed good concordance. Although standard errors of estimates of dominance variance were rather large, estimates of dominance variance for milk, fat and protein yields, somatic cell score and milkability were significantly different from 0. Cross-validation accuracy of predicted breeding values was higher with genomic models than with the pedigree model. Inclusion of dominance effects did not increase the accuracy of the predicted breeding and total genetic values. Additive and dominance SNP effects for milk yield and protein yield were estimated with a BLUP (best linear unbiased prediction) model and used to calculate expectations of breeding values and total genetic values for putative offspring. Selection on total genetic value instead of breeding value would result in a larger expected total genetic superiority in progeny, i.e. 14.8% for milk yield and 27.8% for protein yield and reduce the expected additive genetic gain only by 4.5% for milk yield and 2.6% for protein yield. Conclusions: Estimated dominance variance was substantial for most of the analyzed traits. Due to small dominance effect relationships between cows, predictions of individual dominance deviations were very inaccurate and including dominance in the model did not improve prediction accuracy in the cross-validation study. Exploitation of dominance variance in assortative matings was promising and did not appear to severely compromise additive genetic gain.