255 research outputs found

    Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels

    Get PDF
    International audienceBackground Currently, genomic prediction in cattle is largely based on panels of about 54k single nucleotide polymorphisms (SNPs). However with the decreasing costs of and current advances in next-generation sequencing technologies, whole-genome sequence (WGS) data on large numbers of individuals is within reach. Availability of such data provides new opportunities for genomic selection, which need to be explored.MethodsThis simulation study investigated how much predictive ability is gained by using WGS data under scenarios with QTL (quantitative trait loci) densities ranging from 45 to 132 QTL/Morgan and heritabilities ranging from 0.07 to 0.30, compared to different SNP densities, with emphasis on divergent dairy cattle breeds with small populations. The relative performances of best linear unbiased prediction (SNP-BLUP) and of a variable selection method with a mixture of two normal distributions (MixP) were also evaluated. Genomic predictions were based on within-population, across-population, and multi-breed reference populations.ResultsThe use of WGS data for within-population predictions resulted in small to large increases in accuracy for low to moderately heritable traits. Depending on heritability of the trait, and on SNP and QTL densities, accuracy increased by up to 31 %. The advantage of WGS data was more pronounced (7 to 92 % increase in accuracy depending on trait heritability, SNP and QTL densities, and time of divergence between populations) with a combined reference population and when using MixP. While MixP outperformed SNP-BLUP at 45 QTL/Morgan, SNP-BLUP was as good as MixP when QTL density increased to 132 QTL/Morgan.ConclusionsOur results show that, genomic predictions in numerically small cattle populations would benefit from a combination of WGS data, a multi-breed reference population, and a variable selection method

    The importance of identity-by-state information for the accuracy of genomic selection

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information.</p> <p>Methods</p> <p>The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNP and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data.</p> <p>Results</p> <p>We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree.</p> <p>Conclusions</p> <p>Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require pedigree data, it does use the available pedigree structure. Our findings may explain why the prediction equations derived for one breed may not predict accurate genome-wide breeding values when applied to other breeds, since family structures differ among breeds.</p

    Genetic prediction of complex traits: integrating infinitesimal and marked genetic effects

    Get PDF
    Genetic prediction for complex traits is usually based on models including individual (infinitesimal) or marker effects. Here, we concentrate on models including both the individual and the marker effects. In particular, we develop a ''Mendelian segregation'' model combining infinitesimal effects for base individuals and realized Mendelian sampling in descendants described by the available DNA data. The model is illustrated with an example and the analyses of a public simulated data file. Further, the potential contribution of such models is assessed by simulation. Accuracy, measured as the correlation between true (simulated) and predicted genetic values, was similar for all models compared under different genetic backgrounds. As expected, the segregation model is worthwhile when markers capture a low fraction of total genetic variance. (Résumé d'auteur

    Accuracy of breeding values of 'unrelated' individuals predicted by dense SNP genotyping

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Recent developments in SNP discovery and high throughput genotyping technology have made the use of high-density SNP markers to predict breeding values feasible. This involves estimation of the SNP effects in a training data set, and use of these estimates to evaluate the breeding values of other 'evaluation' individuals. Simulation studies have shown that these predictions of breeding values can be accurate, when training and evaluation individuals are (closely) related. However, many general applications of genomic selection require the prediction of breeding values of 'unrelated' individuals, i.e. individuals from the same population, but not particularly closely related to the training individuals.</p> <p>Methods</p> <p>Accuracy of selection was investigated by computer simulation of small populations. Using scaling arguments, the results were extended to different populations, training data sets and genome sizes, and different trait heritabilities.</p> <p>Results</p> <p>Prediction of breeding values of unrelated individuals required a substantially higher marker density and number of training records than when prediction individuals were offspring of training individuals. However, when the number of records was 2*N<sub>e</sub>*L and the number of markers was 10*N<sub>e</sub>*L, the breeding values of unrelated individuals could be predicted with accuracies of 0.88 – 0.93, where N<sub>e </sub>is the effective population size and L the genome size in Morgan. Reducing this requirement to 1*N<sub>e</sub>*L individuals, reduced prediction accuracies to 0.73–0.83.</p> <p>Conclusion</p> <p>For livestock populations, 1N<sub>e</sub>L requires about ~30,000 training records, but this may be reduced if training and evaluation animals are related. A prediction equation is presented, that predicts accuracy when training and evaluation individuals are related. For humans, 1N<sub>e</sub>L requires ~350,000 individuals, which means that human disease risk prediction is possible only for diseases that are determined by a limited number of genes. Otherwise, genotyping and phenotypic recording need to become very common in the future.</p

    The complete linkage disequilibrium test: a test that points to causative mutations underlying quantitative traits

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genetically, SNP that are in complete linkage disequilibrium with the causative SNP cannot be distinguished from the causative SNP. The Complete Linkage Disequilibrium (CLD) test presented here tests whether a SNP is in complete LD with the causative mutation or not. The performance of the CLD test is evaluated in 1000 simulated datasets.</p> <p>Methods</p> <p>The CLD test consists of two steps i.e. analysis I and analysis II. Analysis I consists of an association analysis of the investigated region. The log-likelihood values from analysis I are next ranked in descending order and in analysis II the CLD test evaluates differences in log-likelihood ratios between the best and second best markers. Under the null-hypothesis distribution, the best SNP is in greater LD with the QTL than the second best, while under the alternative-CLD-hypothesis, the best SNP is alike-in-state with the QTL. To find a significance threshold, the test was also performed on data excluding the causative SNP. The 5<sup>th</sup>, 10<sup>th </sup>and 50<sup>th </sup>highest T<sub>CLD </sub>value from 1000 replicated analyses were used to control the type-I-error rate of the test at p = 0.005, p = 0.01 and p = 0.05, respectively.</p> <p>Results</p> <p>In a situation where the QTL explained 48% of the phenotypic variance analysis I detected a QTL in 994 replicates (p = 0.001), where 972 were positioned in the correct QTL position. When the causative SNP was excluded from the analysis, 714 replicates detected evidence of a QTL (p = 0.001). In analysis II, the CLD test confirmed 280 causative SNP from 1000 simulations (p = 0.05), i.e. power was 28%. When the effect of the QTL was reduced by doubling the error variance, the power of the test reduced relatively little to 23%. When sequence data were used, the power of the test reduced to 16%. All SNP that were confirmed by the CLD test were positioned in the correct QTL position.</p> <p>Conclusions</p> <p>The CLD test can provide evidence for a causative SNP, but its power may be low in situations with closely linked markers. In such situations, also functional evidence will be needed to definitely conclude whether the SNP is causative or not.</p

    Best Linear Unbiased Prediction of Genomic Breeding Values Using a Trait-Specific Marker-Derived Relationship Matrix

    Get PDF
    With the availability of high density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest

    Strategies for implementing genomic selection in family-based aquaculture breeding schemes: double haploid sib test populations

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs.</p> <p>Methods</p> <p>Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (<it>Mat</it>), paternal (<it>Pat</it>) or a mixture of maternal and paternal (<it>MatPat</it>) double haploid genomes or test sibs were obtained by maximum coancestry mating (<it>MaxC</it>), minimum coancestry mating (<it>MinC</it>), or random (<it>RAND</it>) mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes.</p> <p>Results</p> <p>Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the <it>MatPat</it> scheme compared to the <it>RAND</it> scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. <it>Mat, Pat, MaxC</it>, and <it>MinC</it>, no substantial differences in selection accuracy and genetic gain were observed.</p> <p>Conclusions</p> <p>In conclusion, a test population designed with a mixture of paternal and maternal double haploids, i.e. the <it>MatPat</it> scheme, increases substantially the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the selection candidates and require the use of sib tests, such as disease resistance and meat quality.</p

    Persistence of accuracy of genomic estimated breeding values over generations in layer chickens

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The predictive ability of genomic estimated breeding values (GEBV) originates both from associations between high-density markers and QTL (Quantitative Trait Loci) and from pedigree information. Thus, GEBV are expected to provide more persistent accuracy over successive generations than breeding values estimated using pedigree-based methods. The objective of this study was to evaluate the accuracy of GEBV in a closed population of layer chickens and to quantify their persistence over five successive generations using marker or pedigree information.</p> <p>Methods</p> <p>The training data consisted of 16 traits and 777 genotyped animals from two generations of a brown-egg layer breeding line, 295 of which had individual phenotype records, while others had phenotypes on 2,738 non-genotyped relatives, or similar data accumulated over up to five generations. Validation data included phenotyped and genotyped birds from five subsequent generations (on average 306 birds/generation). Birds were genotyped for 23,356 segregating SNP. Animal models using genomic or pedigree relationship matrices and Bayesian model averaging methods were used for training analyses. Accuracy was evaluated as the correlation between EBV and phenotype in validation divided by the square root of trait heritability.</p> <p>Results</p> <p>Pedigree relationships in outbred populations are reduced by 50% at each meiosis, therefore accuracy is expected to decrease by the square root of 0.5 every generation, as observed for pedigree-based EBV (Estimated Breeding Values). In contrast the GEBV accuracy was more persistent, although the drop in accuracy was substantial in the first generation. Traits that were considered to be influenced by fewer QTL and to have a higher heritability maintained a higher GEBV accuracy over generations. In conclusion, GEBV capture information beyond pedigree relationships, but retraining every generation is recommended for genomic selection in closed breeding populations.</p
    corecore