331 research outputs found

    Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep

    Get PDF
    International audienceAbstractBackgroundThis study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel.ResultsThe accuracy of genomic prediction improved marginally from 0.16 ± 0.02 and 0.18 ± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS -log10(pvalue)\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}βˆ’log10(p value)- log_{10} (p\,value)\end{document} threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS -log10(pvalue)\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}βˆ’log10(p value)- log_{10} (p\,value)\end{document} threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 ± 0.01).ConclusionsOur results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep

    Using the Pareto principle in genome-wide breeding value estimation

    Get PDF
    Genome-wide breeding value (GWEBV) estimation methods can be classified based on the prior distribution assumptions of marker effects. Genome-wide BLUP methods assume a normal prior distribution for all markers with a constant variance, and are computationally fast. In Bayesian methods, more flexible prior distributions of SNP effects are applied that allow for very large SNP effects although most are small or even zero, but these prior distributions are often also computationally demanding as they rely on Monte Carlo Markov chain sampling. In this study, we adopted the Pareto principle to weight available marker loci, i.e., we consider that x% of the loci explain (100 - x)% of the total genetic variance. Assuming this principle, it is also possible to define the variances of the prior distribution of the 'big' and 'small' SNP. The relatively few large SNP explain a large proportion of the genetic variance and the majority of the SNP show small effects and explain a minor proportion of the genetic variance. We name this method MixP, where the prior distribution is a mixture of two normal distributions, i.e. one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. This method also reduces the hyper-parameters of the prior distribution from 2 (proportion and variance of SNP with big effects) to 1 (proportion of SNP with big effects), assuming the overall genetic variance is known. The mixture of normal distribution prior made it possible to solve the equations iteratively, which greatly reduced computation loads by two orders of magnitude. In the era of marker density reaching million(s) and whole-genome sequence data, MixP provides a computationally feasible Bayesian method of analysis

    Scans for signatures of selection in Russian cattle breed genomes reveal new candidate genes for environmental adaptation and acclimation

    Get PDF
    Domestication and selective breeding has resulted in over 1000 extant cattle breeds. Many of these breeds do not excel in important traits but are adapted to local environments. These adaptations are a valuable source of genetic material for efforts to improve commercial breeds. As a step toward this goal we identified candidate regions to be under selection in genomes of nine Russian native cattle breeds adapted to survive in harsh climates. After comparing our data to other breeds of European and Asian origins we found known and novel candidate genes that could potentially be related to domestication, economically important traits and environmental adaptations in cattle. The Russian cattle breed genomes contained regions under putative selection with genes that may be related to adaptations to harsh environments (e.g., AQP5, RAD50, and RETREG1). We found genomic signatures of selective sweeps near key genes related to economically important traits, such as the milk production (e.g., DGAT1, ABCG2), growth (e.g., XKR4), and reproduction (e.g., CSF2). Our data point to candidate genes which should be included in future studies attempting to identify genes to improve the extant breeds and facilitate generation of commercial breeds that fit better into the environments of Russia and other countries with similar climates

    Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    Get PDF
    Background The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used. Methods Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available

    Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: a simulation study.

    Get PDF
    International audienceAbstractBackgroundThe dimensionality of genomic information is limited by the number of independent chromosome segments (Me), which is a function of the effective population size. This dimensionality can be determined approximately by singular value decomposition of the gene content matrix, by eigenvalue decomposition of the genomic relationship matrix (GRM), or by the number of core animals in the algorithm for proven and young (APY) that maximizes the accuracy of genomic prediction. In the latter, core animals act as proxies to linear combinations of Me. Field studies indicate that a moderate accuracy of genomic selection is achieved with a small dataset, but that further improvement of the accuracy requires much more data. When only one quarter of the optimal number of core animals are used in the APY algorithm, the accuracy of genomic selection is only slightly below the optimal value. This suggests that genomic selection works on clusters of Me.ResultsThe simulation included datasets with different population sizes and amounts of phenotypic information. Computations were done by genomic best linear unbiased prediction (GBLUP) with selected eigenvalues and corresponding eigenvectors of the GRM set to zero. About four eigenvalues in the GRM explained 10% of the genomic variation, and less than 2% of the total eigenvalues explained 50% of the genomic variation. With limited phenotypic information, the accuracy of GBLUP was close to the peak where most of the smallest eigenvalues were set to zero. With a large amount of phenotypic information, accuracy increased as smaller eigenvalues were added.ConclusionsA small amount of phenotypic data is sufficient to estimate only the effects of the largest eigenvalues and the associated eigenvectors that contain a large fraction of the genomic information, and a very large amount of data is required to estimate the remaining eigenvalues that account for a limited amount of genomic information. Core animals in the APY algorithm act as proxies of almost the same number of eigenvalues. By using an eigenvalues-based approach, it was possible to explain why the moderate accuracy of genomic selection based on small datasets only increases slowly as more data are added

    Novel genetic analysis for case-control genome-wide association studies: quantification of power and genomic prediction accuracy

    Get PDF
    Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or both cases and controls. Secondly, we consider, the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations

    Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle

    Get PDF
    Background Domestication of the now-extinct wild aurochs, Bos primigenius, gave rise to the two major domestic extant cattle taxa, B. taurus and B. indicus. While previous genetic studies have shed some light on the evolutionary relationships between European aurochs and modern cattle, important questions remain unanswered, including the phylogenetic status of aurochs, whether gene flow from aurochs into early domestic populations occurred, and which genomic regions were subject to selection processes during and after domestication. Here, we address these questions using whole-genome sequencing data generated from an approximately 6,750-year-old British aurochs bone and genome sequence data from 81 additional cattle plus genome-wide single nucleotide polymorphism data from a diverse panel of 1,225 modern animals. Results Phylogenomic analyses place the aurochs as a distinct outgroup to the domestic B. taurus lineage, supporting the predominant Near Eastern origin of European cattle. Conversely, traditional British and Irish breeds share more genetic variants with this aurochs specimen than other European populations, supporting localized gene flow from aurochs into the ancestors of modern British and Irish cattle, perhaps through purposeful restocking by early herders in Britain. Finally, the functions of genes showing evidence for positive selection in B. taurus are enriched for neurobiology, growth, metabolism and immunobiology, suggesting that these biological processes have been important in the domestication of cattle. Conclusions This work provides important new information regarding the origins and functional evolution of modern cattle, revealing that the interface between early European domestic populations and wild aurochs was significantly more complex than previously thought

    Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep

    Get PDF
    Background: This study aimed at (1) comparing the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data to those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) investigating whether the use of variants within quantitative trait loci (QTL) regions that were selected from regional heritability mapping (RHM) in an independent dataset improved the accuracy more than variants selected from genome-wide association studies (GWAS); and (3) comparing the prediction accuracies between variants selected from WGS data to variants selected from the HD SNP panel. Results: The accuracy of genomic prediction improved marginally from 0.16 Β± 0.02 and 0.18 Β± 0.01 when using all the variants from 50k and HD genotypes, respectively, to 0.19 Β± 0.01 when using all the variants from WGS data. Fitting a GRM from the selected variants alongside a GRM from the 50k SNP genotypes improved the prediction accuracy substantially compared to fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly more pronounced when variants were selected from WGS data compared to when variants were selected from the HD panel. When sequence variants that passed the GWAS -log10(p value) threshold of 3 across the entire genome were selected, the prediction accuracy improved by 5% (up to 0.21 Β± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS βˆ’log10(p value) threshold of 3 in regions identified by RHM, the accuracy improved by 9% (up to 0.25 Β± 0.01). Conclusions: Our results show that through careful selection of sequence variants from the QTL regions, the accuracy of genomic prediction for parasite resistance in sheep can be improved. These findings have important implications for genomic prediction in sheep

    Rapid genotype imputation from sequence without reference panels

    Get PDF
    Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, array-based microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r(2) value of 0.98 in 2,073 outbred laboratory mice (0.15Γ— sequencing coverage). In a sample of 11,670 Han Chinese (1.7Γ— coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans

    Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels

    Get PDF
    International audienceBackground Currently, genomic prediction in cattle is largely based on panels of about 54k single nucleotide polymorphisms (SNPs). However with the decreasing costs of and current advances in next-generation sequencing technologies, whole-genome sequence (WGS) data on large numbers of individuals is within reach. Availability of such data provides new opportunities for genomic selection, which need to be explored.MethodsThis simulation study investigated how much predictive ability is gained by using WGS data under scenarios with QTL (quantitative trait loci) densities ranging from 45 to 132 QTL/Morgan and heritabilities ranging from 0.07 to 0.30, compared to different SNP densities, with emphasis on divergent dairy cattle breeds with small populations. The relative performances of best linear unbiased prediction (SNP-BLUP) and of a variable selection method with a mixture of two normal distributions (MixP) were also evaluated. Genomic predictions were based on within-population, across-population, and multi-breed reference populations.ResultsThe use of WGS data for within-population predictions resulted in small to large increases in accuracy for low to moderately heritable traits. Depending on heritability of the trait, and on SNP and QTL densities, accuracy increased by up to 31Β %. The advantage of WGS data was more pronounced (7 to 92Β % increase in accuracy depending on trait heritability, SNP and QTL densities, and time of divergence between populations) with a combined reference population and when using MixP. While MixP outperformed SNP-BLUP at 45 QTL/Morgan, SNP-BLUP was as good as MixP when QTL density increased to 132 QTL/Morgan.ConclusionsOur results show that, genomic predictions in numerically small cattle populations would benefit from a combination of WGS data, a multi-breed reference population, and a variable selection method
    • …
    corecore