
    Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed to design optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function maximizes the non-gap map length and the system information of the SNP chip; the latter is computed as either locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE but required more computing time, and the differences diminished when >5,000 SNPs were selected. Optimization was conditional on the presence of obligatory SNPs on each chromosome. The frame locations of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the non-uniform design, a tunable empirical Beta distribution guided the placement of frame SNPs so that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized by locally and empirically maximizing the objective function. The MOLO algorithm selected sets of approximately evenly spaced and highly informative SNPs, which increased imputation accuracy compared with selecting evenly spaced SNPs alone. Imputation accuracy increased with LD chip size, and the imputation error rate was extremely low for chips with >3,000 SNPs. Assuming that genotyping or imputation errors occur at random, the imputation error rate can be viewed as an upper limit on genomic prediction error. Our results show that about 25% of the imputation error rate propagated to genomic prediction in an Angus population.
    The utility of the MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected on the basis of SNP-trait associations in U.S. Holstein animals. With the MOLO algorithm, both the imputation error rate and the genomic prediction error rate were minimal.
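The abstract names two ingredients of the MOLO design, Shannon-entropy information and Beta-guided frame locations, without giving formulas. The Python sketch below is a simplified illustration under stated assumptions, not the authors' implementation: `locus_entropy` and `lase` take LASE to be the mean biallelic allele-frequency entropy in bits, `frame_positions` uses the fixed special case Beta(0.5, 0.5) (whose quantile function is sin²(πu/2)) instead of a tuned empirical Beta, and `pick_near_frames` is a hypothetical greedy stand-in for the local maximization step. Non-gap map length and the uniformity adjustment are omitted.

```python
import math


def locus_entropy(p: float) -> float:
    """Shannon entropy (bits) of a biallelic SNP with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic loci carry no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def lase(freqs: list[float]) -> float:
    """Assumed LASE: mean locus entropy over the selected SNPs."""
    return sum(locus_entropy(p) for p in freqs) / len(freqs)


def frame_positions(n_snps: int, chrom_length: float) -> list[float]:
    """Frame locations at Beta(0.5, 0.5) quantiles, which are denser at both ends."""
    positions = []
    for i in range(n_snps):
        u = (i + 0.5) / n_snps               # evenly spaced probabilities in (0, 1)
        x = math.sin(math.pi * u / 2) ** 2   # closed-form Beta(0.5, 0.5) quantile
        positions.append(x * chrom_length)
    return positions


def pick_near_frames(frames, candidates, window):
    """Hypothetical greedy step: for each frame location, keep the most
    informative unused candidate (position, allele_freq) within +/- window."""
    chosen = []
    for f in frames:
        nearby = [c for c in candidates if abs(c[0] - f) <= window and c not in chosen]
        if nearby:
            chosen.append(max(nearby, key=lambda c: locus_entropy(c[1])))
    return chosen


# Usage: 10 frame SNPs on a 100 cM chromosome; gaps shrink toward both ends.
pos = frame_positions(10, 100.0)
print([round(x, 1) for x in pos])
```

Using the arcsine special case keeps the sketch dependency-free; a tunable Beta(a, a) with a < 1 behaves similarly but needs a numerical quantile function.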

    Genomic evaluations with many more genotypes

    Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly.
    Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared.
    Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50,000 markers and 1.6% lower with 500,000 markers.
    Conclusions: Methods to impute genotypes and compute genomic evaluations were affordable with many more markers. Reliabilities for individual animals can be modified to reflect success of imputation. Breeders can improve reliability at lower cost by combining marker densities to increase both the numbers of markers and animals included in genomic evaluation. Larger gains are expected from increasing the number of animals than the number of markers.
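The mixed-density design above (keeping every tenth of the simulated markers) and the share of missing genotypes called correctly can be expressed in a few lines. This Python sketch assumes genotypes coded as 0/1/2 allele counts and uses hypothetical helper names; it only builds the masked low-density panel and scores an imputation result. It does not implement the population and pedigree haplotyping used in the study.

```python
from typing import Optional


def mask_to_low_density(genotypes: list[int], step: int = 10) -> list[Optional[int]]:
    """Keep every `step`-th marker (indices 0, step, 2*step, ...); mask the rest as None."""
    return [g if i % step == 0 else None for i, g in enumerate(genotypes)]


def imputation_concordance(true: list[int], masked: list[Optional[int]],
                           imputed: list[int]) -> float:
    """Fraction of masked markers whose imputed genotype equals the true genotype."""
    pairs = [(t, m) for t, obs, m in zip(true, masked, imputed) if obs is None]
    if not pairs:
        raise ValueError("no masked markers to score")
    return sum(t == m for t, m in pairs) / len(pairs)


# Usage with toy data: 30 markers, 2 of the 27 masked ones imputed wrongly.
true = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2] * 3
low = mask_to_low_density(true)
imputed = list(true)
imputed[1], imputed[2] = 2, 0   # hypothetical imputation errors
print(imputation_concordance(true, low, imputed))
```

Scoring only masked positions matters: genotypes that were observed on the low-density panel would otherwise inflate the apparent imputation accuracy.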

    Design of a Bovine Low-Density SNP Array Optimized for Imputation

    The Illumina BovineLD BeadChip was designed to support imputation to higher-density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had high minor allele frequencies and were uniformly spaced across the genome, except at chromosome ends, where density was increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations, with the greatest improvement when neither parent was genotyped. Minor allele frequencies were similar across taurine beef and dairy breeds, as was the proportion of polymorphic SNPs. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle.
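One way to read the BovineLD design rule (high minor allele frequency, uniform spacing, higher density at chromosome ends) is as a window-based pick. The sketch below is an assumption for illustration, not Illumina's actual selection procedure: it splits a chromosome into equal windows, keeps the highest-MAF SNP in each, and keeps `per_end_window` SNPs in windows that reach within `end_fraction` of either end. All function and parameter names are hypothetical.

```python
def select_ld_snps(snps, chrom_length, n_windows, end_fraction=0.05, per_end_window=2):
    """snps: list of (position, maf) with 0 <= position < chrom_length.

    Keeps the highest-MAF SNP in each equal-width window, and the top
    `per_end_window` SNPs in windows that reach into either chromosome end.
    """
    width = chrom_length / n_windows
    chosen = []
    for w in range(n_windows):
        lo, hi = w * width, (w + 1) * width
        in_window = sorted((s for s in snps if lo <= s[0] < hi),
                           key=lambda s: s[1], reverse=True)  # highest MAF first
        near_end = lo < end_fraction * chrom_length or hi > (1 - end_fraction) * chrom_length
        k = per_end_window if near_end else 1
        chosen.extend(in_window[:k])
    return sorted(chosen)  # order by position


# Usage: toy SNPs as (position in Mb, MAF) on a 100 Mb chromosome.
snps = [(1, 0.10), (2, 0.40), (3, 0.30), (51, 0.20), (55, 0.45), (91, 0.50), (99, 0.05)]
print(select_ld_snps(snps, 100.0, 10))
```

Picking within fixed windows enforces near-uniform spacing while MAF drives the choice inside each window; windows with no candidate SNPs simply contribute nothing.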

    Imputation of Missing Genotypes from Sparse to High Density Using Long-Range Phasing

    Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that exploits this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect the surrogate parents and offspring of an individual (the proband) and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% of loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance improved as the number of generations available in the pedigree increased, but declined when the sparse genotype contained fewer loci. In simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm on a real Holstein cattle data set, imputing 50K genotypes in animals with a sparse 3K genotype; 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and showed that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data.
    Our results demonstrate that imputation of missing genotypes, and potentially of full genome sequence, using long-range phasing is feasible.
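The segment-sharing step, in which missing information in the proband is filled in from a relative once a shared segment is found, can be sketched on phased haplotypes. This is a simplified illustration, not the published ChromoPhase algorithm: it assumes both haplotypes are already phased and coded 0/1 with `None` for untyped sites, treats a run as shared when every observed proband allele matches the relative, and uses a hypothetical `min_run` threshold on the number of observed matching sites.

```python
from typing import Optional


def fill_from_relative(proband: list[Optional[int]], relative: list[int],
                       min_run: int) -> list[Optional[int]]:
    """Fill missing alleles (None) in a phased proband haplotype from a relative's
    haplotype, inside runs where every observed proband allele matches the relative
    and the run contains at least `min_run` observed sites."""
    filled = list(proband)
    n = len(proband)
    start = 0
    while start < n:
        end, observed = start, 0
        # Extend the run while observed alleles agree; missing sites do not break it.
        while end < n and (proband[end] is None or proband[end] == relative[end]):
            if proband[end] is not None:
                observed += 1
            end += 1
        if observed >= min_run:
            for i in range(start, end):
                if filled[i] is None:
                    filled[i] = relative[i]
        start = end + 1  # skip past the mismatching site (or past the end)
    return filled


# Usage: the first run (sites 0-5) has 4 matching observed alleles and is filled;
# the short run after the mismatch at site 6 is left untouched.
proband = [0, None, 1, 1, None, 0, 1, None, None, 1]
relative = [0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
print(fill_from_relative(proband, relative, min_run=3))
```

Requiring a minimum number of observed matches guards against filling from a relative over a stretch that agrees only by chance.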

    Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array

    Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the lower-density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions (CNVRs) with a total length of 139.8 megabases. In this study using the high-density BovineHD SNP array, we performed high-resolution CNV analyses on both Btau_4.0 and UMD3.1 with 674 animals of 27 cattle breeds. We first compared CNV results derived from these two SNP array platforms on Btau_4.0. With two thirds of the animals shared between studies, on Btau_4.0 we identified 3,346 candidate CNV regions representing 142.7 megabases (~4.70%) of the genome. With a similar total length but five times as many events, the average CNVR length in the current Btau_4.0 dataset is significantly shorter than in the previous one (42.7 kb vs. 205 kb). Although subsets of the two results overlapped, 64% (91.6 megabases) of the current dataset was not present in the previous study. We also performed similar analyses on UMD3.1 using these BovineHD SNP array results. Approximately 50% more and 20% longer CNVs were called on UMD3.1 than on Btau_4.0. However, a comparable set of CNVRs (3,438 regions with a total length of 146.9 megabases) was obtained. We suspect that these results reflect the UMD3.1 assembly's efforts to place unplaced contigs and remove unmerged alleles. Selected CNVs were further experimentally validated, achieving a 73% PCR validation rate, which is considerably higher than the previous validation rate. About 20-45% of CNV regions overlapped with cattle RefSeq genes and Ensembl genes. Panther and IPA analyses indicated that these genes are involved in a wide spectrum of biological processes, including immune system function, lipid metabolism, and cell, organism and system development.
    We present a comprehensive result of cattle CNVs at a higher resolution and sensitivity. We identified over 3,000 candidate CNV regions on both Btau_4.0 and UMD3.1, further compared current datasets with previous results, and examined the impacts of genome assemblies on CNV calling. https://doi.org/10.1186/1471-2164-13-37
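CNV regions are commonly built by pooling CNV calls across animals and merging calls that overlap; the abstract then reports region counts, total lengths, and the share of regions overlapping genes. The sketch below assumes half-open `(start, end)` intervals on a single chromosome and a simple merge-when-overlapping rule; the study's exact CNVR definition and its gene annotations are not reproduced, and the function names are hypothetical.

```python
def merge_cnvs(calls):
    """Merge overlapping half-open (start, end) CNV calls, pooled across animals
    on one chromosome, into CNV regions (CNVRs). Touching intervals are not merged."""
    regions = []
    for start, end in sorted(calls):
        if regions and start < regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current region
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


def total_length(regions):
    """Total length covered by non-overlapping regions."""
    return sum(end - start for start, end in regions)


def fraction_overlapping(regions, genes):
    """Fraction of regions that overlap at least one gene interval."""
    hits = sum(any(s < ge and gs < e for gs, ge in genes) for s, e in regions)
    return hits / len(regions)


# Usage with toy coordinates (bp).
calls = [(100, 200), (150, 300), (500, 600), (590, 650), (1000, 1100)]
genes = [(250, 400), (1050, 1060)]
cnvrs = merge_cnvs(calls)
print(cnvrs, total_length(cnvrs), fraction_overlapping(cnvrs, genes))
```

Because merging pools calls across animals, more event calls can still yield shorter average regions when individual calls are shorter, which is the pattern the abstract reports between the two array platforms.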

    Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows

    Background: Genome-wide association analysis is a powerful tool for annotating phenotypic effects on the genome, and knowledge of genes and chromosomal regions associated with dairy phenotypes is useful for genome- and gene-based selection. Here, we report results of a genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows.
    Results: Genome-wide association analysis identified a number of candidate genes and chromosome regions associated with 31 dairy traits in contemporary U.S. Holstein cows. Highly significant genes and chromosome regions include: BTA13's GNAS region for milk, fat and protein yields; BTA7's INSR region and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate, somatic cell score and productive life; BTA2's LRP1B for somatic cell score; BTA14's DGAT1-NIBP region for fat percentage; BTA1's FKBP2 for protein yields and percentage; BTA26's MGMT and BTA6's PDGFRA for protein percentage; BTA18's 53.9-58.7 Mb region for service-sire and daughter calving ease and service-sire stillbirth; BTA18's PGLYRP1-IGFL1 region for a large number of traits; BTA18's LOC787057 for service-sire stillbirth and daughter calving ease; BTA15's CD82, BTA23's DST and the MOCS1-LRFN2 region for daughter stillbirth; and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate. For body conformation traits, BTA11, BTAX, BTA10, BTA5, and BTA26 had the largest concentrations of SNP effects, and PHKA2 of BTAX and REN of BTA16 had the most significant effects for body size traits. For body shape traits, BTAX, BTA19 and BTA3 were most significant. Udder traits were affected by BTA16, BTA22, BTAX, BTA2, BTA10, BTA11, BTA20, BTA22 and BTA25; teat traits were affected by BTA6, BTA7, BTA9, BTA16, BTA11, BTA26 and BTA17; and feet/legs traits were affected by BTA11, BTA13, BTA18, BTA20, and BTA26.
    Conclusions: Genome-wide association analysis identified a number of genes and chromosome regions associated with 31 production, health, reproduction and body conformation traits in contemporary Holstein cows. The results provide useful information for annotating phenotypic effects on the dairy genome and for building consensus of dairy QTL effects.
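Genome-wide association scans of this kind test each SNP separately. As a minimal sketch, assuming a plain single-SNP least-squares regression of PTA on allele count (0/1/2) with no covariates and no correction for relatedness or population structure (which the study may well have applied), the per-SNP effect and its t statistic can be computed as follows. The function name is hypothetical.

```python
import math


def snp_regression(genotypes: list[int], trait: list[float]) -> tuple[float, float]:
    """Least-squares regression of a trait (e.g. PTA) on allele count (0/1/2).

    Returns (allele substitution effect, t statistic). Assumes no covariates and
    no correction for relatedness among animals.
    """
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: allele count has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, trait))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, b / se


# Usage with toy data for one SNP.
b, t = snp_regression([0, 1, 2, 0, 1, 2], [1.0, 2.1, 2.9, 0.9, 2.0, 3.1])
print(round(b, 3), round(t, 1))
```

In a real scan this function would be applied to every SNP and the resulting test statistics compared against a genome-wide significance threshold.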

    Updating test-day milk yield factors for use in genetic evaluations and dairy production systems: a comprehensive review

    Various methods have been proposed to estimate daily yield from partial yields, primarily to deal with unequal milking intervals. This paper offers an exhaustive review of daily milk yields, the foundation of lactation records. Seminal advances in the late 20th century concentrated on two main adjustment metrics: additive correction factors (ACF) and multiplicative correction factors (MCF). An ACF model adds an adjustment to twice the AM or PM milk yield, and the result becomes the estimated daily yield, whereas an MCF is the ratio of daily yield to the yield from a single milking. Recent studies highlight the potential of alternative approaches, such as exponential regression and other nonlinear models. Biologically, milk secretion is not linear over the milking interval because it is influenced by internal mammary gland pressure; consequently, nonlinear models are also appealing for estimating daily milk yields. MCFs and ACFs are typically determined for discrete milking-interval classes, but large discrete intervals can introduce systematic biases. A universal solution for deriving continuous correction factors has been proposed, reducing bias and improving the accuracy of daily milk yield estimation. When test-day milk yields are used for genetic evaluations in dairy cattle, two statistical models predominate: lactation models and test-day yield models. A lactation model capitalizes on the high heritability of total lactation yield and aligns closely with dairy producers' needs, because the total amount of milk produced in a lactation directly determines farm revenue. However, a lactation yield model that does not use all test-day records may ignore vital information about the shape of lactation curves needed for informed breeding decisions.
    In contrast, a test-day model emphasizes individual test-day data, accommodating various intervals and recording plans and allowing environmental effects on specific test days to be estimated. In the United States, the patenting of test-day models in 1993 restricted their use to regional and unofficial evaluations by the patent holders. Estimated test-day milk yields have been used as if they were accurate depictions of actual milk yields, neglecting possible estimation errors, and the potential consequences for subsequent genetic evaluations have not been sufficiently addressed. Numerous questions and challenges remain in this domain.
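The two correction-factor definitions above translate directly into formulas: with an ACF, daily yield = 2 × single-milking yield + ACF; with an MCF, daily yield = MCF × single-milking yield. The sketch below estimates both factors as class means within discrete milking-interval classes, which is exactly the discrete approach the review warns can introduce bias; it is not the continuous-factor solution the review mentions. The record layout, `class_width`, and function names are hypothetical.

```python
from collections import defaultdict


def estimate_factors(records, class_width=0.5):
    """records: (interval_hours, single_milking_yield, daily_yield) with known daily yield.

    Returns {interval_class: (ACF, MCF)}, each estimated as a class mean:
    ACF = daily - 2 * single, MCF = daily / single.
    """
    groups = defaultdict(list)
    for interval, single, daily in records:
        groups[int(interval // class_width)].append((single, daily))
    factors = {}
    for cls, rows in groups.items():
        acf = sum(d - 2 * s for s, d in rows) / len(rows)
        mcf = sum(d / s for s, d in rows) / len(rows)
        factors[cls] = (acf, mcf)
    return factors


def daily_from_acf(single, acf):
    return 2 * single + acf


def daily_from_mcf(single, mcf):
    return mcf * single


# Usage with toy records (kg); intervals of 12.0 and 12.2 h share a 0.5-h class.
records = [(12.0, 15.0, 30.0), (12.2, 14.0, 28.6), (14.0, 18.0, 33.0)]
factors = estimate_factors(records)
acf, mcf = factors[int(12.0 // 0.5)]
print(daily_from_acf(16.0, acf), daily_from_mcf(16.0, mcf))
```

Every interval inside a class receives the same factor, so widening `class_width` increases the systematic bias for intervals far from the class's typical value.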

    Genomic characteristics of cattle copy number variations

    Background: Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results is becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits.
    Results: We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated, and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap between this new dataset and other published CNV studies was less than 40%; however, our large, high-frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms.
    Conclusions: We present a comprehensive genomic analysis of cattle CNVs derived from SNP data, which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.
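The contrast between low overall overlap (< 40%) and high agreement (90%) for high-frequency CNV regions comes from stratifying regions by carrier frequency before comparing datasets. In this sketch, regions are hypothetical `(start, end, n_carriers)` tuples on one chromosome, "agreement" is assumed to mean overlapping at least one region from another study, and the 5% threshold comes from the abstract; the "large" size criterion is not modeled.

```python
def overlaps(a, b):
    """True if half-open intervals a and b (first two fields: start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def agreement(regions, other, n_animals, min_freq=0.05):
    """regions: (start, end, n_carriers) tuples; other: (start, end) from another study.

    Returns (fraction of all regions overlapping `other`,
             fraction of regions with carrier frequency > min_freq overlapping `other`).
    """
    def frac(subset):
        if not subset:
            return float("nan")
        hits = sum(any(overlaps(r, o) for o in other) for r in subset)
        return hits / len(subset)

    high = [r for r in regions if r[2] / n_animals > min_freq]
    return frac(regions), frac(high)


# Usage with toy regions on one chromosome, 100 animals surveyed.
regions = [(0, 100, 10), (200, 300, 1), (400, 500, 2), (600, 700, 20)]
other = [(50, 150), (650, 660)]
print(agreement(regions, other, n_animals=100))
```

Rare CNVs are more likely to be platform- or sample-specific, so agreement computed on the high-frequency subset is expected to exceed agreement on all regions.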

    Anglo-Dutch Premium Auctions in Eighteenth-Century Amsterdam


    Genetic selection: Evaluation and methods

    Peer reviewed.
    The ultimate goal of animal selection is to create a new generation of animals that are superior to the current population. Superior is interpreted broadly to include functionality of animals, cost reduction of production, consumer perception, quality of products, and reduced environmental impact. These factors contribute to the overall sustainability and long-term economic profitability of animal production. An essential element of selection is a genetic evaluation system for detecting superior animals to be used to produce future generations. Current genetic evaluations use phenotypic records and advanced statistical methods to separate genetic and environmental effects. These traditional methods are complemented by DNA-based technologies that provide genetic information at a molecular level. Genetic evaluation systems are highly complex and involve collection of data from thousands of farms, determination of milk characteristics in laboratories, processing and storage of data in regional computing centers, and application of advanced statistical procedures to estimate genetic merit. Genetic evaluations are widely distributed and are the primary determinant of the value of semen and embryos. Internationally, bull evaluations are combined across countries so that each country has a single national ranking of all bulls worldwide. Selection decisions on farms and by artificial insemination organizations are highly dependent on that genetic information. This article covers aspects of genetic selection ranging from basic data collection (including identification systems), traits recorded and evaluated, and characteristics of current and future evaluation systems to new DNA-based technologies.
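Separating genetic from environmental effects can be illustrated with a textbook daughter-average sire evaluation: each daughter's yield is expressed as a deviation from its contemporary (herd-year) mean, and the mean deviation is regressed toward zero by n/(n + k), with k = (4 - h²)/h². This is a deliberate simplification assuming unrelated paternal half-sib daughters with one record each; national evaluations rely on far more complete mixed-model (BLUP) procedures. Function names and inputs are hypothetical.

```python
def sire_pta(daughter_records, contemporary_means, h2):
    """Daughter-average sire evaluation (textbook simplification, not BLUP).

    daughter_records: list of (contemporary_group, yield), one record per daughter.
    contemporary_means: {contemporary_group: mean yield of all cows in that group}.
    Returns (PTA, reliability) with regression n / (n + k), k = (4 - h2) / h2.
    """
    deviations = [y - contemporary_means[g] for g, y in daughter_records]
    n = len(deviations)
    k = (4 - h2) / h2
    reliability = n / (n + k)
    pta = reliability * sum(deviations) / n  # shrink mean deviation toward zero
    return pta, reliability


# Usage: five daughters in two herd-years, heritability 0.25 (so k = 15).
means = {"H1-2020": 10000.0, "H2-2020": 9000.0}
records = [("H1-2020", 10100.0), ("H1-2020", 10200.0), ("H2-2020", 9000.0),
           ("H2-2020", 9300.0), ("H1-2020", 10400.0)]
print(sire_pta(records, means, h2=0.25))
```

Deviating from contemporary means removes shared herd-year environment, and the n/(n + k) shrinkage reflects how little a few daughters reveal about a sire when heritability is low.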