
Adelaide Research & Scholarship

Open Access Research from University of Wollongong

CIMMYT Publications Repository

Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

Author: AJ Chamberlain
APW de Roos
BJ Hayes
BL Browning
C Dimauro
C Hagger
Corrado Dimauro
D Boichard
D Segelke
DP Berry
G Li
G Moser
Gabriele Marras
GCB Schopen
Giustino Gaspa
H Abdi
HA Mulder
HD Daetwyler
I Medugorac
J Chen
JE Pryce
JM Hickey
K Kizilkaya
KA Weigel
Massimo Cellesi
Nicolò PP Macciotta
P Ajmone-Marsan
P Scheet
Paolo Ajmone-Marsan
PM VanRaden
R Dassonneville
Roberto Steri
T Druet
TH Meuwissen
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels, i.e. 3K or 7K, to a high-density panel with 50K SNP. No pedigree information was used.
Methods: Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content.
Results: In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip.
Conclusions: Results of the present work suggest that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
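The abstract above reports imputation accuracy as a percentage but does not state the exact formula. A minimal sketch, assuming accuracy means the fraction of masked genotypes imputed identically to the true genotype (coded 0, 1, 2); the function name and toy data are hypothetical and not taken from the paper:

```python
def imputation_concordance(true_genotypes, imputed_genotypes):
    """Fraction of genotypes (coded 0, 1, 2) imputed identically to the truth."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    if not true_genotypes:
        raise ValueError("at least one genotype is required")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


# Toy data (made up for illustration): 10 masked SNPs, 9 imputed correctly.
true_g = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1]
imputed_g = [0, 1, 2, 1, 0, 2, 1, 1, 0, 1]
accuracy = imputation_concordance(true_g, imputed_g)
```

Other definitions, such as the correlation between true and imputed allele dosages, are also in use and would give different numbers on the same data.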

CiteSeerX

PubliCatt

CINECA IRIS Institutional research information system UNISS

UnissResearch

Fitting and validating the genomic evaluation model to Polish Holstein-Friesian cattle

Author: A Legarra
A Żarnecki
B Grisart
BJ Hayes
D Habier
E Mäntysaari
I Strandén
L Jairath
MPL Calus
MS Lund
PM VanRaden
Publication venue: Springer-Verlag
Publication date: 01/01/2011

The aim of the study was to fit the genomic evaluation model to Polish Holstein-Friesian dairy cattle. A training data set for the estimation of additive effects of single nucleotide polymorphisms (SNPs) consisted of 1227 Polish Holstein-Friesian bulls. Genotypes were obtained using the Illumina BovineSNP50 Genotyping BeadChip. Altogether 29 traits were considered: milk, fat and protein yields, somatic cell score, four female fertility traits, and 21 conformation traits. The prediction of direct genomic values was based on a mixed model containing deregressed national proofs as a dependent variable and random SNP effects as independent variables. The correlations between direct genomic values and conventional estimated breeding values estimated for the whole data set were overall very high, ranging from 0.78 for non-return rates for cows to 0.98 for production traits. For the validation data set of 232 bulls the corresponding correlations were 0.38 for milk, 0.37 for protein, and 0.32 for fat yields, while the correlations between genomic enhanced breeding values and conventional estimated breeding values for the four traits were: 0.43, 0.44, 0.31, and 0.35. This model was able to pass the Interbull validation criteria for genomic selection, which indicates that it is realistic to implement genomic selection in Polish Holstein-Friesian cattle.

The importance of identity-by-state information for the accuracy of genomic selection

Author: Alessandro Bagnato
AR Gilmour
BJ Hayes
BL Harris
D Berry
D Habier
DJ Garrick
HD Daetwyler
John A Woolliams
Jørgen Ødegård
M Goddard
Marlies Dolezal
ME Goddard
MS Lund
PM VanRaden
R Makowsky
RL Fernando
Sergio I Roman-Ponce
T Luan
T Meuwissen
THE Meuwissen
Theo HE Meuwissen
Tu Luan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information.
Methods: The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNP and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data.
Results: We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was at least as reliable as that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree.
Conclusions: Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require pedigree data, it does use the available pedigree structure. Our findings may explain why the prediction equations derived for one breed may not predict accurate genome-wide breeding values when applied to other breeds, since family structures differ among breeds.
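The abstract above does not say how the identity-by-state relationship matrix was built. As a rough illustration only, the sketch below assumes the simplest allele-sharing definition: with genotypes coded 0, 1, 2, two animals share 2 - |g1 - g2| alleles in state at a locus, and the similarity is the average shared fraction over loci. All names and data are hypothetical, and this is not claimed to be the matrix used in the study:

```python
def ibs_similarity(g1, g2):
    """Average fraction of alleles shared in state across loci (genotypes coded 0, 1, 2)."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(1 - abs(a - b) / 2 for a, b in zip(g1, g2)) / len(g1)


def ibs_matrix(genotypes):
    """Pairwise IBS similarity matrix; genotypes holds one row per animal."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Toy genotypes (made up): animals 0 and 1 are identical, animal 2 differs.
animals = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
]
matrix = ibs_matrix(animals)
```

Note that this coding counts two heterozygotes as fully identical in state, which is a simplification; because it uses no pedigree, it also picks up similarity that arose before the recorded pedigree, which is the property the abstract contrasts with identity-by-descent.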

AIR Universita degli studi di Milano

Strategies for implementing genomic selection in family-based aquaculture breeding schemes: double haploid sib test populations

Author: AK Sonesson
Anna K Sonesson
B Villanueva
BJ Hayes
D Habier
EL Heffner
F Galton
H Komen
HD Daetwyler
HM Nielsen
JBS Haldane
JL Jannink
John A Woolliams
JP Gibson
Kahsay G Nirea
KG Nirea
M Goddard
M Kimura
M Pszczola
ME Goddard
MG Bulmer
PM VanRaden
S Wright
THE Meuwissen
Theo HE Meuwissen
VA Martinez
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs.
Methods: Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (Mat), paternal (Pat) or a mixture of maternal and paternal (MatPat) double haploid genomes, or test sibs were obtained by maximum coancestry mating (MaxC), minimum coancestry mating (MinC), or random (RAND) mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes.
Results: Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the MatPat scheme compared to the RAND scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. Mat, Pat, MaxC, and MinC, no substantial differences in selection accuracy and genetic gain were observed.
Conclusions: A test population designed with a mixture of paternal and maternal double haploids, i.e. the MatPat scheme, substantially increases the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the selection candidates and require the use of sib tests, such as disease resistance and meat quality.
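The selection step described in the Methods (truncation on genome-wide estimated breeding values, parents split equally between sexes) can be sketched as follows. This is only an illustration: the tuple layout, the 'M'/'F' sex codes, and the function name are assumptions, not details from the paper.

```python
def truncation_select(candidates, n_parents=100):
    """Pick the top-ranked candidates by GEBV, half from each sex.

    candidates: list of (animal_id, sex, gebv) tuples, sex being 'M' or 'F'.
    """
    per_sex = n_parents // 2
    selected = []
    for sex in ("M", "F"):
        same_sex = [c for c in candidates if c[1] == sex]
        # Highest breeding value first, then truncate.
        same_sex.sort(key=lambda c: c[2], reverse=True)
        selected.extend(same_sex[:per_sex])
    return selected


# Toy candidates (made up): select 4 parents, 2 per sex.
candidates = [
    ("a", "M", 1.2), ("b", "M", 0.3), ("c", "M", 2.1),
    ("d", "F", 0.9), ("e", "F", 1.5), ("f", "F", -0.4),
]
parents = truncation_select(candidates, n_parents=4)
```

With the default `n_parents=100` this selects 50 males and 50 females, matching the numbers given in the abstract.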

Estimating genomic breeding values and detecting QTL using univariate and bivariate models

Author: CS Haley
Han A Mulder
JI Weller
KL Verbyla
Mario PL Calus
MPL Calus
P VanRaden
PM VanRaden
Roel F Veerkamp
THE Meuwissen
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genomic selection is particularly beneficial for difficult or expensive to measure traits. Since multi-trait selection is an important tool to deal with such cases, an important question is what the added value is of multi-trait genomic selection.
Methods: The simulated dataset, including a quantitative and binary trait, was analyzed with four univariate and bivariate linear models to predict breeding values for juvenile animals. Two models estimated variance components with REML using a numerator (A), or SNP based relationship matrix (G). Two SNP based Bayesian models included one (BayesA) or two distributions (BayesC) for estimated SNP effects. The bivariate BayesC model sampled QTL probabilities for each SNP conditional on both traits. Genotypes were permuted 2,000 times against phenotypes and pedigree, to obtain significance thresholds for posterior QTL probabilities. Genotypes were permuted rather than phenotypes, to retain relationships between pedigree and phenotypes, such that polygenic effects could still be estimated.
Results: Correlations between estimated breeding values (EBV) of different SNP based models, for juvenile animals, were greater than 0.93 (0.87) for the quantitative (binary) trait. Estimated genetic correlation was 0.71 (0.66) for model G (A). Accuracies of breeding values of SNP based models were for both traits highest for BayesC and lowest for G. Accuracies of breeding values of bivariate models were up to 0.08 higher than for univariate models. The bivariate BayesC model detected 14 out of 32 QTL for the quantitative trait, and 8 out of 22 for the binary trait.
Conclusions: Accuracy of EBV clearly improved for both traits using bivariate compared to univariate models. BayesC achieved highest accuracies of EBV and was also one of the methods that found most QTL. Permuting genotypes against phenotypes and pedigree in BayesC provided an effective way to derive significance thresholds for posterior QTL probabilities.
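The permutation idea in the Methods can be illustrated generically. The sketch below shuffles whole genotype records across animals, leaving phenotypes in place, and takes an upper quantile of the per-permutation maximum statistic as the threshold. It is a simplification under stated assumptions: the statistic is a placeholder (absolute covariance), not a BayesC posterior QTL probability, no pedigree is modelled, and the quantile rule and all names are hypothetical.

```python
import random


def abs_covariance(x, y):
    """Placeholder association statistic: absolute covariance of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)


def permutation_threshold(genotypes, phenotypes, statistic,
                          n_perm=2000, alpha=0.05, seed=0):
    """Empirical threshold from permuting genotype rows against phenotypes.

    genotypes: one row per animal, one column per SNP.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    rows = list(genotypes)  # shallow copy, so the caller's list keeps its order
    null_max = []
    for _ in range(n_perm):
        # Shuffle whole rows: each animal's SNPs stay together, but the
        # link between genotype and phenotype is broken.
        rng.shuffle(rows)
        null_max.append(max(
            statistic([row[j] for row in rows], phenotypes)
            for j in range(n_snps)
        ))
    null_max.sort()
    index = min(len(null_max) - 1, int((1 - alpha) * len(null_max)))
    return null_max[index]


# Toy data (made up): 6 animals, 3 SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 0], [2, 2, 1]]
toy_phenotypes = [1.0, 2.5, 3.1, 0.7, 2.2, 3.4]
threshold = permutation_threshold(toy_genotypes, toy_phenotypes,
                                  abs_covariance, n_perm=200)
```

Taking the maximum over SNPs within each permutation gives a genome-wide threshold rather than a per-SNP one; that choice is an assumption of this sketch.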

Wageningen University & Research Publications

Regional variation in health is predominantly driven by lifestyle rather than genetics

Author: A Jemal
AD Lopez
BA Swinburn
BH Smith
C Amador
C Chang
C Willyard
C Xia
G Davey Smith
J Yang
M Ezzati
M Marmot
MR Robinson
N Zaitlen
NY Krakauer
PM VanRaden
S Vattikuti
The 1000 Genomes Project Consortium
YC Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2017

Health-related traits are known to vary geographically. Here, Amador and colleagues show that regional variation of obesity-related traits in a Scottish population is influenced more by lifestyle differences than by genetic differences.

Discovery Research Portal

A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes

Author: A Kong
BN Howie
BP Kinghorn
Brian P Kinghorn
Bruce Tier
D Habier
GK Chen
HD Daetwyler
James F Wilson
JM Hickey
John M Hickey
Julius HJ van der Werf
KA Weigel
Neil Dunstan
P Scheet
PM VanRaden
R McQuillan
R Villa-Angulo
S MacEachern
SR Browning
Y Li
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Background: Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.
Methods: A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.
Results: The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.
Conclusions: The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

Research UNE

Genomic evaluations with many more genotypes

Author: A Flaquer
A Toosi
B Harris
C Henderson
D Habier
G Wiggans
George R Wiggans
J Burdick
J Cole
J Taylor
J Yang
Jeffrey R O'Connell
K Weigel
KA Weigel
Kent A Weigel
M Calus
M Lund
M Sargolzaei
N Macciotta
P VanRaden
P Vanraden
Paul M VanRaden
PM VanRaden
R Villa-Angulo
T Druet
T Meuwissen
T Solberg
T Villumsen
Y Li
Z Liu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly.
Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared.
Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50,000 markers and 1.6% lower with 500,000 markers.
Conclusions: Methods to impute genotypes and compute genomic evaluations were affordable with many more markers. Reliabilities for individual animals can be modified to reflect success of imputation. Breeders can improve reliability at lower cost by combining marker densities to increase both the numbers of markers and animals included in genomic evaluation. Larger gains are expected from increasing the number of animals than the number of markers.
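The mixed-density step in the Methods (keeping every tenth of 500,000 simulated markers for most animals) can be sketched as below. The missing-value sentinel, the choice to start at marker index 0, and the function name are assumptions made for illustration; the abstract only states that every tenth marker was kept.

```python
MISSING = None  # hypothetical sentinel for a genotype that must be imputed later


def thin_to_low_density(genotype_row, step=10):
    """Keep every `step`-th marker and mark all others as missing."""
    return [g if i % step == 0 else MISSING
            for i, g in enumerate(genotype_row)]


# Toy high-density record (made up): 500,000 markers coded 0, 1, 2.
high_density = [i % 3 for i in range(500_000)]
low_density = thin_to_low_density(high_density)
n_observed = sum(g is not MISSING for g in low_density)
```

Here `n_observed` counts the 50,000 retained markers; the remaining 450,000 positions are the ones an imputation step would have to fill in.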

Large-scale genomic prediction using singular value decomposition of the genotype matrix

Author: A Legarra
CR Henderson
DC Lay
G Campos de los
I Misztal
Ismo Strandén
Jørgen Ødegård
L Tusell
M Kimura
OF Christensen
P VanRaden
PM VanRaden
RL Fernando
T Hastie
T Meuwissen
THE Meuwissen
Theo H. E. Meuwissen
Ulf Indahl
Publication venue: 'Springer Science and Business Media LLC'
Publication date