98 research outputs found
MetaGS: an accurate method to impute and combine SNP effects across populations using summary statistics
Background Meta-analysis describes a category of statistical methods that aim at combining the results of multiple studies to increase statistical power by exploiting summary statistics. Different industries that use genomic prediction do not share their raw data due to logistic or privacy restrictions, which can limit the size of their reference populations and creates a need for a practical meta-analysis method. Results We developed a meta-analysis, named MetaGS, that duplicates the results of multi-trait best linear unbiased prediction (mBLUP) analysis without accessing raw data. MetaGS exploits the correlations among different populations to produce more accurate population-specific single nucleotide polymorphism (SNP) effects. The method improves SNP effect estimations for a given population depending on its relations to other populations. MetaGS was tested on milk, fat and protein yield data of Australian Holstein and Jersey cattle and it generated very similar genomic estimated breeding values to those produced using the mBLUP method for all traits in both breeds. One of the major difficulties when combining SNP effects across populations is the use of different variants for the populations, which limits the applications of meta-analysis in practice. We solved this issue by developing a method to impute missing summary statistics without using raw data. Our results showed that imputing summary statistics can be done with high accuracy (r > 0.9) even when more than 70% of the SNPs were missing with a minimal effect on prediction accuracy. Conclusions We demonstrated that MetaGS can replace the mBLUP model when raw data cannot be shared, which can lead to more flexible collaborations compared to the single-trait BLUP model
Marker assisted estimation of breeding values when marker information is missing on many animals
International audienc
A practical approach for minimising inbreeding and maximising genetic gain in dairy cattle
A method that predicts the genetic composition and
inbreeding (F) of the future dairy cow population using information on the
current cow population, semen use and progeny test bulls is described. This
is combined with information on genetic merit of bulls to compare bull
selection methods that minimise F and maximise breeding value for profit
(called APR in Australia).
The genetic composition of the future cow population of Australian
Holstein-Friesian (HF) and Jersey up to 6 years into the future was
predicted. F in Australian HF and Jersey breeds is likely to increase by
about 0.002 and 0.003 per year between 2002 and 2008, respectively.
A comparison of bull selection methods showed that a method that selects the
best bull from all available bulls for each current or future cow, based on
its calf's APR minus F depression, is better than bull selection methods
based on APR alone, APR adjusted for mean F of prospective progeny after
random mating and mean APR adjusted for the relationship between the
selected bulls. This method reduced F of prospective progeny by about a
third to a half compared to the other methods when bulls are mated to
current and future cows that will be available 5 to 6 years from now. The
method also reduced the relationship between the bulls selected to nearly
the same extent as the method that is aimed at maximising genetic gain
adjusted for the relationship between bulls. The method achieves this
because cows with different pedigree exist in the population and the method
selects relatively unrelated bulls to mate to these different cows.
Selecting the best bull for each current or future cow so that the calf's
genetic merit minus F depression is maximised can slow the rate of increase
in F in the population
Using LASSO to estimate marker effects for genomic selection
Here we suggest a least absolute shrinkage and selection operator (LASSO) approach to estimate the marker effects for genomic selection using the least angle regression (LARS) algorithm, modified to include a cross–validation step to define the best subset of markers to involve
in the model. The LASSO-LARS was tested on simulated data which consisted of 5,865 individuals and 6,000 SNPs. The last generations of this dataset were the selection candidates. Using only animals from generations prior to the candidates, three approaches to splitting the population into training and validation sets for cross-validation were evaluated. Furthermore, different sizes of the validation sample were tested. Moreover, BLUP and Bayesian methods were carried out for comparison. The
most reliable cross-validation method was the random splitting of overall population with a validation
sample size of 50% of the reference population. The accuracy of the GEBVs (correlation with true
breeding values) in the candidate population obtained by LASSO-LARS was 0.89 with 156 explanatory SNPs. This value was higher then those obtained by using BLUP and Bayesian methods, which were 0.75 and 0.84 respectively. It was concluded that LASSO-LARS approach is a good alternative way to estimate markers effects for genomic selection
Accuracy of genomic breeding values in multi-breed dairy cattle populations
<p>Abstract</p> <p>Background</p> <p>Two key findings from genomic selection experiments are 1) the reference population used must be very large to subsequently predict accurate genomic estimated breeding values (GEBV), and 2) prediction equations derived in one breed do not predict accurate GEBV when applied to other breeds. Both findings are a problem for breeds where the number of individuals in the reference population is limited. A multi-breed reference population is a potential solution, and here we investigate the accuracies of GEBV in Holstein dairy cattle and Jersey dairy cattle when the reference population is single breed or multi-breed. The accuracies were obtained both as a function of elements of the inverse coefficient matrix and from the realised accuracies of GEBV.</p> <p>Methods</p> <p>Best linear unbiased prediction with a multi-breed genomic relationship matrix (GBLUP) and two Bayesian methods (BAYESA and BAYES_SSVS) which estimate individual SNP effects were used to predict GEBV for 400 and 77 young Holstein and Jersey bulls respectively, from a reference population of 781 and 287 Holstein and Jersey bulls, respectively. Genotypes of 39,048 SNP markers were used. Phenotypes in the reference population were de-regressed breeding values for production traits. For the GBLUP method, expected accuracies calculated from the diagonal of the inverse of coefficient matrix were compared to realised accuracies.</p> <p>Results</p> <p>When GBLUP was used, expected accuracies from a function of elements of the inverse coefficient matrix agreed reasonably well with realised accuracies calculated from the correlation between GEBV and EBV in single breed populations, but not in multi-breed populations. When the Bayesian methods were used, realised accuracies of GEBV were up to 13% higher when the multi-breed reference population was used than when a pure breed reference was used. However no consistent increase in accuracy across traits was obtained.</p> <p>Conclusion</p> <p>Predicting genomic breeding values using a genomic relationship matrix is an attractive approach to implement genomic selection as expected accuracies of GEBV can be readily derived. However in multi-breed populations, Bayesian approaches give higher accuracies for some traits. Finally, multi-breed reference populations will be a valuable resource to fine map QTL.</p
The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment
The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in
operation since July 2014. This paper describes the second data release from
this phase, and the fourteenth from SDSS overall (making this, Data Release
Fourteen or DR14). This release makes public data taken by SDSS-IV in its first
two years of operation (July 2014-2016). Like all previous SDSS releases, DR14
is cumulative, including the most recent reductions and calibrations of all
data taken by SDSS since the first phase began operations in 2000. New in DR14
is the first public release of data from the extended Baryon Oscillation
Spectroscopic Survey (eBOSS); the first data from the second phase of the
Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2),
including stellar parameter estimates from an innovative data driven machine
learning algorithm known as "The Cannon"; and almost twice as many data cubes
from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous
release (N = 2812 in total). This paper describes the location and format of
the publicly available data from SDSS-IV surveys. We provide references to the
important technical papers describing how these data have been taken (both
targeting and observation details) and processed for scientific use. The SDSS
website (www.sdss.org) has been updated for this release, and provides links to
data downloads, as well as tutorials and examples of data use. SDSS-IV is
planning to continue to collect astronomical data until 2020, and will be
followed by SDSS-V.Comment: SDSS-IV collaboration alphabetical author data release paper. DR14
happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov
2017 (this is the "post-print" and "post-proofs" version; minor corrections
only from v1, and most of errors found in proofs corrected
- …