115 research outputs found

    MetaGS: an accurate method to impute and combine SNP effects across populations using summary statistics

    Background: Meta-analysis describes a category of statistical methods that aim at combining the results of multiple studies to increase statistical power by exploiting summary statistics. Different industries that use genomic prediction do not share their raw data due to logistic or privacy restrictions, which can limit the size of their reference populations and create a need for a practical meta-analysis method. Results: We developed a meta-analysis method, named MetaGS, that duplicates the results of multi-trait best linear unbiased prediction (mBLUP) analysis without accessing raw data. MetaGS exploits the correlations among different populations to produce more accurate population-specific single nucleotide polymorphism (SNP) effects. The method improves SNP effect estimates for a given population depending on its relationship to other populations. MetaGS was tested on milk, fat and protein yield data of Australian Holstein and Jersey cattle, and it generated genomic estimated breeding values very similar to those produced using the mBLUP method for all traits in both breeds. One of the major difficulties when combining SNP effects across populations is the use of different variants for the populations, which limits the applications of meta-analysis in practice. We solved this issue by developing a method to impute missing summary statistics without using raw data. Our results showed that imputing summary statistics can be done with high accuracy (r > 0.9), even when more than 70% of the SNPs were missing, with a minimal effect on prediction accuracy. Conclusions: We demonstrated that MetaGS can replace the mBLUP model when raw data cannot be shared, which can lead to more flexible collaborations compared to the single-trait BLUP model.
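
As a rough illustration of what combining summary statistics means in general, the sketch below pools one SNP effect estimated in two populations using classical fixed-effect inverse-variance weighting. This is not the MetaGS method, which the abstract describes as reproducing mBLUP and exploiting correlations among populations; the function name and the numbers are hypothetical.

```python
import math

def inverse_variance_meta(estimates, std_errors):
    """Pool per-study effect estimates with fixed-effect inverse-variance weights.

    Illustrative only: assumes the studies estimate the same underlying effect
    and are independent, which MetaGS itself does not assume.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical effect of one SNP estimated separately in two populations.
pooled, pooled_se = inverse_variance_meta([0.10, 0.20], [0.05, 0.10])
print(round(pooled, 4), round(pooled_se, 4))
```

The pooled standard error is smaller than either input standard error, which is the gain in power the abstract refers to.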

    The distribution of the effects of genes affecting quantitative traits in livestock

    Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. The two limitations of such information, that QTL effects as reported include experimental error, and that mapping experiments can only detect QTL above a certain size, were accounted for. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted with maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. Seventeen percent and 35% of the leading QTL explained 90% of the genetic variance for the dairy and pig distributions respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming genes affecting a quantitative trait were neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the effective population size assumed. As data for the analysis included no QTL of small effect, the ability to estimate the number of QTL of small effect must inevitably be weak. It may be that there are more QTL of small effect than predicted by our gamma distributions. Nevertheless, the distributions have important implications for QTL mapping experiments and Marker Assisted Selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1σp, will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
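
The statement that a given share of the leading QTL explains 90% of the genetic variance can be made concrete with a small calculation. The sketch below is not the authors' analysis: it assumes each QTL contributes variance proportional to its squared effect (equal allele frequencies at every locus), and the effect sizes are made up.

```python
def fraction_of_leading_qtl(effects, target=0.90):
    """Fraction of QTL, taken largest first, needed to reach `target` of the variance.

    Assumption: variance contributed by a QTL is proportional to its squared
    effect, i.e. allele frequencies are equal across loci.
    """
    contributions = sorted((a * a for a in effects), reverse=True)
    total = sum(contributions)
    running = 0.0
    for count, contribution in enumerate(contributions, start=1):
        running += contribution
        if running >= target * total:
            return count / len(contributions)
    return 1.0

# Hypothetical effects: one large QTL, a few moderate ones, several small ones.
effects = [1.0, 0.5, 0.25, 0.25, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
fraction = fraction_of_leading_qtl(effects)
print(fraction)  # 0.3: the three largest of ten QTL reach 90% of the variance
```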

    A practical approach for minimising inbreeding and maximising genetic gain in dairy cattle

    A method that predicts the genetic composition and inbreeding (F) of the future dairy cow population using information on the current cow population, semen use and progeny test bulls is described. This is combined with information on genetic merit of bulls to compare bull selection methods that minimise F and maximise breeding value for profit (called APR in Australia). The genetic composition of the future cow population of Australian Holstein-Friesian (HF) and Jersey up to 6 years into the future was predicted. F in Australian HF and Jersey breeds is likely to increase by about 0.002 and 0.003 per year between 2002 and 2008, respectively. A comparison of bull selection methods showed that a method that selects the best bull from all available bulls for each current or future cow, based on its calf's APR minus F depression, is better than bull selection methods based on APR alone, APR adjusted for mean F of prospective progeny after random mating, and mean APR adjusted for the relationship between the selected bulls. This method reduced F of prospective progeny by about a third to a half compared to the other methods when bulls are mated to current and future cows that will be available 5 to 6 years from now. The method also reduced the relationship between the bulls selected to nearly the same extent as the method that is aimed at maximising genetic gain adjusted for the relationship between bulls. The method achieves this because cows with different pedigrees exist in the population and the method selects relatively unrelated bulls to mate to these different cows. Selecting the best bull for each current or future cow so that the calf's genetic merit minus F depression is maximised can slow the rate of increase in F in the population.
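
The selection rule described above (pick, for each cow, the bull whose calf has the highest APR after subtracting inbreeding depression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the calf's APR is the mean of its parents' APR values, uses the standard result that the calf's F is half the additive relationship between its parents, and all names and numbers (including the depression cost per unit of F) are hypothetical.

```python
def best_bull_for_each_cow(bull_apr, cow_apr, relationship, depression_per_unit_f):
    """Return, for every cow, the index of the bull maximising net calf merit.

    relationship[b][c] is the additive relationship between bull b and cow c;
    the calf's inbreeding coefficient F is half of that value.
    """
    choices = []
    for c, apr_c in enumerate(cow_apr):
        def net_merit(b):
            calf_apr = 0.5 * (bull_apr[b] + apr_c)  # assumed: parent average
            calf_f = 0.5 * relationship[b][c]        # F of the prospective calf
            return calf_apr - depression_per_unit_f * calf_f
        choices.append(max(range(len(bull_apr)), key=net_merit))
    return choices

# Hypothetical data: bull 0 has the higher APR but is closely related to cow 0.
bull_apr = [120.0, 100.0]
cow_apr = [50.0, 60.0]
relationship = [[0.5, 0.0],
                [0.0, 0.0]]
choices = best_bull_for_each_cow(bull_apr, cow_apr, relationship, 200.0)
print(choices)  # [1, 0]
```

Cow 0 is assigned the lower-APR bull because the inbreeding penalty outweighs the merit difference, while the unrelated cow 1 still receives the top bull.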

    Using LASSO to estimate marker effects for genomic selection

    Here we suggest a least absolute shrinkage and selection operator (LASSO) approach to estimate the marker effects for genomic selection using the least angle regression (LARS) algorithm, modified to include a cross-validation step to define the best subset of markers to involve in the model. The LASSO-LARS was tested on simulated data which consisted of 5,865 individuals and 6,000 SNPs. The last generations of this dataset were the selection candidates. Using only animals from generations prior to the candidates, three approaches to splitting the population into training and validation sets for cross-validation were evaluated. Furthermore, different sizes of the validation sample were tested. Moreover, BLUP and Bayesian methods were carried out for comparison. The most reliable cross-validation method was the random splitting of the overall population with a validation sample size of 50% of the reference population. The accuracy of the GEBVs (correlation with true breeding values) in the candidate population obtained by LASSO-LARS was 0.89 with 156 explanatory SNPs. This value was higher than those obtained by using BLUP and Bayesian methods, which were 0.75 and 0.84 respectively. It was concluded that the LASSO-LARS approach is a good alternative way to estimate marker effects for genomic selection.
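
To show how a LASSO penalty selects a subset of markers and how a validation set can pick the penalty, here is a minimal sketch. It deliberately swaps in cyclic coordinate descent for the LARS algorithm used in the paper, uses a single hold-out split rather than the splitting schemes the authors compared, and runs on a toy two-marker data set; all names and values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of marker j with the residual that excludes marker j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

def validation_mse(X, y, beta):
    """Mean squared prediction error of marker effects `beta` on (X, y)."""
    preds = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def choose_lambda(X_train, y_train, X_valid, y_valid, lambdas):
    """Pick the penalty whose fitted effects predict the validation set best."""
    return min(lambdas, key=lambda lam: validation_mse(
        X_valid, y_valid, lasso_coordinate_descent(X_train, y_train, lam)))

# Toy centred genotypes: only the first marker affects the phenotype.
X_train = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y_train = [-2.0, 2.0, -2.0, 2.0]
beta = lasso_coordinate_descent(X_train, y_train, lam=0.1)
best_lam = choose_lambda(X_train, y_train, [[1.0, 1.0], [-1.0, 1.0]], [2.0, -2.0],
                         [0.0, 0.5, 1.0])
print(beta, best_lam)
```

In this noise-free toy case the second marker is dropped from the model (its effect is exactly zero) and the validation step prefers no shrinkage; with noisy data a positive penalty would typically be chosen.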

    Imputation of Missing Genotypes from Sparse to High Density Using Long-Range Phasing

    Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% of loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.