83 research outputs found
Using the Pareto principle in genome-wide breeding value estimation
Genome-wide breeding value (GWEBV) estimation methods can be classified based on the prior distribution assumptions of marker effects. Genome-wide BLUP methods assume a normal prior distribution for all markers with a constant variance, and are computationally fast. In Bayesian methods, more flexible prior distributions of SNP effects are applied that allow for very large SNP effects although most are small or even zero, but these prior distributions are often also computationally demanding as they rely on Markov chain Monte Carlo sampling. In this study, we adopted the Pareto principle to weight available marker loci, i.e., we consider that x% of the loci explain (100 - x)% of the total genetic variance. Assuming this principle, it is also possible to define the variances of the prior distribution of the 'big' and 'small' SNP. The relatively few large SNP explain a large proportion of the genetic variance and the majority of the SNP show small effects and explain a minor proportion of the genetic variance. We name this method MixP, where the prior distribution is a mixture of two normal distributions, i.e., one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. This method also reduces the hyper-parameters of the prior distribution from 2 (proportion and variance of SNP with big effects) to 1 (proportion of SNP with big effects), assuming the overall genetic variance is known. The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced the computational load by two orders of magnitude. In the era of marker densities reaching millions and whole-genome sequence data, MixP provides a computationally feasible Bayesian method of analysis.
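The Pareto split described in this abstract can be illustrated with a little arithmetic. The sketch below is one possible reading, not the authors' parameterization: it assumes the total genetic variance is known, that every SNP within a class shares the same prior variance, and that allele-frequency scaling is ignored. The function name `pareto_prior_variances` and its arguments are hypothetical.

```python
def pareto_prior_variances(total_genetic_variance, n_markers, big_fraction):
    """Split a known genetic variance between 'big' and 'small' SNP.

    Assumption (not taken from the paper): a fraction `big_fraction` of the
    markers explains (1 - big_fraction) of the variance, shared equally
    within each class, with no allele-frequency scaling.
    """
    if not 0.0 < big_fraction < 1.0:
        raise ValueError("big_fraction must lie strictly between 0 and 1")
    n_big = big_fraction * n_markers
    n_small = (1.0 - big_fraction) * n_markers
    # Few 'big' SNP share the large part of the variance.
    var_big = (1.0 - big_fraction) * total_genetic_variance / n_big
    # Many 'small' SNP share the remaining part.
    var_small = big_fraction * total_genetic_variance / n_small
    return var_big, var_small


# Example: 20% of 10,000 markers explain 80% of a total variance of 1.0.
var_big, var_small = pareto_prior_variances(1.0, 10_000, 0.2)
```

Under these assumptions the only free hyper-parameter is `big_fraction`, which matches the abstract's statement that the number of hyper-parameters drops from two to one.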
The importance of identity-by-state information for the accuracy of genomic selection
Background: It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information. Methods: The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNP and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data. Results: We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree. Conclusions: Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information.
Although, in principle, genomic selection based on identity-by-state does not require pedigree data, it does use the available pedigree structure. Our findings may explain why the prediction equations derived for one breed may not predict accurate genome-wide breeding values when applied to other breeds, since family structures differ among breeds.
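To make the idea of an identity-by-state relationship matrix concrete, the following is a minimal sketch of a simple allele-sharing similarity averaged over all marker loci. It assumes genotypes coded as allele counts (0, 1, 2) and uses the common allele-sharing measure 1 - |a - b| / 2 per locus. The study's own matrix may be constructed differently, and the function name `ibs_matrix` is hypothetical.

```python
def ibs_matrix(genotypes):
    """Average identity-by-state similarity between all pairs of individuals.

    `genotypes` is a list with one list of allele counts (0, 1 or 2) per
    individual. Per locus, two individuals share 1 - |a - b| / 2 of their
    alleles identical by state; the result is averaged over loci.
    """
    n = len(genotypes)
    n_loci = len(genotypes[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(
                1.0 - abs(a - b) / 2.0
                for a, b in zip(genotypes[i], genotypes[j])
            )
            # The matrix is symmetric, so fill both triangles at once.
            matrix[i][j] = matrix[j][i] = shared / n_loci
    return matrix


# Three individuals genotyped at three SNP.
example = ibs_matrix([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
```

Unlike an identity-by-descent matrix, this measure needs no pedigree: it depends only on the observed genotypes.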
Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers
Background: At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods: Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subsets of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results: RR and PLSR performed very similarly in predicting DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls.
Conclusions: Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assay containing approximately 3,000 to 5,000 evenly spaced SNP.
Breeding value prediction for production traits in layer chickens using pedigree or genomic relationships in a reduced animal model
Background: Genomic selection involves breeding value estimation of selection candidates based on high-density SNP genotypes. To quantify the potential benefit of genomic selection, accuracies of estimated breeding values (EBV) obtained with different methods using pedigree or high-density SNP genotypes were evaluated and compared in a commercial layer chicken breeding line. Methods: The following traits were analyzed: egg production, egg weight, egg color, shell strength, age at sexual maturity, body weight, albumen height, and yolk weight. Predictions appropriate for early or late selection were compared. A total of 2,708 birds were genotyped for 23,356 segregating SNP, including 1,563 females with records. Phenotypes on relatives without genotypes were incorporated in the analysis (in total 13,049 production records). The data were analyzed with a Reduced Animal Model using a relationship matrix based on pedigree data or on marker genotypes and with a Bayesian method using model averaging. Using a validation set that consisted of individuals from the generation following training, these methods were compared by correlating EBV with phenotypes corrected for fixed effects, selecting the top 30 individuals based on EBV and evaluating their mean phenotype, and by regressing phenotypes on EBV. Results: Using high-density SNP genotypes increased accuracies of EBV up to two-fold for selection at an early age and by up to 88% for selection at a later age. Accuracy increases at an early age can be mostly attributed to improved estimates of parental EBV for shell quality and egg production, while for other egg quality traits it is mostly due to improved estimates of Mendelian sampling effects. A relatively small number of markers was sufficient to explain most of the genetic variation for egg weight and body weight.
PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity
Background: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Due to the difficulty in performing high-throughput mass spectrometry-based experiments, there is a desire to predict phosphorylation sites using computational methods. However, previous studies regarding in silico prediction of plant phosphorylation sites lack the consideration of kinase-specific phosphorylation data. Thus, we are motivated to propose a new method that investigates different substrate specificities in plant phosphorylation sites. Results: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species Arabidopsis thaliana. In an attempt to investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. Profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation evaluation on the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent test results using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models are able to correctly predict 81.4% phosphoserine, 77.1% phosphothreonine, and 83.7% phosphotyrosine sites. Interestingly, several MDD-clustered subgroups are observed to have similar amino acid conservation with the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms. Conclusions: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs.
Based on cross-validation and independent testing, results show that the MDD-clustered models outperform models trained without using MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos (http://csb.cse.yzu.edu.tw/PlantPhos/). Additionally, two case studies have been demonstrated to further evaluate the effectiveness of PlantPhos.
Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels
Background: Currently, genomic prediction in cattle is largely based on panels of about 54k single nucleotide polymorphisms (SNPs). However, with the decreasing costs of and current advances in next-generation sequencing technologies, whole-genome sequence (WGS) data on large numbers of individuals is within reach. Availability of such data provides new opportunities for genomic selection, which need to be explored. Methods: This simulation study investigated how much predictive ability is gained by using WGS data under scenarios with QTL (quantitative trait loci) densities ranging from 45 to 132 QTL/Morgan and heritabilities ranging from 0.07 to 0.30, compared to different SNP densities, with emphasis on divergent dairy cattle breeds with small populations. The relative performances of best linear unbiased prediction (SNP-BLUP) and of a variable selection method with a mixture of two normal distributions (MixP) were also evaluated. Genomic predictions were based on within-population, across-population, and multi-breed reference populations. Results: The use of WGS data for within-population predictions resulted in small to large increases in accuracy for low to moderately heritable traits. Depending on heritability of the trait, and on SNP and QTL densities, accuracy increased by up to 31%. The advantage of WGS data was more pronounced (7 to 92% increase in accuracy depending on trait heritability, SNP and QTL densities, and time of divergence between populations) with a combined reference population and when using MixP. While MixP outperformed SNP-BLUP at 45 QTL/Morgan, SNP-BLUP was as good as MixP when QTL density increased to 132 QTL/Morgan. Conclusions: Our results show that genomic predictions in numerically small cattle populations would benefit from a combination of WGS data, a multi-breed reference population, and a variable selection method.
Persistence of accuracy of genomic estimated breeding values over generations in layer chickens
Background: The predictive ability of genomic estimated breeding values (GEBV) originates both from associations between high-density markers and QTL (Quantitative Trait Loci) and from pedigree information. Thus, GEBV are expected to provide more persistent accuracy over successive generations than breeding values estimated using pedigree-based methods. The objective of this study was to evaluate the accuracy of GEBV in a closed population of layer chickens and to quantify their persistence over five successive generations using marker or pedigree information. Methods: The training data consisted of 16 traits and 777 genotyped animals from two generations of a brown-egg layer breeding line, 295 of which had individual phenotype records, while others had phenotypes on 2,738 non-genotyped relatives, or similar data accumulated over up to five generations. Validation data included phenotyped and genotyped birds from five subsequent generations (on average 306 birds/generation). Birds were genotyped for 23,356 segregating SNP. Animal models using genomic or pedigree relationship matrices and Bayesian model averaging methods were used for training analyses. Accuracy was evaluated as the correlation between EBV and phenotype in validation divided by the square root of trait heritability. Results: Pedigree relationships in outbred populations are reduced by 50% at each meiosis, therefore accuracy is expected to decrease by the square root of 0.5 every generation, as observed for pedigree-based EBV (Estimated Breeding Values). In contrast the GEBV accuracy was more persistent, although the drop in accuracy was substantial in the first generation. Traits that were considered to be influenced by fewer QTL and to have a higher heritability maintained a higher GEBV accuracy over generations.
In conclusion, GEBV capture information beyond pedigree relationships, but retraining every generation is recommended for genomic selection in closed breeding populations.
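The two quantities this abstract relies on can be written down in a few lines. The sketch below follows the abstract's wording: accuracy is the validation correlation between EBV and phenotype divided by the square root of heritability, and pedigree-based accuracy is read here as being multiplied by the square root of 0.5 each generation. The function names and the toy numbers are hypothetical and do not come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def validation_accuracy(ebv, phenotypes, heritability):
    """Correlation of EBV with phenotype, scaled by sqrt(heritability)."""
    return pearson(ebv, phenotypes) / math.sqrt(heritability)


def expected_pedigree_accuracy(initial_accuracy, generations):
    """Expected accuracy after `generations` meioses without retraining,
    assuming a decay factor of sqrt(0.5) per generation."""
    return initial_accuracy * math.sqrt(0.5) ** generations


# Toy illustration with made-up numbers: decay of a starting accuracy of 0.6.
decay = [expected_pedigree_accuracy(0.6, g) for g in range(6)]
```

Comparing observed GEBV accuracies against such a decay curve is one way to see whether markers carry information beyond pedigree relationships.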
Best Linear Unbiased Prediction of Genomic Breeding Values Using a Trait-Specific Marker-Derived Relationship Matrix
With the availability of high-density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest.
Genotyping of Streptococcus agalactiae (group B streptococci) isolated from vaginal and rectal swabs of women at 35-37 weeks of pregnancy
Background: Group B streptococci (GBS), or Streptococcus agalactiae, are the leading bacterial cause of meningitis and bacterial sepsis in newborns. Here we compared different culture media for GBS detection and we compared the occurrence of different genotypes and serotypes of GBS isolates from the vagina and rectum. Methods: Streptococcus agalactiae was cultured separately from both rectum and vagina, for a total of 150 pregnant women, i) directly onto Columbia CNA agar, or indirectly onto ii) Granada agar or iii) Columbia CNA agar, after overnight incubation in Lim broth. Results: Thirty-six women (24%) were colonized by GBS. Of these, 19 harbored GBS in both rectum and vagina, 9 only in the vagina and 8 exclusively in the rectum. The combination of Lim broth and subculture on Granada agar was the only culture method that detected all GBS positive women. Using RAPD-analysis, a total of 66 genotypes could be established among the 118 isolates from 32 women for which fingerprinting was carried out. Up to 4 different genotypes in total (rectal + vaginal) were found for 4 women, one woman carried 3 different genotypes vaginally and 14 women carried two different genotypes vaginally. Only two subjects were found to carry strains with the same genotype, although the serotype of both of these strains was different. Eighteen of the 19 subjects with GBS at both sites had at least one vaginal and one rectal isolate with the same genotype. We report the presence of two to four different genotypes in 22 (61%) of the 36 GBS positive women and the presence of identical genotypes in both sites for all women but one. Conclusion: The combination of Lim broth and subculture on Granada medium provides high sensitivity for GBS detection from vaginal and rectal swabs from pregnant women.
We established a higher genotypic diversity per individual than other studies, with up to four different genotypes among a maximum of 6 isolates per individual picked. Still, 18 of the 19 women with GBS from both rectum and vagina had at least one isolate from each sampling site with the same genotype.
- …