    Statistical tools for genome-wide studies

    The aim of genomic selection (GS) in livestock is to detect linkage disequilibrium between SNP and quantitative trait loci (QTL) across the whole genome, in order to improve the accuracy of genomic estimated breeding values (GEBV) in genetic improvement programs. Two main issues affect GS: the imbalance between the number of SNP and the number of animals involved, and the high genotyping costs. In this thesis, principal component analysis (PCA) is proposed as a method to reduce the dimensionality of the SNP data. In particular, the study evaluated the effect of the rank of the variance-covariance matrix on the accuracy of GEBV when PCA was applied. In addition, a new approach is proposed to reduce the dimensionality of the data. First, this new method was used in a genome-wide association study to detect associations between markers and the traits under study. The results obtained were then used to reduce the number of SNP needed to estimate the GEBV. Results show that the accuracy of GEBV, when only the SNP selected with the new method were used, was on average nearly equal to, and sometimes greater than, the accuracy obtained when all SNP were used. This thesis also proposes partial least squares regression (PLSR) to impute markers not present in economical chips and so avoid a reduction in the accuracy of GEBV estimation. The study demonstrated that the PLSR imputation method can efficiently impute missing genotypes from low-density panels to high-density panels (HDP).

    Prediction of Milk Coagulation Properties and Individual Cheese Yield in Sheep Using Partial Least Squares Regression

    The objectives of this study were (i) the prediction of sheep milk coagulation properties (MCP) and individual laboratory cheese yield (ILCY) from mid-infrared (MIR) spectra by using partial least squares (PLS) regression, and (ii) the comparison of the effect of different data pre-treatments on prediction accuracy. Individual milk samples of 970 Sarda breed ewes were analyzed for rennet coagulation time (RCT), curd-firming time (k20), and curd firmness (a30) using the Formagraph instrument; ILCY was measured by micro-manufacturing assays. A Fourier-transform infrared (FTIR) milk analyzer was used to estimate the milk gross composition and to record the MIR spectrum. The dataset (n = 859, after the exclusion of 111 noncoagulating samples) was divided into two sub-datasets: the data of 700 ewes were used to estimate prediction model parameters, and the data of 159 ewes were used to validate the model. Four prediction scenarios were compared in the validation, differing in the use of the whole or reduced MIR spectrum and in the use of raw or corrected data (locally weighted scatterplot smoothing). PLS prediction statistics were moderate. The use of the reduced MIR spectrum yielded the best results for the considered traits, whereas the data correction improved the prediction ability only when the whole MIR spectrum was used. In conclusion, PLS achieves good accuracy of prediction, in particular for ILCY and RCT, and it may enable increasing the number of traits to be included in breeding programs for dairy sheep without additional costs and logistics.

    Genetic parameters of milk fatty acid profile in sheep: comparison between gas chromatographic measurements and Fourier-transform IR spectroscopy predictions.

    Fatty acid (FA) composition is a key component of sheep milk nutritional quality. However, breeding for FA composition in dairy sheep is hampered by logistic and phenotyping costs. This study aimed to estimate genetic parameters for sheep milk FA and to test the feasibility of their routine measurement by Fourier-transform IR (FTIR) spectroscopy. Milk FA composition of 989 Sarda ewes farmed in 48 flocks was measured by gas chromatography (FA_GC). Moreover, an FTIR spectrum was collected for each sample and used to predict FA composition (FA_FTIR) by partial least squares regression: 700 ewes were used for estimating model parameters, whereas the remaining 289 animals were used to validate the model. One hundred replicates were performed by randomly assigning animals to the estimation and validation data sets. Variance components for both measured and predicted traits were estimated with an animal model that included the fixed effects of parity, days in milking interval, lambing month, province, and altitude of flock location, and the random effects of flock-test-date and animal additive genetic effect. Genetic correlations among FA_GC, and between corresponding FA_GC and FA_FTIR, were estimated by bivariate analysis. Coefficients of determination between FA_GC and FA_FTIR ranged from moderate (about 0.50 for odd- and branched-chain FA) to high (about 0.90 for de novo FA) values. Low-to-moderate heritabilities were observed for individual FA (ranging from 0.01 to 0.47). The largest value was observed for GC-measured C16:0. Low-to-moderate heritabilities were estimated for FA groups. In most cases, heritabilities were slightly larger for FA_GC than for FA_FTIR. Estimates of genetic correlations among FA_GC showed large variability in magnitude and sign. The genetic correlation between FA_FTIR and FA_GC was higher than 60% for all investigated traits. Results of the present study confirm the existence of genetic variability of the FA composition in sheep and suggest the feasibility of using FA_FTIR as proxies for these traits in large-scale breeding programs.
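
    The replicated validation scheme described in this abstract (989 ewes, 700 for estimation, 289 for validation, 100 random replicates) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the fixed seed, and the choice of the squared Pearson correlation as the coefficient of determination are assumptions made here.

```python
import random

def repeated_splits(n_animals, n_estimation, n_replicates, seed=0):
    """Yield (estimation_ids, validation_ids) for each random replicate."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    animal_ids = list(range(n_animals))
    for _ in range(n_replicates):
        estimation = set(rng.sample(animal_ids, n_estimation))
        validation = [a for a in animal_ids if a not in estimation]
        yield sorted(estimation), validation

def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)

# Sizes taken from the abstract: 989 ewes, 700 for estimation, 100 replicates.
splits = list(repeated_splits(n_animals=989, n_estimation=700, n_replicates=100))
```

    In each replicate, a prediction model would be fitted on the estimation animals and `r_squared` applied to the measured and predicted values of the validation animals.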

    Use of partial least squares regression to impute SNP genotypes in Italian Cattle breeds

    Background: The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low-density single nucleotide polymorphism (SNP) panels (i.e. 3K or 7K) to a high-density panel with 50K SNP. No pedigree information was used. Methods: Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. Results: In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, the computing time required by the partial least squares regression method was on average around 10 times lower than that required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of SNP genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Conclusions: Results of the present work suggest that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
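
    To illustrate the general idea of regression-based imputation, the sketch below fits a single-response PLS model (PLS1, NIPALS with deflation) that predicts one high-density SNP from the markers on a low-density panel, then rounds the prediction to a genotype code of 0, 1 or 2. It is a toy under stated assumptions, not the method as implemented in the study: the abstract does not say whether markers were imputed one at a time or jointly, and the function names, the rounding rule and the six-animal data set are all hypothetical.

```python
import math

def fit_pls1(X, y, n_components, tol=1e-10):
    """Fit single-response PLS (NIPALS with deflation) on centered data."""
    n, n_markers = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(n_markers)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_means)] for row in X]
    yc = [v - y_mean for v in y]
    components = []  # one (weights, loadings, y-loading) triple per component
    for _ in range(n_components):
        # Weights: direction in marker space with maximal covariance with y.
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(n_markers)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < tol:  # residual of y is unrelated to the remaining X
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(n_markers)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[a - ti * pj for a, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        components.append((w, p, q))
    return x_means, y_mean, components

def impute_genotype(model, markers):
    """Predict one SNP from low-density markers and round to 0, 1 or 2."""
    x_means, y_mean, components = model
    x = [v - m for v, m in zip(markers, x_means)]
    estimate = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(x, w))
        estimate += q * t
        x = [a - t * pj for a, pj in zip(x, p)]  # same deflation as in training
    return min(2, max(0, round(estimate)))

# Hypothetical training set: 6 animals, 2 low-density markers, and a target
# SNP in complete linkage disequilibrium with the first marker.
train_markers = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 0]]
train_target = [0, 1, 2, 0, 1, 2]
model = fit_pls1(train_markers, train_target, n_components=2)
imputed = [impute_genotype(model, m) for m in [[2, 2], [0, 0], [1, 2]]]
```

    With two components and two markers the toy model coincides with ordinary least squares, so the target SNP is recovered exactly here; on real panels the number of components would have to be tuned, for example by cross-validation.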

    The Impact of the rank of marker variance–covariance matrix in principal component evaluation for genomic selection applications

    In genomic selection (GS) programmes, direct genomic values (DGV) are evaluated using information provided by high-density SNP chips. Because DGV accuracy depends strictly on SNP density, an increase in the number of markers per chip is likely to have severe computational consequences. The aim of the present work was to test the effectiveness of principal component analysis (PCA) carried out by chromosome in reducing the marker dimensionality for GS purposes. A simulated data set of 5700 individuals with an equal number of SNP distributed over six chromosomes was used. PCs were extracted both genome-wide (ALL) and separately by chromosome (CHR) and used to predict DGVs. In the ALL scenario, the SNP variance–covariance matrix (S) was singular, positive semi-definite and contained null information, which introduces ‘spuriousness’ in the derived results. On the contrary, the S matrix for each chromosome (CHR scenario) had full rank. The DGV accuracies obtained were always better for CHR than for ALL. Moreover, in the latter scenario, DGV accuracies soon became unstable as the number of animals decreased, whereas in CHR they remained stable down to 900–1000 individuals. In real applications where a 54k SNP chip is used, the largest number of markers per chromosome is approximately 2500. Thus, around 3000 genotyped animals could lead to reliable results when the original SNP variables are replaced by a reduced number of PCs.
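
    One way to read the rank argument: centering the genotypes removes one degree of freedom, so with n animals and p SNP the matrix S (p × p) has rank at most min(n − 1, p). With 5700 animals and 5700 SNP that bound is 5699, so S cannot have full rank; with the SNP split over six chromosomes (950 each, assuming an even split, which the abstract does not state), each per-chromosome S can have full rank as long as more than 950 animals are available. The toy below checks the same pattern exactly on a hypothetical 6 × 6 genotype table split into two "chromosomes"; the data and function names are illustrative assumptions.

```python
from fractions import Fraction

def covariance_matrix(genotypes):
    """Sample variance-covariance matrix S (p x p) of an n x p genotype table."""
    n, p = len(genotypes), len(genotypes[0])
    means = [Fraction(sum(row[j] for row in genotypes), n) for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over rational numbers."""
    rows = [list(r) for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical data: 6 animals (rows) x 6 SNP (columns), genotypes coded 0/1/2.
GENOTYPES = [
    [2, 0, 1, 0, 1, 0],
    [0, 1, 0, 2, 0, 1],
    [1, 2, 0, 1, 0, 0],
    [0, 0, 2, 0, 2, 1],
    [1, 1, 1, 2, 1, 2],
    [2, 0, 0, 1, 0, 2],
]
CHROMOSOMES = [(0, 3), (3, 6)]  # column ranges of the two toy chromosomes

n_animals, n_snps = len(GENOTYPES), len(GENOTYPES[0])
rank_all = matrix_rank(covariance_matrix(GENOTYPES))
rank_chr = [matrix_rank(covariance_matrix([row[a:b] for row in GENOTYPES]))
            for a, b in CHROMOSOMES]
```

    Here `rank_all` stays below the number of SNP (the genome-wide S is singular), while each entry of `rank_chr` equals the number of SNP on that toy chromosome.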

    Dissection of genomic correlation matrices of US Holsteins using multivariate factor analysis

    The aim of this study was to compare correlation matrices between direct genomic predictions for 31 traits at the genomic and chromosomal levels in US Holstein bulls. Multivariate factor analysis carried out at the genome level identified seven factors associated with conformation, longevity, yield, feet and legs, fat and protein content traits. Some differences were found at the chromosome level; variations in covariance structure on BTA 6, 14, 18 and 20 were interpreted as evidence of segregating QTL for different groups of traits. For example, milk yield and composition tended to join in a single factor on BTA 14, which is known to harbour the DGAT1 locus that affects these traits. Another example was on BTA 18, where a factor strongly correlated with sire calving ease and conformation traits was identified. A QTL influencing these traits is known to segregate on BTA 18 in US Holsteins. Moreover, a possible candidate gene for daughter pregnancy rate was suggested for BTA 28. The methodology proposed in this study could be used to identify individual chromosomes that have covariance structures differing from the overall (whole-genome) covariance structure. Such differences can be difficult to detect when a large number of traits are evaluated, and covariances may be affected by QTL that do not have large allele substitution effects.