413 research outputs found

    Using Data Lake Stack in Animal Sciences

    Big Data is a theme that receives a lot of attention, and is often characterised as managing and analysing large datasets to reveal new valuable patterns. In the livestock domain, big data is also becoming more common and is being anchored into the mind-set of researchers, due to, for example, sensors generating ..

    Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein–Friesian cattle

    Background: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Methods: Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. Results: The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Conclusions: Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.

    Traditional mixed linear modelling versus modern machine learning to estimate cow individual feed intake

    Three modelling approaches were used to estimate cow individual feed intake (FI) using feeding trial data from a research farm, including weekly recordings of milk production and composition, live-weight, parity, and total FI. Additionally, weather data (temperature, humidity) were retrieved from the Dutch National Weather Service (KNMI). The 2014 data (245 cows; 277 parities) were used for model development. The first model (M1) applied an existing formula to estimate energy requirement using parity, fat- and protein-corrected milk, and live-weight, and assumed this requirement to be equal to energy intake and thus FI. The second model used ‘traditional’ Mixed Linear Regression, first using the same variables as in M1 as fixed effects (MLR1), and then by adding weather data (MLR2). The third model applied Boosted Regression Tree, a ‘modern’ machine learning technique, again once with the same variables as M1 (BRT1), and once with weather information added (BRT2). All models were validated on 2015 data (155 cows; 165 parities) using correlation between estimated and actual FI to evaluate performance. Both MLRs had very high correlations (0.91) between actual and estimated FI on 2014 data, much higher than 0.46 for M1, and 0.73 for both BRTs. When validated on 2015 data, correlations dropped to 0.71 for MLR1 and 0.72 for MLR2, and increased to 0.71 for M1 and 0.76 for both BRTs. FI estimated by BRT1 was, on average, 0.35 kg less (range: -7.61 to 13.32 kg) than actual FI compared to 0.52 kg less (range: -11.67 to 19.87 kg) for M1. Adding weather data did not improve FI estimations.

    Towards field specific phosphate applications norms with machine learning

    Efficient use of animal manure is an important link in the nutrient cycle in agricultural systems. On Dutch dairy farms, most manure is applied on grass and cropland, with maize as main crop. With the aim of balancing P input and output at field level, which is the idea behind the currently used, but rather fixed, ..

    Functional and population genetic features of copy number variations in two dairy cattle populations

    Background: Copy Number Variations (CNVs) are gains or losses of DNA segments that are known to play a role in shaping a wide range of phenotypes. In this study, we used two dairy cattle populations, Holstein Friesian and Jersey, to discover CNVs using the Illumina BovineHD Genotyping BeadChip aligned to the ARS-UCD1.2 assembly. The discovered CNVs were investigated for their functional impact and their population genetics features. Results: We discovered 14,272 autosomal CNVs, which were aggregated into 1755 CNV regions (CNVR) from 451 animals. These CNVRs together cover 2.8% of the bovine autosomes. The assessment of the functional impact of CNVRs showed that rare CNVRs (MAF […] r² = ~0.1 at 10 kb distance) than the rest. Nevertheless, this LD is still lower than SNP-SNP LD (r² = ~0.5 at 10 kb distance). Conclusions: Our analyses showed that CNVRs detected using BovineHD BeadChip arrays are likely to be functional. This finding indicates that CNVs can potentially disrupt the function of genes and thus might alter phenotypes. Also, the population differentiation index revealed two candidate genes, MGAM and ADAMTS17, which hint at adaptive evolution between the two populations. Lastly, low CNVR-SNP LD implies that genetic variation from CNVs might not be fully captured in routine animal genetic evaluation, which relies solely on SNP markers.

    Genomic regions associated with muscularity in beef cattle differ in five contrasting cattle breeds

    Background: Linear type traits, which reflect the muscular characteristics of an animal, could provide insight into how, in some cases, morphologically very different animals can yield the same carcass weight. Such variability may contribute to differences in the overall value of the carcass since primal cuts vary greatly in price; such variability may also hinder successful genome-based association studies. Therefore, the objective of our study was to identify genomic regions that are associated with five muscularity linear type traits and to determine if these significant regions are common across five different breeds. Analyses were carried out using linear mixed models on imputed whole-genome sequence data in each of the five breeds, separately. Then, the results of the within-breed analyses were used to conduct an across-breed meta-analysis per trait. Results: We identified many quantitative trait loci (QTL) that are located across the whole genome and associated with each trait in each breed. The only commonality among the breeds and traits was a large-effect pleiotropic QTL on BTA2 that contained the MSTN gene, which was associated with all traits in the Charolais and Limousin breeds. Other plausible candidate genes were identified for muscularity traits including PDE1A, PPP1R1C and multiple collagen and HOXD genes. In addition, associated gene ontology (GO) terms and KEGG pathways tended to differ between breeds and between traits, especially in the numerically smaller populations of Angus, Hereford, and Simmental breeds. Most of the SNPs that were associated with any of the traits were intergenic or intronic SNPs located within regulatory regions of the genome. Conclusions: The commonality between the Charolais and Limousin breeds indicates that the genetic architecture of the muscularity traits may be similar in these breeds due to their similar origins. Conversely, there were vast differences in the QTL associated with muscularity in Angus, Hereford, and Simmental. Knowledge of these differences in genetic architecture between breeds is useful to develop accurate genomic prediction equations that can operate effectively across breeds. Overall, the associated QTL differed according to trait, which suggests that breeding for a morphologically different (e.g. longer and wider versus shorter and smaller) more efficient animal may become possible in the future.

    Genomic Regions Associated With Skeletal Type Traits in Beef and Dairy Cattle Are Common to Regions Associated With Carcass Traits, Feed Intake and Calving Difficulty

    Linear type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with a range of other performance traits in cattle including feed intake, reproduction traits and carcass merit; thus, type traits could also provide useful insights into the morphological differences among animals underpinning phenotypic differences in these complex traits. The objective of the present study was to identify genomic regions associated with five subjectively scored skeletal linear traits, to determine if these associated regions are common in multiple beef and dairy breeds, and also to determine if these regions overlap with those proposed elsewhere to be associated with correlated performance traits. Analyses were carried out using linear mixed models on imputed whole genome sequence data separately in 1,444 Angus, 1,129 Hereford, 6,433 Charolais, 8,745 Limousin, 1,698 Simmental, and 4,494 Holstein-Friesian cattle, all scored for the linear type traits. There was, on average, 18 months difference in age at assessment of the beef versus the dairy animals. While the majority of the identified quantitative trait loci (QTL), and thus genes, were both trait-specific and breed-specific, a large-effect pleiotropic QTL on BTA6 containing the NCAPG and LCORL genes was associated with all skeletal traits in the Limousin population and with wither height in the Angus. Other than that, little overlap existed in detected QTLs for the skeletal type traits in the other breeds. Only two QTLs overlapped the beef and dairy breeds; both QTLs were located on BTA5 and were associated with height in both the Angus and the Holstein-Friesian, despite the difference in age at assessment. Several detected QTLs in the present study overlapped with QTLs documented elsewhere that are associated with carcass traits, feed intake, and calving difficulty. 
While most breeding programs select for macro-traits like carcass weight, carcass conformation, and feed intake, the higher degree of granularity with selection on the individual linear type traits in a multi-trait index underpinning the macro-level goal traits presents an opportunity to help resolve genetic antagonisms among morphological traits in the pursuit of the animal with optimum performance metrics.

    Improving predictive performance on survival in dairy cattle using an ensemble learning approach

    Cow survival is a complex trait that combines traits like milk production, fertility, health and environmental factors such as farm management. This complexity makes survival difficult to predict accurately. This is probably the reason why few studies attempted to address this problem and no studies are published that use ensemble methods for this purpose. We explored if we could improve prediction of cow survival to second lactation, when predicted at five different moments in a cow's life, by combining the predictions of multiple (weak) methods in an ensemble method. We tested four ensemble methods: majority voting rule, multiple logistic regression, random forest and naive Bayes. Precision, recall, balanced accuracy, area under the curve (AUC) and gains in proportion of surviving cows in a scenario where the best 50% were selected were used to evaluate the ensemble model performance. We also calculated correlations between the ensemble models and obtained McNemar's test statistics. We compared the performance of the ensemble methods against those of the individual methods. We also tested if there was a difference in performance metrics when continuous (from 0 to 1) and binary (0 or 1) prediction outcomes were used. In general, using continuous prediction output resulted in higher performance metrics than binary ones. AUCs for models ranged from 0.561 to 0.731, with generally increasing performance at moments later in life. Precision, AUC and balanced accuracy values improved significantly for the naive Bayes and multiple logistic regression ensembles in at least one data set, although performance metrics did remain low overall. The multiple logistic regression ensemble method resulted in equal or better precision, AUC, balanced accuracy and proportion of animals surviving on all datasets and was significantly different from the other ensembles in three out of five moments. 
The random forest ensemble method resulted in the least significant improvement over the individual methods.

    Imputation of non-genotyped individuals based on genotyped relatives: assessing the imputation accuracy of a real case scenario in dairy cattle

    Background: Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives. Methods: Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows and with part of the pedigree genotyped. The software AlphaImpute was used for imputation in its standard settings but also without phasing, i.e. using basic inheritance rules and segregation analysis only. Different scenarios were evaluated: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population. Results: The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively. Conclusions: At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring.

    Veterinary dairy herd fertility service provision in seasonal and non-seasonal dairy industries - a comparison

    The decline in dairy herd fertility internationally has highlighted the limited impact of traditional veterinary approaches to bovine fertility management. Three questionnaire surveys were conducted at buiatrics conferences attended by veterinary practitioners on veterinary dairy herd fertility services (HFS) in countries with a seasonal (Ireland, 47 respondents) and non-seasonal breeding model (The Netherlands, 44 respondents and Portugal, 31 respondents). Of the 122 respondents, 73 (60%) provided a HFS and 49 (40%) did not. The majority (76%) of all practitioners who responded stated that bovine fertility had declined in their practice clients' herds, with inadequate cow management, inadequate nutrition and increased milk yield as the most important putative causes. The type of clients who adopted a herd fertility service were deemed more educated than average (70% of respondents), and/or had fertility problems (58%) and/or large herds (53%). The main components of this service were routine postpartum examinations (95% of respondents), fertility records analysis (75%) and ultrasound pregnancy examinations (69%). The number of planned visits per annum varied between an average of four in Ireland, where breeding is seasonal, and 23 in Portugal, where breeding is year-round. The benefits to both the practitioner and their clients from running a HFS were cited as better fertility, financial rewards and job satisfaction. For practitioners who did not run a HFS the main reasons given were no client demand (55%) and lack of fertility records (33%). The need for better economic evidence to convince clients of the cost-benefit of such a service was seen as a major constraint to adoption of this service by 67% of practitioners.