56 research outputs found

    Multiple factor analysis highlights the contributions of SNPs, CpGs and sncRNAs to bull fertility.

    MFA was run on the 12,006 selected features from the CpG, sncRNA, SNP and SP (semen parameter) tables, which actively contributed to the results. Fertility, the origins of the bulls and the semen extraction batch were set as illustrative features, meaning that they did not participate in MFA construction. a: A global variable plot with active features shown in red and illustrative features in green. b: Individual factor map in which each dot corresponds to a bull and is coloured according to its fertility class. c, d: Variable factor maps for quantitative features (CpGs and sncRNAs). The first and second dimensions [c] and the first and third dimensions [d] are represented. Each arrowhead corresponds to a feature and is coloured according to its dataset of origin, with CpGs, sncRNAs and SPs shown in blue, yellow and grey, respectively. The colour intensity of each arrowhead indicates the cos2, which reflects the strength of the correlation between a feature and dimension 1. In c, two clusters are represented, gathering the features with the largest positive (>0.55, cluster 1) or negative (<-0.4, cluster 2) coordinates along dimension 1.
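The cos2 and the two clusters described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, cos2 is computed only against the dimensions supplied, and the negative cut-off is assumed to be -0.4, since the caption describes negative coordinates.

```python
def cos2(coords, dim=0):
    """Squared cosine of one feature on one dimension.

    coords holds the feature's coordinates on the supplied dimensions;
    the result is the share of its squared length carried by `dim`.
    """
    total = sum(c * c for c in coords)
    return coords[dim] ** 2 / total if total else 0.0


def assign_cluster(coord_dim1, pos_cut=0.55, neg_cut=-0.4):
    """Return 1 or 2 for strongly positive or negative dimension-1 coordinates.

    The cut-offs follow the caption; the sign of neg_cut is an assumption.
    """
    if coord_dim1 > pos_cut:
        return 1
    if coord_dim1 < neg_cut:
        return 2
    return None  # the feature belongs to neither cluster
```

A feature with coordinates (3, 4) on two dimensions has a cos2 of 9/25 on the first one.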

    This file contains: S1–S9 Figs.

    Bull fertility is an important economic trait, and the use of subfertile semen for artificial insemination decreases the global efficiency of the breeding sector. Although the analysis of semen functional parameters can help to identify infertile bulls, no tools are currently available to enable precise predictions and prevent the commercialization of subfertile semen. Because male fertility is a multifactorial phenotype that depends on genetic, epigenetic, physiological and environmental factors, we hypothesized that an integrative analysis might help to refine our knowledge and understanding of bull fertility. We combined -omics data (genotypes, sperm DNA methylation at CpGs and sperm small non-coding RNAs) and semen parameters measured on a large cohort of 98 Montbéliarde bulls with contrasting fertility levels. Multiple Factor Analysis (MFA) was conducted to study the links between the datasets and fertility. Four methodologies were then considered to identify the features linked to variation in bull fertility: Logistic Lasso, Random Forest, Gradient Boosting and Neural Networks. Finally, the features selected by these methods were annotated in terms of genes in order to conduct functional enrichment analyses. The less relevant features in the -omics data were filtered out, and MFA was run on the remaining 12,006 features, including the 11 semen parameters and a balanced proportion of each type of -omics data. The results showed that, unlike the semen parameters studied, the -omics datasets were related to fertility. Biomarkers related to bull fertility were selected using the four methodologies mentioned above. The genes targeted by the most contributory CpGs, SNPs and miRNAs were all found to be involved in development. Interestingly, fragments derived from ribosomal RNAs were overrepresented among the selected features, suggesting roles in male fertility. These markers could be used in the future to identify subfertile bulls in order to increase the global efficiency of the breeding sector.

    Data filtering strategy.

    The four tables included heterogeneous numbers of features [a]. Because the CpG, SNP and sncRNA tables were very large, features that could be considered as noise and features that did not vary significantly among the bulls were filtered out [b]. Because the remaining sncRNAs and SPs were affected by the semen extraction batch, they were then corrected for this batch effect [c]. Finally, because the CpG, SNP and sncRNA tables still included a large number of features, the most relevant were selected using a supervised method, Random Forest [d]. At the end of these three filtering steps, 12,006 relevant features originating from the four data tables were retained for further analysis.
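The three filtering steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, per-batch mean centring stands in for whatever batch correction was actually used, and the importance scores are assumed to come from a model such as a Random Forest fitted elsewhere.

```python
from statistics import mean, pvariance


def drop_low_variance(table, min_var):
    """Keep features (name -> values across bulls) whose variance exceeds min_var."""
    return {name: vals for name, vals in table.items() if pvariance(vals) > min_var}


def centre_by_batch(values, batches):
    """Subtract each batch's mean from its samples (assumed stand-in for batch correction)."""
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]


def top_k(importances, k):
    """Return the k feature names with the highest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]
```

Each step takes and returns plain dictionaries or lists, so the steps can be chained per table in the order given in the caption.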

    The features selected are specific to each method.

    a: Venn diagram showing the intersections between methods in terms of the features selected. Areas are coloured according to the proportion of features they include, relative to the total number of features selected by each method. b: The datasets of origin of the selected features are represented by pie charts. Methods displaying similar behaviours are grouped together.
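The regions of such a Venn diagram can be computed from the per-method selections with plain set operations. The sketch below is illustrative only; the function name and the method labels used with it are hypothetical.

```python
from itertools import combinations


def venn_regions(selections):
    """Map each combination of methods to the features selected by exactly those methods.

    selections: dict of method name -> set of selected feature names.
    Empty regions are omitted from the result.
    """
    methods = sorted(selections)
    regions = {}
    for r in range(1, len(methods) + 1):
        for combo in combinations(methods, r):
            # Features shared by every method in the combination...
            inside = set.intersection(*(selections[m] for m in combo))
            # ...minus anything also selected by a method outside it.
            outside = set().union(*(selections[m] for m in methods if m not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions
```

Dividing the size of a region by the number of features selected by one method gives the kind of proportion used to colour the areas.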

    Functional analyses of selected features.

    a: Global strategy for functional analysis. The combination of SNP, CpG and sncRNA features selected by the three unbiased methods (Cforest, Gradient Boosting, Neural Networks) was considered and referred to as “Selected features”. Genes including the selected CpGs and SNPs were subjected directly to enrichment analysis. The distribution of the different sncRNA families highlighted an overrepresentation of miRNAs and rRFs among the selected features when compared to the background, which included the 413,952 sncRNAs initially represented in the sncRNA dataset (lower left panel). The analysis therefore focused on the miRNA target genes, which were subjected to functional enrichment analysis. b: The genes containing selected SNP and CpG features underwent enrichment analysis using DAVID. Three clusters of terms were significantly enriched (EASE score higher than 1.3; left-hand panel). The proportions of genes targeted by selected CpGs only, selected SNPs only, or by both CpGs and SNPs, varied in the three clusters (pie charts, right-hand panel). c: Genes identified as putative targets of selected miRNAs by Targetscan underwent an overrepresentation analysis using Webgestalt. The top 10 overrepresented GO terms are listed, with the corresponding adjusted p-values.
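An overrepresentation analysis of this kind typically rests on a hypergeometric test, which a short sketch can make concrete. This is not a reimplementation of DAVID or Webgestalt: the function name is hypothetical, and no multiple-testing adjustment is applied.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation.

    k: annotated genes observed among the n selected genes.
    """
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return hits / total
```

For example, drawing 5 genes from a background of 10 in which 5 are annotated, and finding all 5 annotated, gives a probability of 1/252.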

    Correlation structure among CpG and sncRNA features.

    The correlation matrix for features belonging to the two clusters defined in Fig 2C was computed. Features are displayed in rows and columns and coloured according to their datasets and clusters. The colour intensity in the heatmap reflects the strength of the correlation between two features, with positive and negative correlations indicated in red and blue, respectively.
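A correlation matrix like the one behind this heatmap can be sketched in a few lines. The caption does not state which coefficient was used, so Pearson correlation is an assumption here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(features):
    """Return a nested dict: matrix[a][b] is the correlation of features a and b.

    features: dict of feature name -> values across the same bulls.
    """
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}
```

Positive entries would map to the red end of the colour scale and negative entries to the blue end.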