55 research outputs found

    Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence.

    Recent developments have made it possible to generate multiple layers of high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model, GTCBLUP, that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed that of the best linear unbiased prediction models for most traits. For almost all traits, models leveraging gene transcripts captured higher proportions of the phenotypic variance when the traits were measured closer in time to the sampling of gene transcripts in the liver. In most cases, combining layers did not outperform the best single-omics models for predicting phenotypes. Using only gene transcripts, the gradient boosting machine model outperformed best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model did not produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and the GTCBLUP model for prediction of breeding values.
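    The abstract does not give the GTCBLUP equations. As a simplified, hypothetical illustration of what removing between-omics-layer covariance from a predictor can mean, the sketch below regresses one transcript on one SNP genotype by ordinary least squares and keeps the residual, which by construction has zero sample covariance with the genotype. The function names and data are assumptions, not the authors' implementation.

```python
from statistics import mean


def residualize(transcript, genotype):
    """Remove the linear covariance between one transcript and one genotype column.

    Hypothetical, simplified illustration: OLS regression of the transcript on the
    genotype, keeping only the residual. The published GTCBLUP model may build its
    predictors differently. The genotype must not be constant.
    """
    t_bar, g_bar = mean(transcript), mean(genotype)
    cov_tg = sum((t - t_bar) * (g - g_bar) for t, g in zip(transcript, genotype))
    var_g = sum((g - g_bar) ** 2 for g in genotype)
    slope = cov_tg / var_g
    # Residual = transcript minus the part linearly predicted by the genotype
    return [t - t_bar - slope * (g - g_bar) for t, g in zip(transcript, genotype)]


def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


genotype = [0, 1, 2, 1, 0, 2]                 # hypothetical SNP allele counts
transcript = [1.0, 2.1, 3.9, 2.5, 0.8, 4.2]   # hypothetical expression values
adjusted = residualize(transcript, genotype)
print(covariance(adjusted, genotype))  # zero up to floating-point error
```

With several SNPs and transcripts, the same idea becomes a multivariate projection; this one-column version only shows why the residual carries no linear covariance with the genotype layer.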

    QTLMAS 2009: simulated dataset

    Background - The simulation of the data for the QTLMAS 2009 Workshop is described. The objective was to simulate observations from a growth curve that was influenced by a number of QTL. Results - The data consisted of markers, phenotypes and pedigree. Genotypes of 453 markers, distributed over 5 chromosomes of 1 Morgan each, were simulated for 2,025 individuals. Of those, 25 individuals were parents of the other 2,000 individuals. The 25 parents were genetically related. Phenotypes were simulated according to a logistic growth curve and were made available for 1,000 of the 2,000 offspring individuals. The logistic growth curve was specified by three parameters. Each parameter was influenced by six Quantitative Trait Loci (QTL), positioned on the five chromosomes. For each parameter, one QTL had a large effect and five QTL had small effects. The variance of a large QTL was five times the variance of a small QTL. The simulated data were made available at http://www.qtlmas2009.wur.nl/UK/Dataset
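    A minimal sketch of the simulation ingredients, assuming the common parameterization y(t) = a / (1 + b·exp(-c·t)) for the three-parameter logistic curve (the abstract does not state which form was used). Under the stated design of one large and five small QTL per parameter, with the large QTL having five times the variance of a small one, and assuming independent QTL effects, the large QTL accounts for 5 / (5 + 5) = 50% of each parameter's QTL variance.

```python
import math


def logistic_growth(t, a, b, c):
    """Three-parameter logistic curve (assumed parameterization):
    a = asymptote, b = scaling of the initial value, c = growth rate."""
    return a / (1.0 + b * math.exp(-c * t))


def large_qtl_share(ratio=5.0, n_small=5):
    """Share of a parameter's QTL variance due to its single large QTL,
    assuming one large QTL with `ratio` times the variance of each small QTL
    and independent (additive) QTL contributions."""
    return ratio / (ratio + n_small)


print(logistic_growth(0.0, a=10.0, b=1.0, c=0.5))  # 5.0: half the asymptote at t = 0 when b = 1
print(large_qtl_share())                            # 0.5
```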

    Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

    We compared the performance of linear methods (GBLUP, BayesB, and elastic net) to a nonparametric tree-based ensemble method (gradient boosting machine) for genomic prediction of complex traits in mice. The dataset contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence that a relevant portion of phenotypic variance is explained by epistatic effects. For some traits, fitting a subset of top markers selected by a gradient boosting machine model improved prediction accuracy in both linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and by decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method for predicting complex traits with assumed epistatic effects.
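    The forward validation scheme can be sketched as follows: the youngest generation becomes the validation set and all older generations the reference set. The record layout is hypothetical, and because the abstract does not define how the relative root mean squared error was normalized, dividing the RMSE by the mean observed value is an assumption.

```python
import math
from statistics import mean


def split_by_generation(records):
    """Youngest generation -> validation; all older generations -> reference."""
    youngest = max(r["generation"] for r in records)
    reference = [r for r in records if r["generation"] < youngest]
    validation = [r for r in records if r["generation"] == youngest]
    return reference, validation


def relative_rmse(observed, predicted):
    """RMSE scaled by the mean observed value (assumed definition of 'relative')."""
    rmse = math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))
    return rmse / mean(observed)


# Hypothetical records: animal id, generation, adjusted phenotype
records = [
    {"id": 1, "generation": 1, "y": 20.1},
    {"id": 2, "generation": 2, "y": 21.4},
    {"id": 3, "generation": 3, "y": 22.0},
    {"id": 4, "generation": 3, "y": 19.8},
]
reference, validation = split_by_generation(records)
print(len(reference), len(validation))  # 2 2
```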

    Comparison of analyses of the QTLMAS XIII common dataset. I: genomic selection

    Background - Genomic selection, the use of markers across the whole genome, receives increasing amounts of attention and is having more and more impact on breeding programs. Development of statistical and computational methods to estimate breeding values based on markers is a very active area of research. A simulated dataset was analyzed by participants of the QTLMAS XIII workshop, allowing a comparison of the ability of different methods to estimate genomic breeding values. Methods - A best case scenario was analyzed by the organizers where QTL genotypes were known. Participants submitted estimated breeding values for 1000 unphenotyped individuals together with a description of the applied method(s). The submitted breeding values were evaluated for correlation with the simulated values (accuracy), rank correlation of the best 10% of individuals and error in predictions. Bias was tested by regression of simulated on estimated breeding values. Results - The accuracy obtained from the best case scenario was 0.94. Six research groups submitted 19 sets of estimated breeding values. Methods that assumed the same variance for markers showed accuracies, measured as correlations between estimated and simulated values, ranging from 0.75 to 0.89 and rank correlations between 0.58 and 0.70. Methods that allowed different marker variances showed accuracies ranging from 0.86 to 0.94 and rank correlations between 0.69 and 0.82. Methods assuming equal marker variances were generally more biased and showed larger prediction errors. Conclusions - The best performing methods achieved very high accuracies, close to accuracies achieved in a best case scenario where QTL genotypes were known without error. Methods that allowed different marker variances generally outperformed methods that assumed equal marker variances. 
Genomic selection methods performed well compared to traditional, pedigree-only methods; all methods showed higher accuracies than those obtained for breeding values estimated solely from pedigree relationships.
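    The two main evaluation criteria can be sketched as below: accuracy is the Pearson correlation between simulated and estimated breeding values, and bias is assessed through the slope of the regression of simulated on estimated values, where a slope of 1 indicates no bias. The function name is hypothetical, and the rank correlation of the best 10% of individuals is not reproduced here.

```python
import math
from statistics import mean


def accuracy_and_bias(simulated, estimated):
    """Return (accuracy, regression slope) for estimated breeding values.

    accuracy: Pearson correlation between simulated and estimated values.
    slope: coefficient from regressing simulated on estimated values; 1 means
    no bias, below 1 means the estimates are over-dispersed (too spread out).
    """
    s_bar, e_bar = mean(simulated), mean(estimated)
    cov = sum((s - s_bar) * (e - e_bar) for s, e in zip(simulated, estimated))
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    var_e = sum((e - e_bar) ** 2 for e in estimated)
    return cov / math.sqrt(var_s * var_e), cov / var_e


print(accuracy_and_bias([2, 4, 6, 8], [1, 2, 3, 4]))  # (1.0, 2.0)
```

In the usage line the estimates are perfectly correlated with the simulated values but under-dispersed, so accuracy is 1 while the slope of 2 still flags bias; this is why the two criteria are reported separately.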

    Predicting Flowering Behavior and Exploring Its Genetic Determinism in an Apple Multi-family Population Based on Statistical Indices and Simplified Phenotyping

    Irregular flowering over years is commonly observed in fruit trees. The early prediction of tree behavior is highly desirable in breeding programmes. This study aims at performing such predictions by combining simplified phenotyping and statistical methods. Sequences of vegetative vs. floral annual shoots (AS) were observed along axes in trees belonging to five related apple full-sib families. Sequences were analyzed using Markovian and linear mixed models including year and site effects. Indices of flowering irregularity, periodicity and synchronicity were estimated at tree and axis scales. They were used to predict tree behavior and to detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of a Biennial Bearing Index (BBI) with an autoregressive coefficient (γg) efficiently predicted and classified the genotype behaviors, despite a few misclassifications. Four QTLs common to BBIs and γg, and one for synchronicity, were highlighted and revealed the complex genetic architecture of the traits. Irregularity resulted from high AS synchronism, whereas regularity resulted from either asynchronous locally alternating or continual regular AS flowering. A relevant and time-saving method, based on a posteriori sampling of axes and on statistical indices, is proposed; it efficiently evaluates tree breeding values for flowering regularity and could be transferred to other species.
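    For illustration, the sketch below computes one commonly used form of the Biennial Bearing Index, the mean of |y_t - y_(t-1)| / (y_t + y_(t-1)) over consecutive years, together with a plain lag-1 autocorrelation as a simple stand-in for the autoregressive coefficient γg. The study estimated its indices within Markovian and linear mixed models, so its exact definitions may differ; the example series are hypothetical.

```python
from statistics import mean


def biennial_bearing_index(series):
    """Common BBI form: mean over consecutive years of |y_t - y_prev| / (y_t + y_prev).

    Year pairs with y_t + y_prev == 0 contribute 0 here (an assumption).
    """
    terms = []
    for prev, cur in zip(series, series[1:]):
        total = prev + cur
        terms.append(abs(cur - prev) / total if total else 0.0)
    return mean(terms)


def lag1_autocorrelation(series):
    """Plain lag-1 autocorrelation: a simple stand-in for an autoregressive coefficient."""
    m = mean(series)
    dev = [y - m for y in series]
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    return num / sum(d * d for d in dev)


print(biennial_bearing_index([10, 0, 10, 0]))  # 1.0: strict alternation of yearly flowering
print(lag1_autocorrelation([1, 0, 1, 0]))      # -0.75: floral (1) / vegetative (0) alternation
```

A BBI near 1 combined with a strongly negative autocorrelation points to biennial alternation, whereas a BBI near 0 indicates regular bearing; combining both indices is what the abstract reports as effective for classification.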

    Mutations on a conserved distal enhancer in the porcine C-reactive protein gene impair its expression in liver

    C-reactive protein (CRP) is an evolutionarily highly conserved protein. As in humans, CRP acts as a major acute phase protein in pigs. While CRP regulatory mechanisms have been extensively studied in humans, little is known about the molecular mechanisms that control pig CRP gene expression. The main goal of the present work was to study the regulatory mechanisms and identify functional genetic variants regulating CRP gene expression and CRP blood levels in pigs. The characterization of the porcine CRP proximal promoter region revealed a high level of conservation with both cow and human promoters, sharing binding sites for transcription factors required for CRP expression. Through genome-wide association studies and fine mapping, the variants most strongly associated with both mRNA and protein CRP levels were localized in a genomic region 39.3 kb upstream of CRP. Further study of the region revealed a highly conserved putative enhancer that contains binding sites for several transcriptional regulators such as STAT3, NF-kB or C/EBP-β. Luciferase reporter assays showed that this enhancer-promoter interaction is necessary for the acute phase induction of CRP expression in liver, where differences in the enhancer sequences significantly modified CRP activity. The associated polymorphisms disrupted the putative binding sites for the HNF4α and FOXA2 transcription factors. The high correlation between HNF4α and CRP expression levels suggests that HNF4α participates in the regulatory mechanism of porcine CRP expression in liver through the modification of its binding site. Our findings establish, for the first time, the relevance of a distal regulatory element essential for the acute phase induction of porcine CRP in liver and identify functional polymorphisms that can be included in pig breeding programs to improve immunocompetence.
The authors declare that financial support was received for the research, authorship, and/or publication of this article. The study was funded by grants AGL2016-75432-R and PID2020-112677RB-C21 awarded by MCIN/AEI/10.13039/501100011033 and by the GENE-SWitCH project (https://www.gene-switch.eu), which is funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement n°817998. T. Jové-Juncà was funded with an IRTA fellowship (CPI1221) and C. Hernández-Banqué was supported by an FPI grant (PRE2021-097825) granted by the Spanish Ministry of Science and Innovation. YR-C was financially supported by a Ramon y Cajal contract (RYC2019-027244-I) from the Spanish Ministry of Science and Innovation. The authors are part of a Consolidated Research Group AGAUR, with the reference 2021-SGR-01552.

    Identification of transcriptional regulatory variants in pig duodenum, liver, and muscle tissues

    Background - In humans and livestock species, genome-wide association studies (GWAS) have been applied to study the association between variants distributed across the genome and a phenotype of interest. To discover genetic polymorphisms affecting the duodenum, liver, and muscle transcriptomes of 300 pigs from 3 different breeds (Duroc, Landrace, and Large White), we performed expression GWAS between 25,315,878 polymorphisms and the expression of 13,891 genes in duodenum, 12,748 genes in liver, and 11,617 genes in muscle. Results - More than 9.68 × 10¹¹ association tests were performed, yielding 14,096,080 significantly associated variants, which were grouped into 26,414 expression quantitative trait locus (eQTL) regions. Over 56% of the variants were within 1 Mb of their associated gene. In addition to the 100-kb region upstream of the transcription start site, we found the 100-kb region downstream of the 3′UTR to be important for gene regulation, as most of the cis-regulatory variants were located within these 2 regions. We also observed 39,874 hotspot regulatory polymorphisms associated with the expression of 10 or more genes that could modify the protein structure or the expression of a regulator gene. In addition, 2 motifs (5′-GATCCNGYGTTGCYG-3′ and a poly(A) sequence) were enriched across the 3 tissues within the neighboring sequences of the most significant single-nucleotide polymorphisms in each cis-eQTL region. Conclusions - The 14 million significant associations obtained in this study are publicly available and have enabled the identification of expression-associated cis-, trans-, and hotspot regulatory variants within and across tissues, thus shedding light on the molecular mechanisms of regulatory variation that shapes end-trait phenotypes.
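    The reported number of association tests can be checked arithmetically, assuming every polymorphism was tested against every analyzed gene in each tissue: 25,315,878 × (13,891 + 12,748 + 11,617) ≈ 9.68 × 10¹¹. The `is_cis` helper is a hypothetical rule built on the 1 Mb distance mentioned in the abstract, not necessarily the study's own cis/trans definition.

```python
n_variants = 25_315_878
genes_per_tissue = {"duodenum": 13_891, "liver": 12_748, "muscle": 11_617}

# Upper bound: every polymorphism tested against every analyzed gene in each tissue
n_tests = n_variants * sum(genes_per_tissue.values())
print(f"{n_tests:,}")    # 968,484,228,768
print(f"{n_tests:.2e}")  # 9.68e+11


def is_cis(variant_chrom, variant_pos, gene_chrom, gene_pos, window=1_000_000):
    """Hypothetical cis rule: same chromosome and within `window` bp of the gene."""
    return variant_chrom == gene_chrom and abs(variant_pos - gene_pos) <= window
```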

    QTL linkage analysis of connected populations using ancestral marker and pedigree information

    The common assumption in quantitative trait locus (QTL) linkage mapping studies that parents of multiple connected populations are unrelated is unrealistic for many plant breeding programs. We remove this assumption and propose a Bayesian approach that clusters the alleles of the parents of the current mapping populations from locus-specific identity by descent (IBD) matrices that capture ancestral marker and pedigree information. Moreover, we demonstrate how the parental IBD data can be incorporated into a QTL linkage analysis framework using two approaches: a Threshold IBD model (TIBD) and a Latent Ancestral Allele Model (LAAM). The TIBD and LAAM models are empirically tested via numerical simulation based on the structure of a commercial maize breeding program. The simulations included a pilot dataset with closely linked QTL on a single linkage group and 100 replicated datasets with five linkage groups harboring four unlinked QTL. The simulation results show that including parental IBD data (with similar results for TIBD and LAAM) significantly improves the power and, in particular, the accuracy of QTL mapping (e.g., QTL position, effect size and individuals’ genotype probabilities) without significantly increasing computational demand.
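    The abstract does not detail how the Threshold IBD model assigns parental alleles. A minimal, hypothetical reading is single-linkage clustering: at one locus, two parents share an ancestral-allele class when their IBD probability is at least a chosen threshold, and classes are closed under transitivity. The sketch implements this with union-find; the function name, threshold and matrix are illustrative only.

```python
def threshold_ibd_clusters(ibd, threshold):
    """Group parents into ancestral-allele classes at one locus (hypothetical rule).

    Parents i and j are linked when ibd[i][j] >= threshold; classes are the
    connected components of that graph (single-linkage via union-find).
    """
    n = len(ibd)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if ibd[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance
    labels, out = {}, []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels))
        out.append(labels[root])
    return out


ibd = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
print(threshold_ibd_clusters(ibd, 0.5))  # [0, 0, 1]
```

Lowering the threshold merges more parents into shared ancestral alleles, which reduces the number of QTL allele effects to estimate; the latent-allele alternative (LAAM) treats that assignment probabilistically instead.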

    Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data

    Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that a protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Field method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
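    A minimal sketch of label sampling in a generic auto-logistic Markov random field on an interaction network, assuming a hypothetical parameterization in which the log-odds that a protein has a function is alpha + beta1·n1 + beta0·n0, with n1 and n0 the numbers of neighbors currently labeled with and without the function. The parameters are held fixed here; the Bayesian approach described above additionally samples them with an adaptive MCMC algorithm, which this sketch omits. The network and parameter values are hypothetical.

```python
import math
import random


def conditional_prob(node, labels, neighbors, alpha, beta1, beta0):
    """P(node has the function | neighbor labels) in a generic auto-logistic MRF."""
    n1 = sum(labels[v] for v in neighbors[node])
    n0 = len(neighbors[node]) - n1
    logit = alpha + beta1 * n1 + beta0 * n0
    return 1.0 / (1.0 + math.exp(-logit))


def gibbs_sweep(labels, neighbors, unknown, alpha, beta1, beta0, rng):
    """Resample each unannotated protein's label once; annotated labels stay fixed."""
    for node in unknown:
        p = conditional_prob(node, labels, neighbors, alpha, beta1, beta0)
        labels[node] = 1 if rng.random() < p else 0
    return labels


# Hypothetical 4-protein network: proteins 0-2 are annotated, 3 is not
neighbors = {0: [1, 3], 1: [0, 3], 2: [3], 3: [0, 1, 2]}
labels = {0: 1, 1: 1, 2: 0, 3: 0}
rng = random.Random(1)
counts = 0
for _ in range(1000):
    gibbs_sweep(labels, neighbors, [3], alpha=-1.0, beta1=1.5, beta0=-0.5, rng=rng)
    counts += labels[3]
print(counts / 1000)  # Monte Carlo estimate of P(protein 3 has the function)
```

Because protein 3's neighbors are all annotated here, the estimate simply approaches the logistic value for logit 1.5 (about 0.82); with several unannotated neighbors, repeated sweeps let predictions propagate through the network.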

    Demography and mating system shape the genome-wide impact of purifying selection in Arabis alpina

    Plant mating systems have profound effects on the levels and structuring of genetic variation and can affect the impact of natural selection. Although theory predicts that intermediate outcrossing rates may allow plants to prevent the accumulation of deleterious alleles, few studies have empirically tested this prediction using genomic data. Here, we study the effect of mating system on purifying selection by conducting population-genomic analyses on whole-genome resequencing data from 38 European individuals of the arctic-alpine crucifer Arabis alpina. We find that outcrossing and mixed-mating populations maintain genetic diversity at similar levels, whereas highly self-fertilizing Scandinavian A. alpina show a strong reduction in genetic diversity, most likely as a result of a postglacial colonization bottleneck. We further find evidence for the accumulation of genetic load in highly self-fertilizing populations, whereas the genome-wide impact of purifying selection does not differ greatly between mixed-mating and outcrossing populations. Our results demonstrate that intermediate levels of outcrossing may allow efficient selection against harmful alleles, whereas demographic effects can be important for relaxed purifying selection in highly selfing populations. Thus, mating system and demography shape the impact of purifying selection on genomic variation in A. alpina. These results are important for an improved understanding of the evolutionary consequences of mating system variation and the maintenance of mixed-mating strategies. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1707492115/-/DCSupplemental
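    The abstract does not name the diversity statistic used. As an illustration of one standard measure, the sketch computes Nei and Li's nucleotide diversity π, the average number of pairwise differences per site, on hypothetical aligned haplotypes in which the "selfing" sample is nearly monomorphic.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (nucleotide diversity pi).

    Assumes aligned sequences of equal length with no missing data.
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical aligned haplotypes
outcrossing = ["AACGTA", "ATCGTA", "AACCTA", "ATCGTT"]
selfing = ["AACGTA", "AACGTA", "AACGTA", "AACGTT"]
print(round(nucleotide_diversity(outcrossing), 4))  # 0.2778
print(round(nucleotide_diversity(selfing), 4))      # 0.0833
```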