97 research outputs found

    Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods.

    Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold-standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that when secondary traits are used to aid in the prediction of focal traits, and these secondary traits are measured on the individuals to be tested, a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
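The bias described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's simulation or estimators: there are no markers, the "multi-trait prediction" is simply the test individual's own secondary trait s, all variance parameters are hypothetical, and the relative-based check (in the spirit of CV2*) assumes one relative per individual with a known additive relationship a, rescaled by a and the square root of focal heritability. Because s and the individual's own focal phenotype y share a residual covariance, correlating predictions with y rewards non-genetic signal; the relative's residual is independent of s, so that leak disappears.

```python
import numpy as np


def simulate(n=50_000, h2_y=0.3, h2_s=0.5, r_g=0.2, r_e=0.6, a=0.5, seed=1):
    """Toy bivariate model with unit phenotypic variances (hypothetical values).

    Returns the secondary trait s and focal phenotype y of each test individual,
    its true focal genetic value g_y, and the focal phenotype y_rel of one
    relative with additive relationship a whose residual is independent of s.
    """
    rng = np.random.default_rng(seed)
    cov_g = r_g * np.sqrt(h2_y * h2_s)                  # genetic covariance
    var_ey, var_es = 1 - h2_y, 1 - h2_s
    cov_e = r_e * np.sqrt(var_ey * var_es)              # residual covariance
    g = rng.multivariate_normal([0, 0], [[h2_y, cov_g], [cov_g, h2_s]], size=n)
    e = rng.multivariate_normal([0, 0], [[var_ey, cov_e], [cov_e, var_es]], size=n)
    g_y, g_s = g[:, 0], g[:, 1]
    y, s = g_y + e[:, 0], g_s + e[:, 1]
    # The relative's focal genetic value shares a fraction a of g_y (total variance h2_y);
    # its residual is drawn fresh, so it is uncorrelated with the test individual's e_s.
    g_rel = a * g_y + rng.normal(0.0, np.sqrt(h2_y * (1 - a**2)), n)
    y_rel = g_rel + rng.normal(0.0, np.sqrt(var_ey), n)
    return s, y, g_y, y_rel


def corr(u, v):
    return np.corrcoef(u, v)[0, 1]


h2_y, a = 0.3, 0.5
s, y, g_y, y_rel = simulate(h2_y=h2_y, a=a)
pred = s  # hypothetical stand-in: a prediction that leans on the test individual's own s

true_acc = corr(pred, g_y)                              # target: cor(prediction, g_y)
naive_acc = corr(pred, y) / np.sqrt(h2_y)               # own phenotype, scaled by sqrt(h2)
relative_acc = corr(pred, y_rel) / (a * np.sqrt(h2_y))  # relative's phenotype, rescaled
print(f"true {true_acc:.3f}  naive {naive_acc:.3f}  relative-based {relative_acc:.3f}")
```

With a positive residual correlation r_e, the naive estimate far exceeds the true accuracy, while the relative-based estimate stays close to it (up to sampling noise, since it divides a small correlation by a * sqrt(h2_y)).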

    Bayesian Sparse Factor Analysis of Genetic Covariance Matrices

    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression, where the number of traits assayed per individual can reach many thousands. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need only consider G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse, affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.
    Comment: 35 pages, 7 figures
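The structural assumption above (a few intermediate traits, each affecting only a few observed traits) can be sketched as a G-matrix built from a sparse loading matrix plus a diagonal term. This is a minimal illustration, not the authors' Bayesian model or sampler: the decomposition G = Lambda Lambda^T + Psi, the function name sparse_factor_g, and the sizes p, k, traits_per_factor, and psi are all hypothetical choices. It shows how much smaller the parameter space becomes compared with an unstructured p x p covariance matrix.

```python
import numpy as np


def sparse_factor_g(p=1000, k=10, traits_per_factor=20, psi=0.05, seed=0):
    """Build G = Lambda @ Lambda.T + Psi from k sparse factors (hypothetical sizes).

    Each column of Lambda (one latent 'intermediate trait') has nonzero loadings
    on only traits_per_factor of the p observed traits; Psi is diagonal.
    """
    rng = np.random.default_rng(seed)
    lam = np.zeros((p, k))
    for j in range(k):
        rows = rng.choice(p, size=traits_per_factor, replace=False)
        lam[rows, j] = rng.normal(0.0, 1.0, traits_per_factor)
    g = lam @ lam.T + psi * np.eye(p)
    return lam, g


lam, g = sparse_factor_g()
p, k = lam.shape
full_params = p * (p + 1) // 2                 # free entries of an unstructured symmetric G
factor_params = np.count_nonzero(lam) + p      # nonzero loadings + diagonal of Psi
print(full_params, factor_params)
print(np.linalg.matrix_rank(lam @ lam.T))      # the factor part has rank at most k
```

With these settings an unstructured G has 500,500 free entries, while the sparse factor version is described by roughly 1,200 numbers, which is one way to see why sparsity limits the influence of sampling error.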

    Dissecting high-dimensional phenotypes with bayesian sparse factor analysis of genetic covariance matrices.

    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression, where the number of traits assayed per individual can reach many thousands. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse, affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.

    Translational regulation contributes to the elevated CO2 response in two Solanum species.

    Understanding the impact of elevated CO2 (eCO2) on global agriculture is important given climate change projections. Breeding climate-resilient crops depends on genetic variation within naturally varying populations. The effect of genetic variation on the response to eCO2 is poorly understood, especially in crop species. We describe the different ways in which Solanum lycopersicum and its wild relative S. pennellii respond to eCO2, from cell anatomy to the transcriptome and metabolome. We further validate the importance of translational regulation as a potential mechanism for plants to adaptively respond to rising levels of atmospheric CO2.

    Genomic characterization of the evolutionary potential of the sea urchin Strongylocentrotus droebachiensis facing ocean acidification

    Ocean acidification (OA) is increasing due to anthropogenic CO2 emissions and poses a threat to marine species and communities worldwide. To better project the effects of acidification on organisms’ health and persistence, we need to understand 1) the mechanisms underlying developmental and physiological tolerance and 2) the potential that populations have for rapid evolutionary adaptation. This is especially challenging in nonmodel species, where targeted assays of metabolism and stress physiology may not be available or economical for large-scale assessments of genetic constraints. We used mRNA sequencing and a quantitative genetics breeding design to study mechanisms underlying genetic variability and tolerance to decreased seawater pH (-0.4 pH units) in larvae of the sea urchin Strongylocentrotus droebachiensis. We used a gene ontology-based approach to integrate expression profiles into indirect measures of cellular and biochemical traits underlying variation in larval performance (i.e., growth rates). Molecular responses to OA were complex, involving changes to several functions such as growth rates, cell division, metabolism, and immune activities. Surprisingly, the magnitude of pH effects on molecular traits tended to be small relative to variation attributable to segregating functional genetic variation in this species. We discuss how the application of transcriptomics and quantitative genetics approaches across diverse species can enrich our understanding of the biological impacts of climate change.
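One common way to turn expression profiles into per-sample scores for gene ontology (GO) categories is to standardize each gene and average over the genes annotated to a term. The sketch below shows only that generic idea; the abstract does not state which summary was used, so the z-score averaging, the function name go_term_scores, and the toy term labels "GO:A" and "GO:B" are all hypothetical.

```python
import numpy as np


def go_term_scores(expr, gene_ids, go_terms):
    """Average z-scored expression of each GO term's member genes, per sample.

    expr: samples x genes array; gene_ids: column labels of expr;
    go_terms: mapping from a GO term to the gene ids annotated with it.
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0                         # avoid dividing by zero for constant genes
    z = (expr - expr.mean(axis=0)) / sd       # standardize each gene across samples
    col = {g: i for i, g in enumerate(gene_ids)}
    scores = {}
    for term, genes in go_terms.items():
        idx = [col[g] for g in genes if g in col]   # skip genes absent from the data
        if idx:
            scores[term] = z[:, idx].mean(axis=1)
    return scores


# Hypothetical toy data: 2 samples x 3 genes, two made-up GO terms.
expr = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0]])
scores = go_term_scores(expr, ["g1", "g2", "g3"],
                        {"GO:A": ["g1", "g2", "gX"], "GO:B": ["g3"]})
print(scores["GO:A"], scores["GO:B"])
```

The resulting term scores can then be treated like any other quantitative trait in a breeding-design analysis, which is the sense in which they act as indirect cellular or biochemical measures.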

    Maintenance of quantitative genetic variance in complex, multitrait phenotypes: the contribution of rare, large effect variants in 2 Drosophila species

    How evolutionary processes interact to determine quantitative genetic variation has implications for contemporary and future phenotypic evolution, as well as for our ability to detect causal genetic variants. While theoretical studies have provided robust predictions to discriminate among competing models, empirical assessment of these has been limited. In particular, theory highlights the importance of pleiotropy in resolving observations of selection and mutation, but empirical investigations have typically been limited to few traits. Here, we applied high-dimensional Bayesian Sparse Factor Genetic modeling to gene expression datasets in 2 species, Drosophila melanogaster and Drosophila serrata, to explore the distributions of genetic variance across high-dimensional phenotypic space. Surprisingly, most of the heritable trait covariation was due to few lines (genotypes) with extreme [>3 interquartile ranges (IQR) from the median] values. Intriguingly, while genotypes extreme for a multivariate factor also tended to have a higher proportion of individual traits that were extreme, we also observed genotypes that were extreme for multivariate factors but not for any individual trait. We observed other consistent differences between heritable multivariate factors with outlier lines vs those factors without extreme values, including differences in gene functions. We use these observations to identify further data required to advance our understanding of the evolutionary dynamics and nature of standing genetic variation for quantitative traits.
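The outlier criterion above (values more than 3 interquartile ranges from the median) can be applied to factor scores or to individual traits with a few lines of NumPy. This is a sketch under assumptions: it measures absolute distance from the median and uses NumPy's default linear interpolation for the quartiles, which may differ from the paper's exact computation, and the function name extreme_mask and the toy data are hypothetical.

```python
import numpy as np


def extreme_mask(values, k=3.0, axis=0):
    """Flag entries more than k interquartile ranges from the median along axis."""
    values = np.asarray(values, dtype=float)
    med = np.median(values, axis=axis, keepdims=True)
    q1, q3 = np.percentile(values, [25, 75], axis=axis, keepdims=True)
    return np.abs(values - med) > k * (q3 - q1)


# Hypothetical scores of 10 lines on one multivariate factor.
factor_scores = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 100])
print(extreme_mask(factor_scores))          # only the last line is flagged

# Hypothetical lines x traits matrix: no line is extreme for any single trait.
traits = np.column_stack([np.arange(10), np.arange(10)[::-1]])
print(extreme_mask(traits, axis=0).any())
```

Running the same mask on factor scores and on each trait column (axis=0, one column per trait) is one way to find lines that are extreme for a factor but not for any individual trait.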