11 research outputs found
Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species
Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of these systems. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can, in principle, take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance, and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the full additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL.
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/
Modeling Growth and Yield of Schizolobium amazonicum under Different Spacings
This study presents an approach to model the growth and yield of the species Schizolobium amazonicum (Paricá) based on a spacing trial located in Pará, Brazil. Whole-stand models were employed, and two modeling strategies (Strategies A and B) were tested. Moreover, the following three scenarios were evaluated to assess the accuracy of the model in estimating total and commercial volumes at five years of age: complete absence of data (S1); available information about the variables basal area, site index, dominant height, and number of trees at two years of age (S2); and this information available at five years of age (S3). The results indicated that the 3 × 2 spacing has a higher mortality rate than normal, and, in general, greater spacing corresponds to larger diameter and average height and smaller basal area and volume per hectare. In estimating the total and commercial volumes for the three scenarios tested, Strategy B seems to be the most appropriate method to estimate the growth and yield of Paricá plantations in the study region, particularly because Strategy A showed a significant bias in its estimates.
A multi-environment trials diallel analysis provides insights on the inheritance of fumonisin contamination resistance in tropical maize
In maize, the fungi that cause Fusarium ear rot not only decrease grain yield and quality but also contaminate the grain with fumonisin. This study investigated the inheritance of fumonisin contamination resistance (FCR) in tropical maize, based on a multi-environment trials diallel analysis via mixed models. For this purpose, single-cross hybrids were created from 13 inbred lines and assessed in three environments. A mixed model diallel joint analysis across environments was performed, considering the existence of environment-specific variances and correlations between pairs of environments for general combining ability (GCA) and specific combining ability (SCA) effects, and additive genomic relationship between inbred lines for the prediction of GCA and SCA. For all environments, the SCA variance had a higher magnitude than the GCA variance, indicating a predominance of dominance effects underlying FCR in tropical maize. Moreover, the proportion of the variance among single-cross hybrids that was due to GCA varied from 16 to 22% across environments, suggesting that SCA is important for predicting hybrid performance. Through modeling variance–covariance structures for GCA and SCA, it was possible to observe that the GCA effects were stable, whereas the SCA effects were specific to each environment. Therefore, these results suggest that the selection of the best parents for the development of new inbred lines can be carried out through the average performance across the evaluated environments. Due to the importance of SCA effects and their complex interaction with environments, the selection of superior hybrids should be performed within specific environments.
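For orientation, a generic single-environment diallel model and one common way of expressing the share of hybrid variance attributable to GCA can be written as below. This is a textbook form assumed for illustration; the abstract does not state the exact model or ratio the authors used, and the symbols (g for GCA effects, s for SCA effects, e for residuals) are introduced here.

```latex
y_{ij} = \mu + g_i + g_j + s_{ij} + e_{ij},
\qquad
\text{GCA share} = \frac{2\sigma^2_{g}}{2\sigma^2_{g} + \sigma^2_{s}}
```

Under this assumed ratio, any value below 0.5 means that the SCA variance exceeds twice the GCA variance, which is the direction of the result reported above.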
Data from: Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials
Breeding for drought tolerance is a challenging task that requires costly, extensive and precise phenotyping. Genomic selection (GS) can be used to maximize selection efficiency and the genetic gains in maize (Zea mays L.) breeding programs for drought tolerance. Here we compared the accuracy of additive (A) and additive+dominance (AD) genomic selection models in predicting the performance of untested maize single-cross hybrids for drought tolerance in multi-environment trials. Phenotypic data of five drought-tolerance traits were measured in 308 hybrids in eight trials under water-stressed (WS) and well-watered (WW) conditions over two years and two locations in Brazil. Hybrids’ genotypes were inferred based on their parents’ genotypes (inbred lines) using single nucleotide polymorphism data obtained via genotyping-by-sequencing. GS analyses were performed using genomic best linear unbiased prediction by fitting a factor analytic (FA) multiplicative mixed model. Results showed differences in the predictive accuracy between A and AD models for the five traits under consideration in both water conditions. For grain yield (GY), the AD model doubled the predictive accuracy in comparison to the A model. The FA framework allowed for investigating the stability of additive and dominance effects across environments, as well as the additive- and dominance-by-environment interactions, with interesting applications for parental and hybrid selection. Prediction performance of untested hybrids using GS that benefits from borrowing information from correlated trials increased by 40% and 9% for A and AD models, respectively. These results highlight the importance of multi-environment trial analysis with GS that incorporates dominance effects into genomic predictions of GY in maize single-cross hybrids.
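The two ingredients mentioned above, inferring hybrid genotypes from inbred parents and building additive and dominance relationship matrices for GBLUP, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes fully homozygous parents coded 0/2, and it swaps in the widely used VanRaden (additive) and Vitezica et al. (dominance) parameterizations, which the abstract does not confirm. All function names and the toy data are hypothetical.

```python
import numpy as np

def hybrid_genotypes(parent_a, parent_b):
    """Allele dosage (0/1/2) of a single cross between two inbred lines.

    Assumes each parent is fully homozygous and coded 0 or 2, so it
    transmits exactly half of its dosage to the hybrid.
    """
    return (np.asarray(parent_a) + np.asarray(parent_b)) // 2

def additive_grm(M):
    """Additive genomic relationship matrix (VanRaden-style, assumed here)."""
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    Z = M - 2.0 * p                     # center dosages by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def dominance_grm(M):
    """Dominance genomic relationship matrix (Vitezica-style, assumed here)."""
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    # Dominance deviation coding for dosage 2, 1 and 0 respectively
    W = np.where(M == 2, -2.0 * q**2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    return W @ W.T / np.sum((2.0 * p * q) ** 2)

# Toy data: three hypothetical inbred lines scored at four markers
lines = np.array([[0, 2, 2, 0],
                  [2, 2, 0, 0],
                  [0, 0, 2, 2]])
crosses = [(0, 1), (0, 2), (1, 2)]
hybrids = np.vstack([hybrid_genotypes(lines[i], lines[j]) for i, j in crosses])

G_add = additive_grm(hybrids)
G_dom = dominance_grm(hybrids)
print(hybrids)
print(np.round(G_add, 3))
print(np.round(G_dom, 3))
```

In an A model only `G_add` would enter the mixed model as the covariance structure of the genetic effects, whereas an AD model would use both matrices. Monomorphic markers should be filtered out before use, since they contribute nothing to either denominator.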