Projected population growth, together with increasing demand for protein-based diets, requires plant variety development programs, especially those for commodity crops, to deliver cultivars that have higher yields and tolerate environmental stresses such as heat, drought, disease, and insect pests. The phenotypic expression (P) of quantitative traits is influenced not only by genetic factors (G) but also by climate and agronomic practices (E), which interact with G. For over a century, P has been modeled as G + E + GE, and various experimental and analytical methods have attempted to disentangle the confounded G and non-G components of P. Herein, we investigated various linear mixed models to provide unbiased estimates of the G component for yield evaluated in annual multi-environment trials (MET). Data from MET are used to decide which replicable genotypes have the potential to be released as varieties. Phenotypic data from MET conducted by publicly supported plant breeders of commodity crops such as soybean are available for investigating analytic methods and models. We hypothesized that it is possible to estimate Realized Genetic Gain (RGG) using published data from routine annual MET.
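As a sketch of the decomposition described above, the classical model can be written for a single plot record as follows. The indices i (genotype), j (environment), and k (replicate), the overall mean \mu, and the residual term \varepsilon are notational assumptions introduced here for illustration; they are not taken from the chapters.

```latex
P_{ijk} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ijk}
```

In a MET, each environment is a location-by-year combination, so the GE term can be partitioned further, for example into genotype-by-location and genotype-by-year components.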
We addressed the research question by analyzing both empirical and simulated data, and we report the results in two independent manuscripts, presented as Chapters 2 and 3. In Chapter 2, we explored an empirical dataset of advanced MET obtained from public soybean varietal development programs responsible for genetic improvement in maturity zones II and III of the United States. This dataset comprises 39,006 phenotypic records from 4,257 experimental lines, 63 locations, and 31 years (1989-2019), and is available in the R package SoyURT. The results from this chapter revealed that, for seed yield, (i) the variation due to genotype by location was more important than the variation due to genotype by year, (ii) the 63 observed locations can be grouped into mega-environments using phenotypic, geographic, and meteorological data, and (iii) the estimated variances of the GE interaction (GEI) components of the phenotypic variance can be represented as probability distributions. These distributions are valuable as sampling distributions for simulation studies such as those we conducted in Chapter 3.
In Chapter 3, we evaluated several linear mixed models to estimate RGG for seed yield from simulated MET. Simulation models were designed based on information from Chapter 2. For example, in the simulated GEI models, correlated quantitative trait loci effects were simulated for genotype by year and genotype by location interaction effects. We further extended the simulator to incorporate a positive rate of non-genetic gain to represent advances in agronomic management practices. The analytic models used to estimate RGG in the simulated data were then compared in terms of bias and linearity. Bias was quantified according to a definition of RGG that is applicable to variety development programs. We proposed RGG be defined as the accumulation of beneficial alleles in breeding lines (i.e., experimental lines used in crossing blocks) across years of breeding operation. This definition is consistent with the original concept of genetic gain.
Simulation results indicated that all analytic models used to estimate RGG produced biased estimates. Covariance modeling, as well as the choice between direct and indirect estimation, resulted in substantial differences in the RGG estimates. Although no model was unbiased, the three models with the least bias and the smallest root mean squared error had an average bias of ±7.41 kg ha⁻¹ yr⁻¹ (±0.11 bu ac⁻¹ yr⁻¹). Rather than relying on a single model to estimate RGG from multiple years of field trials, we recommend applying multiple models and using the range of the estimated values for decision-making. Further, based on our simulations (number of environments, experimental genotypes, etc.), we do not think it is appropriate to use any single one of these models to compare breeding programs or to quantify the efficiency of proposed new breeding strategies.
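The comparison criteria and the recommendation above can be illustrated with a minimal Python sketch. It assumes each analytic model yields one RGG estimate per simulation replicate and that the simulated (true) RGG is known. The function names, model labels, and all numbers below are hypothetical and chosen only for illustration; they are not results or methods from the chapters.

```python
import math


def bias_and_rmse(estimates, true_rgg):
    """Return (mean error, root mean squared error) of estimates against a known true value."""
    errors = [est - true_rgg for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    return bias, rmse


def summarize_models(estimates_by_model, true_rgg):
    """Map each model name to its (bias, rmse) across simulation replicates."""
    return {
        name: bias_and_rmse(values, true_rgg)
        for name, values in estimates_by_model.items()
    }


def estimate_range(point_estimates):
    """Return (lowest, highest) RGG estimate across several models fitted to the same trials."""
    values = list(point_estimates.values())
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical simulated truth and per-replicate estimates (kg/ha/yr), invented for this sketch.
    true_rgg = 30.0
    simulated = {
        "model_A": [36.0, 38.5, 35.2],
        "model_B": [24.1, 22.7, 25.0],
    }
    for name, (bias, rmse) in summarize_models(simulated, true_rgg).items():
        print(f"{name}: bias={bias:.2f}, rmse={rmse:.2f}")

    # With field data the truth is unknown, so report the spread across models instead.
    field_estimates = {"model_A": 33.0, "model_B": 27.5, "model_C": 30.8}
    low, high = estimate_range(field_estimates)
    print(f"RGG estimate range: {low:.1f} to {high:.1f} kg/ha/yr")
```

Reporting the range rather than a single point estimate reflects the finding that no individual model was unbiased.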