
    Extensions of genomic prediction methods and approaches for plant breeding

    Marker-assisted selection (MAS) was a first attempt to exploit molecular marker information for selection purposes in plant breeding. The MAS approach rested on the identification of quantitative trait loci (QTL). Because of inherent shortcomings of this approach, MAS failed in most instances as a tool for improving polygenic traits. By shifting the focus from QTL identification to prediction of genetic values, a novel approach called 'genomic selection', originally suggested for dairy cattle breeding, presents a solution to the shortcomings of MAS. In genomic selection, a training population of phenotyped and genotyped individuals is used for building the prediction model. This model uses the information from all markers simultaneously, without a preceding QTL identification step. Genetic values of selection candidates, which are only genotyped, are then predicted based on that model. Finally, the candidates are selected according to their predicted genetic values. Because of its success, genomic selection completely revolutionized dairy cattle breeding. It is now on the verge of revolutionizing plant breeding, too. However, several features set plant breeding programs apart from dairy cattle breeding. Thus, the methodology has to be extended to cover typical scenarios in plant breeding. Providing such extensions for important aspects of plant breeding is the main objective of this thesis. Single-cross hybrids are the predominant type of cultivar in maize and many other crops. Prediction of hybrid performance is of tremendous importance for identification of superior hybrids. Using genomic prediction approaches for this purpose is therefore of great interest to breeders. The conventional genomic prediction models estimate a single additive effect per marker. This was not appropriate for prediction of hybrid performance for two reasons. (1) The parental inbred lines of single-cross hybrids are usually taken from genetically very distant germplasm groups. 
For example, in hybrid maize breeding in Central Europe, these are the Dent and Flint heterotic groups, which have been separated for more than 500 years. Because of the strong divergence between the heterotic groups, it seemed necessary to estimate heterotic-group-specific marker effects. (2) Dominance effects are an important component of hybrid performance. They had to be included in the prediction models to capture the genetic variance between hybrids as completely as possible. The use of different heterotic groups in hybrid breeding requires parallel breeding programs for inbred line development in each heterotic group. Increasing the training population size with lines from the opposite heterotic group had not been attempted previously. Thus, a further objective of this thesis was to investigate whether an increase in the accuracy of genomic prediction can be achieved by using combined training sets. Some important traits in plant breeding are characterized by binomially distributed phenotypes. Examples are germination rate, fertility rates, haploid induction rate and spontaneous chromosome doubling rate. No genomic prediction methods for such traits were available. Therefore, another objective was to provide methodological extensions for such traits. We found that incorporating dominance effects into genomic prediction of maize hybrid performance led to considerable gains in prediction accuracy when the variance attributable to dominance effects was substantial compared to the additive genetic variance. Estimation of marker effects specific to the Dent and Flint heterotic groups was of less importance, at least under the high marker densities available today. The main reason for this was the surprisingly high linkage phase consistency between the Dent and Flint heterotic groups. Furthermore, combining individuals from different heterotic groups (Flint and Dent) into a single training population can result in considerable increases in prediction accuracy. 
Our extensions of the prediction methods to binomially distributed data yielded considerably higher prediction accuracies than approximate Gaussian methods. In conclusion, the developed extensions of prediction methods (to hybrid prediction and binomially distributed data) and approaches (training populations combining heterotic groups) can lead to considerable, cost-free gains in prediction accuracy. They are therefore valuable tools for exploiting the full potential of genomic selection in plant breeding.
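    The idea of predicting hybrid performance from all markers at once, with one additive and one dominance effect per marker, can be sketched as follows. This is a minimal illustration using ridge regression on simulated data, not the models developed in the thesis; the function names, the 0/1/2 genotype coding, the penalty `lam` and the simulation settings are all assumptions made for the example.

```python
import numpy as np


def design_matrices(genotypes):
    """Build additive and dominance codings from 0/1/2 allele counts."""
    additive = genotypes - 1.0                    # -1, 0, 1
    dominance = (genotypes == 1).astype(float)    # 1 for heterozygotes
    return additive, dominance


def fit_ridge(X, y, lam):
    """Ridge regression with an unpenalized intercept (primal form)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    n_effects = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_effects),
                           Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def predict(X, intercept, beta):
    return intercept + X @ beta


def simulate_hybrids(n_hybrids=200, n_markers=100, seed=0):
    """Toy data: each hybrid gets one gamete from each of two inbred parents."""
    rng = np.random.default_rng(seed)
    gamete1 = rng.integers(0, 2, size=(n_hybrids, n_markers))
    gamete2 = rng.integers(0, 2, size=(n_hybrids, n_markers))
    genotypes = gamete1 + gamete2                 # 0, 1 or 2
    a = rng.normal(size=n_markers)                # assumed additive effects
    d = rng.normal(size=n_markers)                # assumed dominance effects
    A, D = design_matrices(genotypes)
    genetic_value = A @ a + D @ d
    noise = rng.normal(scale=genetic_value.std(), size=n_hybrids)
    return genotypes, genetic_value + noise


def compare_models(genotypes, y, n_train=150, lam=50.0):
    """Correlation of predicted and observed values in the held-out hybrids."""
    A, D = design_matrices(genotypes)
    models = {"additive": A, "additive+dominance": np.hstack([A, D])}
    results = {}
    for name, X in models.items():
        intercept, beta = fit_ridge(X[:n_train], y[:n_train], lam)
        pred = predict(X[n_train:], intercept, beta)
        results[name] = float(np.corrcoef(pred, y[n_train:])[0, 1])
    return results


if __name__ == "__main__":
    geno, pheno = simulate_hybrids()
    print(compare_models(geno, pheno))
```

    Calling `compare_models` on the simulated data returns one held-out correlation per model. Which model does better depends on how much of the simulated genetic variance is due to dominance.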

    Effectiveness of Genomic Prediction of Maize Hybrid Performance in Different Breeding Populations and Environments

    Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and in testcross progenies of 30 F2-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross validation within the diversity panel, the predictive ability for testcross performance of F2-derived lines, using marker effects estimated in the diversity panel, was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (e.g., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance of the breeding populations and less from the relationship between the training and validation sets or linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed, emphasizing the need for (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross validation, and (3) larger training sets with a strong genetic relationship to the validation set.
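    The distinction between prediction among and within populations can be made concrete with a small sketch. The code below is an illustration under assumptions, not the analysis used in the study: `leave_one_population_out` trains a ridge model on all populations but one and validates in the held-out population, and `pooled_and_within_correlation` shows how a correlation pooled across populations can be high purely because population means differ. All names and the penalty `lam` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam=10.0):
    """Fit ridge regression on the training set and predict the test set."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    n_markers = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def leave_one_population_out(X, y, pop_labels, lam=10.0):
    """Predictive ability per population, training on all other populations."""
    abilities = {}
    for pop in np.unique(pop_labels):
        test = pop_labels == pop
        pred = ridge_fit_predict(X[~test], y[~test], X[test], lam)
        abilities[pop.item()] = float(np.corrcoef(pred, y[test])[0, 1])
    return abilities


def pooled_and_within_correlation(pred, y, pop_labels):
    """Correlation across all entries versus the mean within-population one."""
    pooled = float(np.corrcoef(pred, y)[0, 1])
    within = [
        float(np.corrcoef(pred[pop_labels == p], y[pop_labels == p])[0, 1])
        for p in np.unique(pop_labels)
    ]
    return pooled, float(np.mean(within))
```

    In a toy case where predictions track only the population means, the pooled correlation is high even though the ranking within each population is wrong. This is the kind of pattern that an analysis of population structure before cross validation is meant to expose.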

    Use of F2 Bulks in Training Sets for Genomic Prediction of Combining Ability and Hybrid Performance

    Developing training sets for genomic prediction in hybrid crops requires producing hybrid seed for a large number of entries. In autogamous crop species (e.g., wheat, rice, rapeseed, cotton) this requires elaborate hybridization systems to prevent self-pollination and presents a significant impediment to the implementation of hybrid breeding in general and genomic selection in particular. An alternative to F1 hybrids is to use bulks of F2 seed from selfed F1 plants (F1:2). Seed production for F1:2 bulks requires no hybridization system because the number of F1 plants needed for producing enough F1:2 seed for multi-environment testing can be generated by hand-pollination. This study evaluated the suitability of F1:2 bulks for use in training sets for genomic prediction of F1-level general combining ability and hybrid performance, under different degrees of divergence between heterotic groups and modes of gene action, using quantitative genetic theory and simulation of a genomic prediction experiment. The simulation, backed by theory, showed that F1:2 training sets are expected to have a lower prediction accuracy than F1 training sets, particularly when heterotic groups have strongly diverged. The accuracy penalty, however, was only modest and mostly due to a lower heritability rather than to a difference between F1 and F1:2 genetic values. It is concluded that resorting to F1:2 bulks is, in theory at least, a promising approach to remove the significant complication of a hybridization system from the breeding process.
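    Why F1 and F1:2 genetic values differ can be seen in the textbook single-locus model. Where the two inbred parents carry different alleles, the F1 is heterozygous and expresses the full dominance deviation d; after one generation of selfing only half of the bulk is still heterozygous, so the bulk mean retains d/2 while the additive part is unchanged. The sketch below encodes this under the assumptions of no epistasis and a large bulk. It is not the simulation used in the study, and the function and argument names are hypothetical.

```python
import numpy as np


def f1_and_f2_bulk_values(parent1, parent2, a, d):
    """Expected genetic values of an F1 hybrid and of its selfed F1:2 bulk.

    parent1, parent2: 0/1 allele carried by each homozygous parent per locus.
    a, d: additive and dominance effects per locus (illustrative values).
    """
    parent1 = np.asarray(parent1)
    parent2 = np.asarray(parent2)
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    additive = (parent1 + parent2 - 1) @ a         # coding -1, 0, 1
    heterozygous = (parent1 != parent2).astype(float)
    f1_value = additive + heterozygous @ d
    f2_bulk_value = additive + 0.5 * (heterozygous @ d)
    return float(f1_value), float(f2_bulk_value)
```

    With purely additive gene action (all d equal to zero) the two values coincide, so any difference between F1 and F1:2 values in this model comes from dominance at loci where the parents differ.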

    Parent-progeny imputation from pooled samples for cost-efficient genotyping in plant breeding

    The increased usage of whole-genome selection (WGS) and other molecular evaluation methods in plant breeding relies on the ability to genotype a very large number of untested individuals in each breeding cycle. Many plant breeding programs evaluate large biparental populations of homozygous individuals derived from homozygous parent inbred lines. This structure lends itself to parent-progeny imputation, which transfers the genotype scores of the parents to progeny individuals that are genotyped for a much smaller number of loci. Here we introduce a parent-progeny imputation method that infers individual genotypes from non-barcoded pooled samples of DNA of multiple individuals using a Hidden Markov Model (HMM). We demonstrate the method for pools of simulated maize doubled haploids (DH) from biparental populations, genotyped using a genotyping-by-sequencing (GBS) approach for 3,000 loci at 0.125x to 4x coverage. We observed high concordance between true and imputed marker scores, and the HMM produced well-calibrated genotype probabilities that correctly reflected the uncertainty of the imputed scores. Genomic estimated breeding values (GEBV) calculated from the imputed scores closely matched GEBV calculated from the true marker scores. The within-population correlation between these sets of GEBV approached 0.95 at 1x and 4x coverage when pooling two or four individuals, respectively. Our approach can reduce the genotyping cost per individual by a factor of up to the number of pooled individuals in GBS applications without the need for extra sequencing coverage, thereby enabling cost-effective large-scale genotyping for applications such as WGS in plant breeding.
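    The core of parent-progeny imputation with an HMM can be illustrated in a deliberately simplified setting: a single, non-pooled DH individual from one biparental population. The hidden state at each locus is the parent from which the segment was inherited, transitions reflect recombination between adjacent loci, and emissions are the read counts observed at low coverage. This sketch is not the pooled-sample model described in the abstract; the function name, the constant recombination probability `recomb` and the sequencing error rate `err` are assumptions for illustration.

```python
import numpy as np


def posterior_parent_origin(parent_alleles, alt_reads, total_reads,
                            recomb=0.05, err=0.01):
    """Forward-backward posterior of parental origin per locus.

    parent_alleles: (2, L) array of 0/1 alleles of the two inbred parents.
    alt_reads, total_reads: length-L integer arrays of read counts.
    Returns an (L, 2) array; column j is P(segment inherited from parent j).
    """
    parent_alleles = np.asarray(parent_alleles)
    k = np.asarray(alt_reads)[:, None]
    n = np.asarray(total_reads)[:, None]
    n_loci = parent_alleles.shape[1]

    # Probability of an alternative-allele read given the inherited allele.
    p_alt = np.where(parent_alleles.T == 1, 1.0 - err, err)   # (L, 2)
    # Binomial likelihood; the binomial coefficient cancels on normalizing.
    emission = p_alt ** k * (1.0 - p_alt) ** (n - k)

    transition = np.array([[1.0 - recomb, recomb],
                           [recomb, 1.0 - recomb]])

    forward = np.zeros((n_loci, 2))
    backward = np.ones((n_loci, 2))
    forward[0] = 0.5 * emission[0]
    forward[0] /= forward[0].sum()
    for t in range(1, n_loci):
        forward[t] = (forward[t - 1] @ transition) * emission[t]
        forward[t] /= forward[t].sum()            # rescale to avoid underflow
    for t in range(n_loci - 2, -1, -1):
        backward[t] = transition @ (emission[t + 1] * backward[t + 1])
        backward[t] /= backward[t].sum()

    posterior = forward * backward
    return posterior / posterior.sum(axis=1, keepdims=True)
```

    An imputed allele dosage per locus follows as `post[:, 0] * parent_alleles[0] + post[:, 1] * parent_alleles[1]`, where `post` is the returned array. Loci without reads receive posteriors informed by their flanking loci, which is what allows imputation at low coverage.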

    Schematic visualization of parent-progeny imputation from pooled samples.

    Parent-progeny imputation is carried out for four genetically linked loci L1, L2, L3 and L4 for a DNA pool of two DH individuals (P1 and P2) from two biparental populations (I1 × I2 and I3 × I4).

    Expected genotype call probabilities (%) for dent-dent pools.


    Mean and standard deviation (sd) of the multi-polymorphism rate and its correlation with genotype and GEBV concordance.
