    Association mapping in tetraploid potato

    This thesis describes the results of a four-year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato”. The project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency. To answer this research question, a set of potato cultivars representative of the commercial potato germplasm was selected. In total, 240 cultivars and progenitor clones were chosen. At a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies, resulting in a total of 430 genotypes. In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data were contributed by the participating breeding companies and consisted of summary statistics of recent observations for a number of traits across years and locations, calculated following company-specific procedures. The extent of linkage disequilibrium (LD) was estimated with the r² statistic from AFLP marker data, in the form of normalised log-transformed band intensities, obtained from five well-known primer combinations. Population structure within the set of 220 cultivars was analysed with a clustering approach. No apparent or statistically supported population structure was revealed, and LD appeared to decay below the threshold of 0.1 at a genetic distance of about 3 cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single-marker regression models for phenotypic traits on marker band intensities, with and without correction for population structure. 
Population structure correction was performed in a straightforward way by incorporating a design matrix into the model, assuming that each breeding company represented a different breeding germplasm pool. This pilot experiment demonstrated the potential of association mapping in tetraploid potato, because existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed identification of interesting associations for a number of agro-morphological and quality traits. These promising results encouraged us to embark on a comprehensive genome-wide association mapping study in potato. Two association mapping panels were compiled: one panel comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another panel containing 299 genotypes in total, including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are in common with the first panel. Phenotypic data for the association panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes were contributed by the five participating breeding companies and consisted of multi-year, multi-location data obtained during generations of clonal selection. The 2006 data were well balanced because the trial was designed that way, whereas the historical breeding dataset was highly unbalanced. These two differing phenotypic datasets were analysed to gain insight into variance components for the genotypic main effects and the genotype by environment interaction (GEI), in addition to estimated genotype main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI. 
In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (= repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well. To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes comprise all genotypes present in the two association mapping panels introduced before, plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected that corresponded to criteria such as intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, measured with the r² statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5 cM on average. The results described in Chapter 4 are promising for association mapping research in potato. The odds are reasonable that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for detection of QTL in an association mapping context. In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained for the two association mapping panels from the phenotypic analysis in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites. 
A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using available marker information. Interesting QTL could be identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies, e.g. for tuber shape and frying colour, but new QTL were also detected, e.g. for after-baking darkening and enzymatic browning. In the final chapter, the general discussion, the results of the preceding chapters are evaluated and their implications for research as well as breeding are discussed.
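
The LD estimates quoted in the abstract above (r² falling below 0.1 at about 3 cM or 5 cM) can be illustrated with a minimal sketch. It assumes that r² is taken as the squared Pearson correlation between two vectors of marker scores (for example band intensities) and that the decay point is read from mean r² per distance bin. The function names, the bin width and the binning rule are illustrative assumptions, not the procedure used in the thesis.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two equally long marker score vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic marker has no defined r2")
    return (sxy * sxy) / (sxx * syy)


def decay_distance(distances_cm, r2_values, threshold=0.1, bin_width=1.0):
    """Left edge (cM) of the first non-empty distance bin whose mean r2 is below threshold."""
    bins = {}
    for d, r2 in zip(distances_cm, r2_values):
        # Group marker pairs by map distance (hypothetical 1 cM bins by default).
        bins.setdefault(int(d // bin_width), []).append(r2)
    for index in sorted(bins):
        values = bins[index]
        if sum(values) / len(values) < threshold:
            return index * bin_width
    return None  # mean r2 never dropped below the threshold in the observed range
```

Calling `ld_r2` on every pair of mapped markers and passing the resulting distances and r² values to `decay_distance` gives one possible summary of LD decay; fitting a smooth curve would be an alternative.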

    Molecular genetics of chicken egg quality

    Faultless quality in eggs is important in all production steps, from chicken to packaging, transportation, storage, and finally to the consumer. The egg industry (specifically transportation and packing) is interested in robustness, the consumer in safety and taste, and the chicken itself in the reproductive performance of the egg. High quality is commercially profitable, and egg quality is currently one of the key traits in breeding goals. In conventional breeding schemes, the more traits that are included in a selection index, the slower the rate of genetic progress for each trait will be. Unveiling the genes underlying the traits, and subsequently using this genomic information in practical breeding, would enhance selection progress, especially for traits of low heritability, gender-confined traits, or traits that are difficult to assess. In this study, two experimental mapping populations were used to identify quantitative trait loci (QTL) for egg quality traits. A whole genome scan was conducted in both populations with different sets of microsatellite markers. Phenotypic observations of albumen quality, internal inclusions, egg taint, egg shell quality traits, and production traits were collected during the entire production period. To study the presence of QTL, multiple-marker linear regression was used. Polymorphisms found in candidate genes were used as SNP (single nucleotide polymorphism) markers to refine the map position of QTL by linkage and association. Furthermore, independent commercial egg layer lines were used to confirm some of the associations. Albumen quality, the incidence of internal inclusions, and egg taint were first mapped with the whole genome scan and fine-mapped with subsequent analyses. For albumen quality, two distinct QTL regions were found on chromosome 2. Vimentin, a gene maintaining the mechanical integrity of cells, was studied as a candidate gene. 
Neither sequencing nor subsequent QTL analysis using SNPs within the gene suggested that variation in this gene could explain the effect on albumen thinning. The same mapping approach was used to study the incidence of internal inclusions, specifically blood and meat spots. Linkage analysis revealed one genome-wide significant region on chromosome Z. Fine-mapping revealed that the QTL overlapped with the tight junction protein gene ZO-2 and a microsatellite marker inside the gene. Sequencing of a fragment of the gene revealed several SNPs. Two novel SNPs were found to be located in a miRNA (gga-mir-1556) within ZO-2. A microRNA SNP and an exonic synonymous SNP were genotyped in the populations and showed significant association with blood and meat spots. Good congruence between the experimental population and commercial breeds was achieved both in QTL locations and in association results. In conclusion, ZO-2 and gga-mir-1556 remained candidates for having a role in susceptibility to blood and meat spot defects across populations. This is the first report of QTL affecting blood and meat spot frequency in chicken eggs, although the effect explained only 2% of the phenotypic variance. Fishy taint is a disorder characteristic of brown layer lines. Marker-trait association analyses of pooled samples indicated that egg taint and the FMO3 gene map to chicken chromosome 8 and that the variation found by sequencing in the chicken FMO3 gene was associated with the TMA content of the egg. The missense mutation in FMO3 changes an evolutionarily highly conserved amino acid within the FMO-characteristic motif (FATGY). In conclusion, several QTL regions affecting egg quality traits were successfully detected. Some of the QTL findings, such as those for albumen quality, remained at the level of wide chromosomal regions. 
For some QTL, a putative causative gene was indicated: miRNA gga-mir-1556 and/or its host gene ZO-2 might have a role in susceptibility to blood and meat spot defects across populations. In contrast, fishy taint in chicken eggs was found to be caused by a substitution within a conserved motif of the FMO3 gene. This variation has been used in a breeding program to eliminate fishy-taint defects from commercial egg layer lines. Objective: The objective of this thesis was to map loci affecting economically important egg quality traits in chickens and to increase knowledge of the molecular genetics of these complex traits. The aim was to find markers linked to the egg quality traits, and finally to unravel the variation in the genes underlying the phenotypic variation of internal egg quality. QTL mapping methodology was used to identify chromosomal regions affecting various production and egg quality traits (I, III, IV). Three internal egg quality traits were selected for fine-mapping (II, III, IV). Some of the results were verified in independent mapping populations and present-day commercial lines (III, IV). The ultimate objective was to find markers to be applied in commercial selection programs.

    Prediction of haplotypes for ungenotyped animals and its effect on marker-assisted breeding value estimation

    Background: In livestock populations, missing genotypes on a large proportion of animals are a major obstacle to implementing marker-assisted breeding value estimation using haplotypes. The objective of this article is to develop a method to predict haplotypes of animals that are not genotyped using mixed model equations and to investigate the effect of using these predicted haplotypes on the accuracy of marker-assisted breeding value estimation. Methods: For genotyped animals, haplotypes were determined and for each animal the number of haplotype copies (nhc) was counted, i.e. 0, 1 or 2 copies. In a mixed model framework, nhc for each haplotype was predicted for ungenotyped animals as well as for genotyped animals using the additive genetic relationship matrix. The heritability of nhc was assumed to be 0.99, allowing for minor genotyping and haplotyping errors. The predicted nhc were subsequently used in marker-assisted breeding value estimation by applying random regression on these covariables. To evaluate the method, a population was simulated with one additive QTL and an additive polygenic genetic effect. The QTL was located in the middle of a haplotype based on SNP markers. Results: The accuracy of predicted haplotype copies for ungenotyped animals ranged between 0.59 and 0.64 depending on haplotype length. Because powerful BLUP software was used, the method was computationally very efficient. The accuracy of total EBV increased for genotyped animals when marker-assisted breeding value estimation was compared with conventional breeding value estimation, but for ungenotyped animals the increase was marginal unless the heritability was smaller than 0.1. Haplotypes based on four markers yielded the highest accuracies, and using only the nearest left marker yielded the lowest accuracy. The accuracy increased with increasing marker density. 
Accuracy of the total EBV approached that of gene-assisted BLUP when 4-marker haplotypes were used with a distance of 0.1 cM between the markers. Conclusions: The proposed method is computationally very efficient and suitable for marker-assisted breeding value estimation in large livestock populations including effects of a number of known QTL. Marker-assisted breeding value estimation using predicted haplotypes increases accuracy especially for traits with low heritability.
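
The prediction step described in the Methods above can be sketched for a toy pedigree. This is a simplified illustration, not the article's implementation: it replaces the full mixed model equations with the equivalent best linear prediction form, fixes the mean at the average observed nhc instead of estimating it jointly, and uses a small hand-written solver. The function names and the example pedigree are assumptions; only the heritability of 0.99 and the use of the additive relationship matrix come from the abstract.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def predict_nhc(relationship, genotyped, observed_nhc, h2=0.99):
    """Predict haplotype copy numbers for all animals from the genotyped ones.

    relationship: additive relationship matrix A for all animals (list of lists)
    genotyped:    indices of animals with an observed nhc
    observed_nhc: observed copy numbers (0, 1 or 2), same order as `genotyped`
    """
    mu = sum(observed_nhc) / len(observed_nhc)  # assumed mean, not a GLS estimate
    lam = (1.0 - h2) / h2                       # residual-to-genetic variance ratio
    # V = A_gg + lambda * I for the genotyped animals
    v = [[relationship[i][j] + (lam if i == j else 0.0) for j in genotyped]
         for i in genotyped]
    weights = solve(v, [y - mu for y in observed_nhc])
    # Prediction for every animal: mu + A[animal, genotyped] @ weights
    return [mu + sum(relationship[animal][g] * w for g, w in zip(genotyped, weights))
            for animal in range(len(relationship))]
```

For an ungenotyped offspring of two unrelated genotyped parents, this reduces to roughly the average of the parental copy numbers, which matches the intuition that each parent transmits one of its two haplotypes.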

    Patterns of genetic diversity and linkage disequilibrium in a highly structured Hordeum vulgare association-mapping population for the Mediterranean basin

    Population structure and genome-wide linkage disequilibrium (LD) were investigated in 192 Hordeum vulgare accessions providing a comprehensive coverage of past and present barley breeding in the Mediterranean basin, using 50 nuclear microsatellite and 1,130 DArT® markers. Both clustering and principal coordinate analyses clearly sub-divided the sample into five distinct groups centred on key ancestors and regions of origin of the germplasm. For given genetic distances, large variation in LD values was observed, ranging from closely linked markers completely at equilibrium to marker pairs at 50 cM separation still showing significant LD. Mean LD values across the whole population sample decayed below r² of 0.15 after 3.2 cM. By assaying 1,130 genome-wide DArT® markers, we demonstrated that, after accounting for population substructure, current genome coverage of 1 marker per 1.5 cM, except for chromosome 4H with 1 marker per 3.62 cM, is sufficient for whole genome association scans. We show, by identifying associations with powdery mildew that map in genomic regions known to have resistance loci, that associations can be detected in strongly stratified samples provided population structure is effectively controlled in the analysis. The population we describe is, therefore, shown to be a valuable resource, which can be used in basic and applied research in barley.

    Use of linkage disequilibrium for quantitative trait loci mapping in livestock

    The goal of quantitative trait loci (QTL) mapping in livestock is to find genes underlying traits of economic importance for genetic improvement through marker assisted selection (MAS). The studies presented in this thesis address several important issues in QTL detection and fine mapping using candidate gene analysis and linkage disequilibrium (LD) mapping using high density genotyping. Tests for candidate genes in F2 populations for QTL mapping were developed and evaluated. Results show that the extensive between-breed LD that is present in a cross can result in significant associations for candidate genes at considerable distances from the QTL. Tests that removed the impact of between-breed LD were not powerful in detecting candidate genes closely linked to the QTL, unless the candidate gene was the QTL. Therefore, candidate gene tests in QTL mapping populations must be interpreted with caution. Effectiveness of QTL mapping and MAS using LD in outbred populations depends on the extent of LD between markers and QTL, which can differ between populations. Nine measures of LD between multi-allelic markers were evaluated as predictors of usable LD when LD is generated by drift. A standardized chi-square statistic (χ²′) was found to be the best predictor of usable LD of multi-allelic markers with QTL, while three other measures (χ²df, r² and D*) were found to be good predictors of usable LD of single nucleotide polymorphisms (SNPs) with QTL. The effect of various factors on power and precision of QTL detection was evaluated, and power and precision of regression- and identical by descent (IBD)-based LD mapping methods were compared. Power and precision of QTL detection increased with sample size, marker density and QTL effect. Single marker regression had similar or greater power and precision than other regression models. 
For IBD methods, fitting a 4-SNP haplotype, in general, resulted in relatively high power and the greatest mapping precision among the haplotype sizes. Single marker regression was comparable to the 4-SNP IBD method. The results for the haplotype regression and the IBD method assume that haplotypes are known, which would not be true in practice. This will obviously reduce the power of these methods. Thus, for rapid initial screening with an adequate sample size, QTL can be detected and mapped by regression on SNP genotypes without recovering haplotypes. LD mapping using high density genotyping in outbred populations is a promising method for QTL detection and fine mapping, and would result in markers that can immediately be implemented for MAS.
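
Single marker regression, as referred to in the abstract above, can be sketched as an ordinary least-squares fit of phenotype on SNP genotype. The sketch assumes genotypes coded as 0, 1 or 2 copies of one allele and reports the slope (allele substitution effect), the intercept and a t statistic. The function name and the coding are illustrative assumptions, and the sketch ignores the fixed effects, polygenic terms and multiple-testing control that a real analysis would need.

```python
import math


def single_marker_regression(genotypes, phenotypes):
    """Least-squares regression of phenotype on allele count at one SNP.

    Returns (slope, intercept, t_statistic) for the marker effect.
    """
    n = len(genotypes)
    if n < 3:
        raise ValueError("need at least three records to estimate residual variance")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic marker: the effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    t_statistic = slope / standard_error if standard_error > 0 else math.inf
    return slope, intercept, t_statistic
```

Running this function once per SNP along a chromosome and locating the largest absolute t statistic is the simplest form of the regression-based LD mapping compared in the thesis.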

    netgwas: An R Package for Network-Based Genome-Wide Association Studies

    Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is based on undirected graphical models and designed to accomplish three important and interrelated goals in genetics: constructing linkage maps, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. Unlike other software, the netgwas package deals with species with any chromosome copy number in a unified way. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018) and reconstruction of conditional independence networks for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for example genotype data and genotype-phenotype data, routinely occur in genetics and genomics. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from the classical multiple testing approach. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed up computations for large datasets. In addition, it contains several functions for simulation and visualization. 
The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwas. Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file.

    Gene mapping using linkage disequilibrium

    Including copy number variation in association studies to predict genotypic values

    The objective of this study was to investigate, both empirically and deterministically, the ability to explain genetic variation resulting from a copy number polymorphism (CNP) by including the CNP, either by its genotype or by a continuous derivation thereof, alone or together with a nearby single nucleotide polymorphism (SNP) in the model. This continuous measure of a CNP genotype could be a raw hybridization measurement, or a predicted CNP genotype. Results from simulations showed that the linkage disequilibrium (LD) between an SNP and CNP was lower than LD between two SNPs, due to the higher mutation rate at the CNP loci. The model R² values from analysing the simulated data were very similar to the R² values predicted with the deterministic formulae. Under the assumption that x copies at a CNP locus lead to the effect of x times the effect of 1 copy, including a continuous measure of a CNP locus in the model together with the genotype of a nearby SNP increased power to explain variation at the CNP locus, even when the continuous measure explained only 15% of the variation at the CNP locus.

    Quantitative and population genetic analyses of domesticated and wild sheep populations

    Linkage disequilibrium fine mapping of quantitative trait loci: A simulation study

    Recently, the use of linkage disequilibrium (LD) to locate genes which affect quantitative traits (QTL) has received increasing interest, but the plausibility of fine mapping QTL using linkage disequilibrium techniques has not been well studied. The main objectives of this work were to (1) measure the extent and pattern of LD between a putative QTL and nearby markers in finite populations and (2) investigate the usefulness of LD in fine mapping QTL in simulated populations using a dense map of multiallelic or biallelic marker loci. The test of association between a marker and QTL and the power of the test were calculated based on single-marker regression analysis. The results show the presence of substantial linkage disequilibrium with closely linked marker loci after 100 to 200 generations of random mating. Although the power to test the association with a frequent QTL of large effect was satisfactory, the power was low for QTL with a small effect and/or low frequency. More powerful multi-locus methods, as well as combining both linkage and linkage disequilibrium information, may be required to map low-frequency QTL with small genetic effects. The results also showed that multiallelic markers are more useful than biallelic markers to detect linkage disequilibrium and association at an equal distance.
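
The observation that closely linked markers retain substantial LD after 100 to 200 generations of random mating is consistent with the standard deterministic expectation that the LD coefficient D shrinks by a factor (1 - c) per generation, where c is the recombination fraction. The sketch below evaluates that expectation only. It ignores the drift, mutation and selection present in the finite simulated populations of the study, and the function name is an illustrative assumption.

```python
def expected_ld_fraction(recombination_rate, generations):
    """Fraction of the initial LD coefficient D expected to remain after
    the given number of generations of random mating: (1 - c) ** t."""
    if not 0.0 <= recombination_rate <= 0.5:
        raise ValueError("recombination rate must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - recombination_rate) ** generations


if __name__ == "__main__":
    # For short distances, 1 cM corresponds to a recombination fraction of about 0.01.
    for c in (0.001, 0.01, 0.05):
        for t in (100, 200):
            print(f"c={c}, t={t}: {expected_ld_fraction(c, t):.3f}")
```

Under this expectation, markers at very small recombination fractions keep most of their initial LD over such time spans, while loosely linked markers lose nearly all of it, which is the property that fine mapping by LD relies on.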