81 research outputs found
FitTetra 2.0 - improved genotype calling for tetraploids with multiple population and parental data support
Background - Genetic studies in tetraploids lag behind studies of diploids, as the complex genetics of tetraploids require more elaborate computational methodologies. Recent advances in molecular techniques and computational tools facilitate new methods for automated, high-throughput genotype calling in tetraploid species. We report on the upgrade of the widely used fitTetra software, which aims to improve its accuracy, to date hampered by technical artefacts in the data. Results - Our upgrade of the fitTetra package is designed for more accurate modelling of complex collections of samples. The package fits a mixture model in which some parameters are estimated separately for each sub-collection. When a full-sib family is analyzed, we use parental genotypes to predict the expected segregation in terms of allele dosages in the offspring. More accurate modelling and the use of parental data increase the accuracy of dosage calling. We tested the package on data obtained with an Affymetrix Axiom 60k array and compared its performance with the original version and the recently published ClusterCall tool, showing that at least 20% more SNPs could be called with our updated package. Conclusion - Our updated software package shows clearly improved performance in genotype calling accuracy. Estimation of the mixing proportions of the underlying dosage distributions is separated for full-sib families (where mixing proportions can be estimated from the parental dosages and the inheritance model) and unstructured populations (where they are based on the assumption of Hardy-Weinberg equilibrium). Additionally, as the distributions of signal ratios of the dosage classes can be assumed to be the same for all populations, including parental data for some subpopulations helps to improve the fit for other populations as well. The R package fitTetra 2.0 is freely available under the GNU Public License as an Additional file with this article.
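The two sources of mixing proportions named in the abstract can be illustrated with a minimal sketch. It is not fitTetra code (fitTetra is an R package). It assumes a biallelic SNP, tetrasomic inheritance with random chromosome pairing and no double reduction, and all function names are hypothetical.

```python
from math import comb

PLOIDY = 4  # tetraploid: the dosage of one allele ranges over 0..4


def hwe_dosage_probs(p, ploidy=PLOIDY):
    """Dosage probabilities in an unstructured population under
    Hardy-Weinberg equilibrium: binomial with allele frequency p."""
    return [comb(ploidy, d) * p**d * (1 - p) ** (ploidy - d)
            for d in range(ploidy + 1)]


def gamete_probs(parent_dosage, ploidy=PLOIDY):
    """Probability that a gamete carries k = 0..ploidy/2 copies of the allele.
    Assumption: the gamete is a random draw of half the parent's chromosomes
    (hypergeometric), i.e. no double reduction."""
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(parent_dosage, k) * comb(ploidy - parent_dosage, half - k) / total
            for k in range(half + 1)]


def f1_dosage_probs(dosage_p1, dosage_p2, ploidy=PLOIDY):
    """Expected dosage segregation in a full-sib (F1) family, obtained by
    combining one gamete from each parent."""
    g1 = gamete_probs(dosage_p1, ploidy)
    g2 = gamete_probs(dosage_p2, ploidy)
    probs = [0.0] * (ploidy + 1)
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            probs[i + j] += a * b
    return probs
```

For example, `f1_dosage_probs(2, 2)` gives the 1:8:18:8:1 segregation expected from a duplex x duplex cross, and `hwe_dosage_probs(0.5)` gives the binomial proportions 1:4:6:4:1 out of 16.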
QTLMAS 2009: simulated dataset
Background - The simulation of the data for the QTLMAS 2009 Workshop is described. The objective was to simulate observations from a growth curve that was influenced by a number of QTL. Results - The data consisted of markers, phenotypes and pedigree. Genotypes of 453 markers, distributed over 5 chromosomes of 1 Morgan each, were simulated for 2,025 individuals. Of those, 25 individuals were parents of the other 2,000 individuals. The 25 parents were genetically related. Phenotypes were simulated according to a logistic growth curve and were made available for 1,000 of the 2,000 offspring individuals. The logistic growth curve was specified by three parameters. Each parameter was influenced by six Quantitative Trait Loci (QTL), positioned on the five chromosomes. For each parameter, one QTL had a large effect and five QTL had small effects. The variance of the large QTL was five times the variance of a small QTL. Simulated data were made available at http://www.qtlmas2009.wur.nl/UK/Dataset
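A three-parameter logistic growth curve can be sketched as follows. The abstract does not give the parametrization, so the form used here (asymptote, inflection time, scale) is an assumption, as are the function names. The second helper only restates the variance ratio from the abstract as shares of the total QTL variance per parameter, assuming the six QTL act independently and additively.

```python
from math import exp


def logistic_growth(t, asymptote, inflection, scale):
    """Assumed parametrization: y(t) = asymptote / (1 + exp((inflection - t) / scale)).
    At t == inflection the curve is at half its asymptote."""
    return asymptote / (1.0 + exp((inflection - t) / scale))


def qtl_variance_shares(n_small=5, large_to_small=5.0):
    """Share of the total QTL variance explained by the single large QTL and by
    each small QTL, given that the large QTL variance is `large_to_small` times
    a small one (assumes independent, additive QTL)."""
    total = large_to_small + n_small  # in units of one small-QTL variance
    return large_to_small / total, 1.0 / total
```

With one large and five small QTL per parameter, `qtl_variance_shares()` returns `(0.5, 0.1)`: under these assumptions the large QTL accounts for half of the QTL variance of its parameter.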
Comparison of analyses of the QTLMAS XIII common dataset. I: genomic selection
Background - Genomic selection, the use of markers across the whole genome, receives increasing amounts of attention and is having more and more impact on breeding programs. Development of statistical and computational methods to estimate breeding values based on markers is a very active area of research. A simulated dataset was analyzed by participants of the QTLMAS XIII workshop, allowing a comparison of the ability of different methods to estimate genomic breeding values. Methods - A best case scenario was analyzed by the organizers where QTL genotypes were known. Participants submitted estimated breeding values for 1000 unphenotyped individuals together with a description of the applied method(s). The submitted breeding values were evaluated for correlation with the simulated values (accuracy), rank correlation of the best 10% of individuals and error in predictions. Bias was tested by regression of simulated on estimated breeding values. Results - The accuracy obtained from the best case scenario was 0.94. Six research groups submitted 19 sets of estimated breeding values. Methods that assumed the same variance for markers showed accuracies, measured as correlations between estimated and simulated values, ranging from 0.75 to 0.89 and rank correlations between 0.58 and 0.70. Methods that allowed different marker variances showed accuracies ranging from 0.86 to 0.94 and rank correlations between 0.69 and 0.82. Methods assuming equal marker variances were generally more biased and showed larger prediction errors. Conclusions - The best performing methods achieved very high accuracies, close to accuracies achieved in a best case scenario where QTL genotypes were known without error. Methods that allowed different marker variances generally outperformed methods that assumed equal marker variances. 
Genomic selection methods performed well compared with traditional, pedigree-only methods; all methods showed higher accuracies than those obtained for breeding values estimated solely on pedigree relationships.
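The evaluation criteria listed in the Methods (accuracy, rank correlation among the best 10%, prediction error, and bias from regressing simulated on estimated values) can be sketched in plain Python. This is not the organizers' code. Selecting the top 10% by simulated value, using mean squared error as the error measure, and ignoring ties in the ranks are assumptions made for this illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regression_slope(y, x):
    """Slope of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def ranks(values):
    """Rank of each value (0 = smallest); ties are not handled specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def evaluate(simulated, estimated, top_fraction=0.10):
    """Summary statistics comparing estimated with simulated breeding values."""
    n_top = max(2, round(len(simulated) * top_fraction))
    # Assumption: the "best" individuals are those with the highest simulated values.
    top = sorted(range(len(simulated)), key=lambda i: simulated[i], reverse=True)[:n_top]
    sim_top = [simulated[i] for i in top]
    est_top = [estimated[i] for i in top]
    return {
        "accuracy": pearson(simulated, estimated),
        "rank_corr_top": pearson(ranks(sim_top), ranks(est_top)),
        # A slope of 1 indicates unbiased estimates; below 1 the estimates are over-dispersed.
        "slope_sim_on_est": regression_slope(simulated, estimated),
        "mse": mean((s - e) ** 2 for s, e in zip(simulated, estimated)),
    }
```

A slope of simulated on estimated values that deviates from 1 signals bias even when the correlation is high, which is why the two are reported separately.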
The patterns of population differentiation in a Brassica rapa core collection
With the recent advances in high throughput profiling techniques the amount of genetic and phenotypic data available has increased dramatically. Although many genetic diversity studies combine morphological and genetic data, metabolite profiling has yet to be integrated into these studies. For our study we selected 168 accessions representing the different morphotypes and geographic origins of Brassica rapa. Metabolite profiling was performed on all plants of this collection in the youngest expanded leaves, 5 weeks after transplanting and the same material was used for molecular marker profiling. During the same season a year later, 26 morphological characteristics were measured on plants that had been vernalized in the seedling stage. The number of groups and composition following a hierarchical clustering with molecular markers was highly correlated to the groups based on morphological traits (r = 0.420) and metabolic profiles (r = 0.476). To reveal the admixture levels in B. rapa, comparison with the results of the programme STRUCTURE was needed to obtain information on population substructure. To analyze 5546 metabolite (LC–MS) signals the groups identified with STRUCTURE were used for random forests classification. When comparing the random forests and STRUCTURE membership probabilities 86% of the accessions were allocated into the same subgroup. Our findings indicate that if extensive phenotypic data (metabolites) are available, classification based on this type of data is very comparable to genetic classification. These multivariate types of data and methodological approaches are valuable for the selection of accessions to study the genetics of selected traits and for genetic improvement programs, and additionally provide information on the evolution of the different morphotypes in B. rapa. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-010-1516-1) contains supplementary material, which is available to authorized users
Comparative Methods for Association Studies: A Case Study on Metabolite Variation in a Brassica rapa Core Collection
Background - Association mapping is a statistical approach combining phenotypic traits and genetic diversity in natural populations with the goal of correlating the variation present at phenotypic and allelic levels. It is essential to separate the true effect of genetic variation from other confounding factors, such as adaptation to different uses and geographical locations. The rapid availability of large datasets makes it necessary to explore statistical methods that can be computationally less intensive and more flexible for data exploration. Methodology/Principal Findings - A core collection of 168 Brassica rapa accessions of different morphotypes and origins was explored to find genetic associations between markers and metabolites: tocopherols, carotenoids, chlorophylls and folate. A widely used linear model with modifications to account for population structure and kinship was followed for association mapping. In addition, a machine learning algorithm called Random Forest (RF) was used as a comparison. Comparison of results across methods resulted in the selection of a set of significant markers as promising candidates for further work. This set of markers associated with the metabolites can potentially be applied for the selection of genotypes with elevated levels of these metabolites. Conclusions/Significance - The incorporation of the kinship correction into the association model did not reduce the number of significantly associated markers. However, incorporation of the STRUCTURE correction (Q matrix) in the linear regression model greatly reduced the number of significantly associated markers. Additionally, our results demonstrate that RF is an interesting complementary method with added value in association studies in plants, which is illustrated by the overlap in markers identified using RF and a linear mixed model with correction for kinship and population structure. 
Several markers that were selected in RF and in the models with correction for kinship, but not for population structure, were also identified as QTLs in two bi-parental DH populations
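Why a population structure correction can sharply reduce the number of associated markers can be illustrated with a toy sketch. It is a deliberate simplification, not the model used in the study: each accession is assigned to a single subgroup (a real Q matrix holds fractional memberships), kinship is ignored, and the correction is done by centering marker and phenotype within subgroups before correlating them. All names and data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def center_within_groups(values, groups):
    """Subtract each subgroup's mean: a hard-assignment stand-in for
    regressing out a Q matrix."""
    group_means = {g: mean(v for v, h in zip(values, groups) if h == g)
                   for g in set(groups)}
    return [v - group_means[g] for v, g in zip(values, groups)]


def marker_association(marker, phenotype, groups):
    """Marker-phenotype correlation without and with the structure correction."""
    naive = pearson(marker, phenotype)
    corrected = pearson(center_within_groups(marker, groups),
                        center_within_groups(phenotype, groups))
    return naive, corrected


# Toy data: marker and phenotype both differ between subgroups,
# but are unrelated within each subgroup.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
marker = [0, 1, 0, 1, 1, 2, 1, 2]
phenotype = [1, 1, 2, 2, 5, 5, 6, 6]
naive_r, corrected_r = marker_association(marker, phenotype, groups)
```

In this toy data the uncorrected correlation is driven entirely by the difference between subgroups and vanishes after the correction, the same mechanism by which a Q-matrix correction removes associations that reflect population structure rather than linkage to a causal locus.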
Family-Based Haplotype Estimation and Allele Dosage Correction for Polyploids Using Short Sequence Reads
DNA sequence reads contain information about the genomic variants located on a single chromosome. By extracting and extending this information using the overlaps between the reads, the haplotypes of an individual can be obtained. Using parent-offspring relationships in a population can considerably improve the quality of the haplotypes obtained from short reads, as pedigree information can be used to correct for spurious overlaps (due to sequencing errors) and insufficient overlaps (due to short read lengths, low genomic variation and shallow coverage). We developed a novel method, PopPoly, to estimate polyploid haplotypes in an F1-population from short sequence data by taking into consideration the transmission of the haplotypes from the parents to the offspring. In addition, this information is employed to improve genotype dosage estimation and to call missing genotypes in the population. Through simulations, we compare PopPoly to other haplotyping methods and show its better performance. We evaluate PopPoly by applying it to a tetraploid potato cross at nine genomic regions involved in tuber formation
Genetic complexity of miscanthus cell wall composition and biomass quality for biofuels
BACKGROUND: Miscanthus sinensis is a high-yielding perennial grass species with great potential as a bioenergy feedstock. One of the challenges that currently impedes commercial cellulosic biofuel production is the technical difficulty of efficiently converting lignocellulosic biomass into biofuel. The development of feedstocks with better biomass quality will improve conversion efficiency and the sustainability of the value chain. Progress in the genetic improvement of biomass quality may be substantially expedited by the development of genetic markers associated with quality traits, which can be used in a marker-assisted selection program. RESULTS: To this end, a mapping population was developed by crossing two parents of contrasting cell wall composition. The performance of 182 F1 offspring individuals along with the parents was evaluated in a field trial with a randomized block design with three replicates. Plants were phenotyped for cell wall composition and conversion efficiency characters in the second and third growth seasons after establishment. A new SNP-based genetic map for M. sinensis was built using a genotyping-by-sequencing (GBS) approach, which resulted in 464 short-sequence uniparental markers that formed 16 linkage groups in the male map and 17 linkage groups in the female map. A total of 86 QTLs for a variety of biomass quality characteristics were identified, 20 of which were detected in both growth seasons. Twenty QTLs were directly associated with different conversion efficiency characters. Marker sequences were aligned to the sorghum reference genome to facilitate cross-species comparisons. Analyses revealed that for some traits, previously identified QTLs in sorghum occurred in homologous regions on the same chromosome. CONCLUSION: In this work we report for the first time the genetic mapping of cell wall composition and bioconversion traits in the bioenergy crop miscanthus. 
These results are a first step towards the development of marker-assisted selection programs in miscanthus to improve biomass quality and facilitate its use as feedstock for biofuel production
Composition of Human Skin Microbiota Affects Attractiveness to Malaria Mosquitoes
The African malaria mosquito Anopheles gambiae sensu stricto continues to play an important role in malaria transmission, which is aggravated by its high degree of anthropophily, making it among the foremost vectors of this disease. In the current study we set out to unravel the strong association between this mosquito species and human beings, as it is determined by odorant cues derived from the human skin. Microbial communities on the skin play key roles in the production of human body odour. We demonstrate that the composition of the skin microbiota affects the degree of attractiveness of human beings to this mosquito species. Bacterial plate counts and 16S rRNA sequencing revealed that individuals that are highly attractive to An. gambiae s.s. have a significantly higher abundance, but lower diversity of bacteria on their skin than individuals that are poorly attractive. Bacterial genera that are correlated with the relative degree of attractiveness to mosquitoes were identified. The discovery of the connection between skin microbial populations and attractiveness to mosquitoes may lead to the development of new mosquito attractants and personalized methods for protection against vectors of malaria and other infectious diseases
- …