72 research outputs found
A method for sensitivity analysis to assess the effects of measurement error in multiple exposure variables using external validation data
Measurement error in self-reported dietary intakes is known to bias the association between dietary intake and a health outcome of interest such as risk of a disease. The association can be distorted further by mismeasured confounders, leading to invalid results and conclusions. It is, however, difficult to adjust for the bias in the association when there is no internal validation data
Homoplasy corrected estimation of genetic similarity from AFLP bands, and the effect of the number of bands on the precision of estimation
AFLP is a DNA fingerprinting technique, resulting in binary band presence–absence patterns, called profiles, with known or unknown band positions. We model AFLP as a sampling procedure of fragments, with lengths sampled from a distribution. Bands represent fragments of specific lengths. We focus on estimation of pairwise genetic similarity, defined as average fraction of common fragments, by AFLP. Usual estimators are Dice (D) or Jaccard coefficients. D overestimates genetic similarity, since identical bands in profile pairs may correspond to different fragments (homoplasy). Another complicating factor is the occurrence of different fragments of equal length within a profile, appearing as a single band, which we call collision. The bias of D increases with larger numbers of bands, and lower genetic similarity. We propose two homoplasy- and collision-corrected estimators of genetic similarity. The first is a modification of D, replacing band counts by estimated fragment counts. The second is a maximum likelihood estimator, only applicable if band positions are available. Properties of the estimators are studied by simulation. Standard errors and confidence intervals for the first are obtained by bootstrapping, and for the second by likelihood theory. The estimators are nearly unbiased, and have for most practical cases smaller standard error than D. The likelihood-based estimator generally gives the highest precision. The relationship between fragment counts and precision is studied using simulation. The usual range of band counts (50–100) appears nearly optimal. The methodology is illustrated using data from a phylogenetic study on lettuce
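As a point of reference for the abstract above, the two usual (uncorrected) estimators can be sketched for a pair of binary band profiles. This is only a minimal illustration of the standard Dice and Jaccard coefficients, not the homoplasy- or collision-corrected estimators the authors propose; the function names and the example profiles are hypothetical.

```python
def dice(profile_a, profile_b):
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    total = sum(profile_a) + sum(profile_b)
    return 2 * shared / total if total else 0.0


def jaccard(profile_a, profile_b):
    """Jaccard coefficient: shared bands / bands present in either profile."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    union = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return shared / union if union else 0.0


# Hypothetical presence (1) / absence (0) patterns at five band positions.
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]

d = dice(a, b)
j = jaccard(a, b)
print(f"Dice = {d:.3f}, Jaccard = {j:.3f}")
```

The two coefficients are monotonically related (J = D / (2 - D)), so they rank profile pairs identically; the bias discussed in the abstract comes from counting bands rather than the underlying fragments.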
Estimation of metabolite networks with regard to a specific covariable: applications to plant and human data
In systems biology, where a main goal is acquiring knowledge of biological systems, one of the challenges is inferring biochemical interactions from different molecular entities such as metabolites. In this area, the metabolome holds a unique place in reflecting “true exposure” by being sensitive to variation coming from genetics, time, and environmental stimuli. While metabolites are influenced by many different reactions, the research interest often needs to be focused on variation coming from a certain source, i.e. a certain covariable Xm. Objective: Here, we use network analysis methods to recover a set of metabolite relationships, by finding metabolites sharing a similar relation to Xm. Metabolite values are based on information coming from individuals’ Xm status, which might interact with other covariables. Methods: As an alternative to using the original metabolite values, the total information is decomposed by utilizing a linear regression model and the part relevant to Xm is further used. For two datasets, two different network estimation methods are considered. The first is weighted gene co-expression network analysis based on correlation coefficients. The second method is graphical LASSO based on partial correlations. Results: We observed that when using the parts related to the specific covariable of interest, the resulting estimated networks display higher interconnectedness. Additionally, several groups of biologically associated metabolites (very large density lipoproteins, lipoproteins, etc.) were identified in the human data example. Conclusions: This work demonstrates how information on the study design can be incorporated to estimate metabolite networks. As a result, sets of interconnected metabolites can be clustered together with respect to their relation to a covariable of interest
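The decomposition step in the abstract above might look roughly like the following sketch. It assumes a single additional covariable z, a regression model with an Xm-by-z interaction, and NumPy for the least-squares fit; none of these details are stated in the abstract, and the helper name `xm_related_part` and the simulated data are hypothetical. The sketch only correlates the Xm-related parts and does not reproduce the WGCNA or graphical LASSO analyses.

```python
import numpy as np


def xm_related_part(Y, xm, z):
    """Return the part of each metabolite's fitted value that involves xm.

    Y  : (n, p) array of metabolite values
    xm : (n,) covariable of interest
    z  : (n,) other covariable (assumed here to interact with xm)
    """
    n = len(xm)
    # Design matrix: intercept, xm, z, and the xm-by-z interaction.
    X = np.column_stack([np.ones(n), xm, z, xm * z])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (4, p)
    # Keep only the terms that contain xm (columns 1 and 3).
    return X[:, [1, 3]] @ beta[[1, 3], :]


# Simulated example with three hypothetical metabolites.
rng = np.random.default_rng(0)
n = 200
xm = rng.integers(0, 2, n).astype(float)
z = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=(n, 3))
Y = np.column_stack([
    1.0 + 2.0 * xm + 0.5 * z,
    1.5 * xm + 1.0 * xm * z,
    -1.0 * xm + 0.3 * z,
]) + noise

part = xm_related_part(Y, xm, z)
# Correlations between Xm-related parts, a possible input to network estimation.
corr = np.corrcoef(part, rowvar=False)
print(np.round(corr, 2))
```

Including the interaction term matters for the illustration: with only a main effect of Xm, every metabolite's Xm-related part would be a multiple of the same vector and all correlations would be exactly ±1.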
Mixed model association scans of multi-environmental trial data reveal major loci controlling yield and yield related traits in Hordeum vulgare in Mediterranean environments
An association panel consisting of 185 accessions representative of the barley germplasm cultivated in the Mediterranean basin was used to localise quantitative trait loci (QTL) controlling grain yield and yield related traits. The germplasm set was genotyped with 1,536 SNP markers and tested for associations with phenotypic data gathered over 2 years for a total of 24 year × location combinations under a broad range of environmental conditions. Analysis of multi-environmental trial (MET) data by fitting a mixed model with kinship estimates detected from two to seven QTL for the major components of yield including 1000 kernel weight, grains per spike and spikes per m², as well as heading date, harvest index and plant height. Several of the associations involved SNPs tightly linked to known major genes determining spike morphology in barley (vrs1 and int-c). Similarly, the largest QTL for heading date co-locates with SNPs linked with eam6, a major locus for heading date in barley for autumn sown conditions. Co-localization of several QTL related to yield component traits suggests that major developmental loci may be linked to most of the associations. This study highlights the potential of association genetics to identify genetic variants controlling complex traits
QTL linkage analysis of connected populations using ancestral marker and pedigree information
The common assumption in quantitative trait locus (QTL) linkage mapping studies that parents of multiple connected populations are unrelated is unrealistic for many plant breeding programs. We remove this assumption and propose a Bayesian approach that clusters the alleles of the parents of the current mapping populations from locus-specific identity by descent (IBD) matrices that capture ancestral marker and pedigree information. Moreover, we demonstrate how the parental IBD data can be incorporated into a QTL linkage analysis framework by using two approaches: a Threshold IBD model (TIBD) and a Latent Ancestral Allele Model (LAAM). The TIBD and LAAM models are empirically tested via numerical simulation based on the structure of a commercial maize breeding program. The simulations included a pilot dataset with closely linked QTL on a single linkage group and 100 replicated datasets with five linkage groups harboring four unlinked QTL. The simulation results show that including parental IBD data (similarly for TIBD and LAAM) significantly improves the power and particularly accuracy of QTL mapping, e.g., position, effect size and individuals’ genotype probability without significantly increasing computational demand
Statistical epistasis between candidate gene alleles for complex tuber traits in an association mapping population of tetraploid potato
Association mapping using DNA-based markers is a novel tool in plant genetics for the analysis of complex traits. Potato tuber yield, starch content, starch yield and chip color are complex traits of agronomic relevance, for which carbohydrate metabolism plays an important role. At the functional level, the genes and biochemical pathways involved in carbohydrate metabolism are among the best studied in plants. Quantitative traits such as tuber starch and sugar content are therefore models for association genetics in potato based on candidate genes. In an association mapping experiment conducted with a population of 243 tetraploid potato varieties and breeding clones, we previously identified associations between individual candidate gene alleles and tuber starch content, starch yield and chip quality. In the present paper, we tested 190 DNA markers at 36 loci scored in the same association mapping population for pairwise statistical epistatic interactions. Fifty marker pairs were associated mainly with tuber starch content and/or starch yield, at a cut-off value of q ≤ 0.20 for the experiment-wide false discovery rate (FDR). Thirteen marker pairs had an FDR of q ≤ 0.10. Alleles at loci encoding ribulose-bisphosphate carboxylase/oxygenase activase (Rca), sucrose phosphate synthase (Sps) and vacuolar invertase (Pain1) were most frequently involved in statistical epistatic interactions. The largest effect on tuber starch content and starch yield was observed for the paired alleles Pain1-8c and Rca-1a, explaining 9 and 10% of the total variance, respectively. The combination of these two alleles increased the means of tuber starch content and starch yield. Biological models to explain the observed statistical epistatic interactions are discussed
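The abstract above reports q-value cut-offs for the experiment-wide false discovery rate but does not say how the q-values were computed. One common choice is the Benjamini-Hochberg adjustment; the sketch below assumes that procedure purely for illustration, and the function name and p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for five marker-pair tests.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = bh_qvalues(p)

# Marker pairs passing the two thresholds mentioned in the abstract.
passing_020 = [i for i, qi in enumerate(q) if qi <= 0.20]
passing_010 = [i for i, qi in enumerate(q) if qi <= 0.10]
print(q, passing_020, passing_010)
```

With many pairwise tests (190 markers give a large number of pairs), a lenient cut-off such as q ≤ 0.20 accepts that roughly one in five reported pairs may be a false positive, which is why the stricter q ≤ 0.10 subset is reported separately.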
A mixed model QTL analysis for sugarcane multiple-harvest-location trial data
Sugarcane-breeding programs take at least 12 years to develop new commercial cultivars. Molecular markers offer a possibility to study the genetic architecture of quantitative traits in sugarcane, and they may be used in marker-assisted selection to speed up artificial selection. Although the performance of sugarcane progenies in breeding programs is commonly evaluated across a range of locations and harvest years, many of the QTL detection methods ignore two- and three-way interactions between QTL, harvest, and location. In this work, a strategy for QTL detection in multi-harvest-location trial data, based on interval mapping and mixed models, is proposed and applied to map QTL effects on a segregating progeny from a biparental cross of pre-commercial Brazilian cultivars, evaluated at two locations and three consecutive harvest years for cane yield (tonnes per hectare), sugar yield (tonnes per hectare), fiber percent, and sucrose content. In the mixed model, we have included appropriate (co)variance structures for modeling heterogeneity and correlation of genetic effects and non-genetic residual effects. Forty-six QTLs were found: 13 QTLs for cane yield, 14 for sugar yield, 11 for fiber percent, and 8 for sucrose content. In addition, QTL by harvest, QTL by location, and QTL by harvest by location interaction effects were significant for all evaluated traits (30 QTLs showed some interaction, and 16 none). Our results contribute to a better understanding of the genetic architecture of complex traits related to biomass production and sucrose content in sugarcane