Phenomics data processing: A plot-level model for repeated measurements to extract the timing of key stages and quantities at defined time points
Decision-making in breeding increasingly depends on the ability to capture and predict crop responses to changing environmental factors. Advances in crop modeling as well as high-throughput field phenotyping (HTFP) hold promise to provide such insights. Processing HTFP data is an interdisciplinary task that requires broad knowledge of experimental design, measurement techniques, feature extraction, dynamic trait modeling, and prediction of genotypic values using statistical models. To get an overview of sources of variation in HTFP, we develop a general plot-level model for repeated measurements. Based on this model, we propose a seamless step-wise procedure that allows estimated means and variances to be carried forward from stage to stage. The process builds on the extraction of three intermediate trait categories: (1) timing of key stages, (2) quantities at defined time points or periods, and (3) dose-response curves. In a first stage, these intermediate traits are extracted from time series of low-level traits (e.g., canopy height) using P-splines and the quarter of maximum elongation rate method (QMER), as well as final height percentiles. In a second and third stage, the extracted traits are further processed using a stage-wise linear mixed model analysis. Using a wheat canopy growth simulation to generate canopy height time series, we demonstrate the suitability of the stage-wise process for traits of the first two categories mentioned above. Results indicate that, for the first stage, the P-spline/QMER method was more robust than the percentile method. In the subsequent two-stage linear mixed model processing, weighting the second and third stages with error variance estimates from the previous stages reduced the root mean squared error. We conclude that processing phenomics data in stages is a feasible approach if estimated means and variances are carried forward from one processing stage to the next.
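The central point of the abstract above is that error variances from one stage are carried forward as weights into the next. A minimal sketch of that idea, assuming each plot-level estimate (for example a timing trait from stage one) comes with an error variance: the hypothetical helper `weighted_genotype_means` combines replicate plots into inverse-variance weighted genotype means and returns the variance of each mean for use in a later stage. This is a simplified fixed-effects stand-in, not the linear mixed model used in the study.

```python
import numpy as np


def weighted_genotype_means(genotype, estimate, variance):
    """Combine plot-level estimates into inverse-variance weighted genotype means.

    Hypothetical helper: a fixed-effects simplification of a weighted stage,
    not the mixed model from the study.
    Returns {genotype: (weighted mean, variance of that mean)}.
    """
    genotype = np.asarray(genotype)
    estimate = np.asarray(estimate, dtype=float)
    weight = 1.0 / np.asarray(variance, dtype=float)  # precise plots count more
    result = {}
    for g in np.unique(genotype):
        mask = genotype == g
        w = weight[mask]
        mean = np.sum(w * estimate[mask]) / np.sum(w)
        # The variance of an inverse-variance weighted mean is 1 / sum(weights);
        # it can be passed on as the error variance for the next stage.
        result[str(g)] = (mean, 1.0 / np.sum(w))
    return result


# Example: stage-one timing estimates (days) with their error variances.
means = weighted_genotype_means(
    ["G1", "G1", "G2", "G2"], [40.0, 44.0, 50.0, 52.0], [1.0, 4.0, 2.0, 2.0]
)
print(means)
```

Unweighted, G1 would average 42.0 days; the weighting pulls its mean toward the more precise plot, and the returned variance tells the next stage how much to trust it.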
P-splines in combination with the QMER method are suitable tools to extract timing of key stages and quantities at defined time points from HTFP data.
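The highlight above pairs P-splines with QMER. A minimal sketch, assuming (our reading of the method, not the study's code) that QMER marks the start and end of elongation as the time points where the fitted growth rate reaches one quarter of its maximum. The P-spline is a cubic B-spline basis with a second-order difference penalty in the style of Eilers and Marx; `nseg`, `lam`, and all function names are hypothetical choices.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis on equally spaced knots (Cox-de Boor recursion)."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    x = np.asarray(x, dtype=float)[:, None]
    # Degree-0 basis: indicator of the knot interval containing each x.
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for k in range(1, deg + 1):
        left = (x - knots[: -(k + 1)]) / (knots[k:-1] - knots[: -(k + 1)])
        right = (knots[k + 1 :] - x) / (knots[k + 1 :] - knots[1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


def pspline_fit(t, y, nseg=40, lam=1.0, deg=3):
    """P-spline: B-spline regression with a second-order difference penalty."""
    t = np.asarray(t, dtype=float)
    tl, tr = t.min(), t.max()
    B = bspline_basis(t, tl, tr, nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)  # second-order differences
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ np.asarray(y, dtype=float))
    return lambda tn: bspline_basis(tn, tl, tr, nseg, deg) @ coef


def qmer(t, y, n_grid=1201, **fit_args):
    """Start and end of elongation where the rate equals a quarter of its maximum.

    Assumes a single-peaked growth rate. Returns (start, end, time of peak rate).
    """
    f = pspline_fit(t, y, **fit_args)
    grid = np.linspace(np.min(t), np.max(t), n_grid)
    rate = np.gradient(f(grid), grid)  # elongation rate of the smoothed curve
    above = np.flatnonzero(rate >= 0.25 * rate.max())
    return grid[above[0]], grid[above[-1]], grid[np.argmax(rate)]


# Noise-free logistic canopy height: 100 cm final height, midpoint at day 60.
days = np.arange(0, 121)
height = 100.0 / (1.0 + np.exp(-0.15 * (days - 60.0)))
start, end, peak = qmer(days, height)
print(start, end, peak)
```

For a logistic curve the rate is proportional to p(1 - p), so it equals a quarter of its maximum where p(1 - p) = 1/16, i.e. at about 60 ± 2.634/0.15 days (near days 42.4 and 77.6). The sketch should approximately recover these; with real, noisy canopy height data `lam` would need tuning, for example by cross-validation.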
Determinants of barley grain yield in drought-prone Mediterranean environments
The determinants of barley grain yield in drought-prone Mediterranean environments have been studied in the Nure × Tremois (NT) population. A large set of yield and other morpho-physiological data were recorded on 118 doubled haploid (DH) lines of the population in multi-environment field trials (18 site-year combinations). Agrometeorological variables were also recorded and calculated at each site. To dissect the effect of growth phases on yield traits, four main periods of barley development were considered: vegetative, reproductive, early grain filling, and late grain filling. Relationships between agrometeorological variables, grain yield (GY), and its main components, grain number (GN) and grain weight (GW), were also investigated by correlation. First, the results gave a clear indication of the involvement of water consumption, calculated from sowing to the early grain filling period, in determining GY and GW (r²=0.616, P=0.007 and r²=0.703, P=0.005, respectively), while GN showed its highest correlation with the total photothermal quotient (PQ) calculated for the same period (r²=0.646, P=0.013). With the sole exception of total PQ calculated during the vegetative period, all significant correlations with GY involved water-dependent agrometeorological parameters. Second, the NT segregating population allowed us to weigh the amount of interaction due to genotypes across environments, or to environments in relation to genotypes, by a GGE analysis; the first two principal components explained 47.67% of the G+GE sum of squares. Then, introducing genomic information at major barley genes regulating the length of the growth cycle allowed us to explain patterns of adaptation of different groups of NT lines according to the variants (alleles) they carry at the vernalization (Vrn-H1) gene in combination with the earliness (Eam6) gene.
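The GGE result above (the share of the G+GE sum of squares captured by the first two principal components) can be computed from a genotype × environment table of means. A minimal sketch, assuming a complete table that is environment-centered without any further scaling (some GGE analyses also divide each environment by its standard deviation); `gge_share` is a hypothetical name.

```python
import numpy as np


def gge_share(Y, n_pc=2):
    """Fraction of the G+GE sum of squares explained by the first n_pc components.

    Y: genotypes x environments table of means (no missing cells).
    """
    Y = np.asarray(Y, dtype=float)
    # Subtracting each environment mean removes the E main effect, leaving G + GE.
    gge = Y - Y.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(gge, compute_uv=False)
    ss = singular_values ** 2  # sum of squares carried by each component
    return ss[:n_pc].sum() / ss.sum()


# Rank-one check: genotype effects scaled per environment on top of environment means.
g = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.outer(g, [1.0, 0.5, 2.0]) + np.array([10.0, 20.0, 30.0])
print(gge_share(Y, n_pc=1))  # all G+GE variation lies on one component here
```

In real trial data the G+GE variation spreads over several components, which is why a value such as the 47.67% reported above is read as the part of the pattern a two-dimensional GGE biplot can display.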
The superiority of lines carrying the Nure allele at Eam6 was confirmed by a factorial ANOVA testing the four possible haplotypes obtained by combining alternative alleles at Eam6 and Vrn-H1. Finally, maximum yield potential and differences among the NT genotypes were explored with the Finlay-Wilkinson model, interpreting the grain yield of NT genotypes together with yield adaptability (Ya), taken as the regression coefficient bi; Ya ranged from 0.71 for NT77 to 1.20 for NT19. Lines carrying the Nure variants at both genes were the highest yielding (3.04 t ha⁻¹) and showed the highest yield adaptability (bi=1.05). The present study constitutes a starting point towards the introduction of genomic variables in agronomic models for barley grain yield in Mediterranean environments.
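The yield adaptability coefficient bi above comes from a Finlay-Wilkinson regression. A minimal sketch of the classic two-step version, assuming a complete genotypes × environments table of mean yields: the environmental index is each environment's mean minus the grand mean, and each genotype's yields are regressed on it. The study may have estimated bi differently (for example in a joint fit); `finlay_wilkinson` and the example table are hypothetical.

```python
import numpy as np


def finlay_wilkinson(Y):
    """Finlay-Wilkinson regression on a genotypes x environments table of yields.

    Returns (mean yield per genotype, slope b_i per genotype).
    """
    Y = np.asarray(Y, dtype=float)
    env_index = Y.mean(axis=0) - Y.mean()  # centered, so it sums to zero
    centered = Y - Y.mean(axis=1, keepdims=True)
    slopes = centered @ env_index / (env_index @ env_index)
    return Y.mean(axis=1), slopes


# Illustrative table: genotype A is responsive (b = 1.2), genotype B stable (b = 0.8).
index = np.array([-1.0, 0.0, 1.0])
yields = np.array([3.0 + 1.2 * index, 2.5 + 0.8 * index])
means, b = finlay_wilkinson(yields)
print(means, b)
```

Because the index is built from the same table, the slopes average exactly 1 across genotypes, so bi > 1 marks above-average responsiveness to better environments and bi < 1 marks relative stability.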
Interpreting genotype × environment interaction in tropical maize using linked molecular markers and environmental covariables
An understanding of the genetic and environmental basis of genotype × environment interaction (GEI) is of fundamental importance in plant breeding. In mapping quantitative trait loci (QTLs), suitable genetic populations are grown in different environments, causing QTL × environment interaction (QEI). The main objective of the present study is to show how Partial Least Squares (PLS) regression and Factorial Regression (FR) models using genetic markers and environmental covariables can be used for studying QEI related to GEI. Biomass data were analyzed from a multi-environment trial consisting of 161 lines from an F3:4 maize segregating population originally created to map QTLs and investigate adaptation differences between highland and lowland tropical maize. PLS and FR methods detected 30 genetic markers (out of 86) that explained a sizeable proportion of the interaction of maize lines for biomass production over four contrasting environments: two low-altitude sites, one intermediate-altitude site, and one high-altitude site. Based on a previous study, most of the 30 markers were associated with QTLs for biomass and exhibited significant QEI. Marker loci in lines with positive GEI for the highland environments contained more highland alleles, whereas marker loci in lines with positive GEI for intermediate and lowland environments contained more lowland alleles. In addition, PLS and FR models identified maximum temperature as the most important environmental covariable for GEI. Using a stepwise variable selection procedure, an FR model was constructed for GEI and QEI that exclusively included cross products between genetic markers and environmental covariables. Higher maximum temperature in low- and intermediate-altitude sites affected the expression of some QTLs, while minimum temperature affected the expression of other QTLs.
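A factorial regression term of the kind described above models the interaction as the product of a genotype covariable (here a marker score) and an environmental covariable (here maximum temperature). A minimal sketch for a single cross-product term, assuming a complete lines × environments table and markers coded as ±1; the study's stepwise selection over many such terms and its PLS analysis are not reproduced, and `fr_cross_product`, the marker coding, and the temperatures are hypothetical.

```python
import numpy as np


def fr_cross_product(Y, marker, covariable):
    """Fit GEI_ij ~ theta * m_i * c_j by least squares.

    Returns theta and the fraction of the interaction sum of squares it explains.
    """
    Y = np.asarray(Y, dtype=float)
    # Double-centering removes genotype and environment main effects, leaving GEI.
    gei = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + Y.mean()
    m = np.asarray(marker, dtype=float)
    c = np.asarray(covariable, dtype=float)
    x = np.outer(m - m.mean(), c - c.mean())  # centered cross-product regressor
    theta = np.sum(gei * x) / np.sum(x * x)
    explained = theta ** 2 * np.sum(x * x) / np.sum(gei ** 2)
    return theta, explained


marker = [1, -1, 1, -1]  # hypothetical coding: lowland (+1) vs highland (-1) allele
tmax = [32.0, 30.0, 26.0, 20.0]  # hypothetical maximum temperatures of four sites
m_c = np.array(marker) - np.mean(marker)
t_c = np.array(tmax) - np.mean(tmax)
# Simulated table: main effects plus an interaction driven only by marker x tmax.
Y = 10.0 + np.add.outer([0.5, -0.2, 0.1, -0.4], [2.0, 1.0, 0.0, -3.0]) + 0.1 * np.outer(m_c, t_c)
theta, share = fr_cross_product(Y, marker, tmax)
print(theta, share)
```

A positive theta here means lines carrying the +1 allele gain relative to the others as maximum temperature rises, which is the kind of QTL-expression pattern the abstract attributes to temperature.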
Dynamics of senescence-related QTLs in potato
Studying the expression of quantitative traits over time helps to understand developmental processes that occur in the course of the growing season. Temperature and other environmental factors play an important role. The dynamics of haulm senescence were observed in a diploid potato mapping population in two consecutive years (2004 and 2005) under field conditions in Finland. The available time series data were used in a smoothed generalized linear model to characterize the curves describing senescence development in terms of onset, mean and maximum progression rate, and inflection point. These characteristics, together with the individual time points, were used in a quantitative trait loci (QTL) analysis. Although QTLs occurring early in the sene
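Once a smooth senescence curve has been fitted, the curve characteristics listed above can be read off numerically. A minimal sketch, assuming the fitted curve is available as the proportion of senesced haulm on a fine time grid; the 5% onset threshold and the definition of mean progression rate as the average slope over the observed window are hypothetical choices rather than the study's definitions, and no smoothed GLM is fitted here.

```python
import numpy as np


def senescence_features(t, p, onset_level=0.05):
    """Curve characteristics from a fitted senescence curve p(t) on a fine grid.

    onset_level and the mean-rate definition are hypothetical choices.
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    rate = np.gradient(p, t)  # progression rate of senescence
    i_max = int(np.argmax(rate))
    # First grid point at which the curve reaches the onset level
    # (np.argmax on a boolean array returns the first True).
    onset = t[np.argmax(p >= onset_level)]
    mean_rate = (p[-1] - p[0]) / (t[-1] - t[0])  # average slope over the window
    return {
        "onset": float(onset),
        "inflection": float(t[i_max]),  # time of maximum rate
        "max_rate": float(rate[i_max]),
        "mean_rate": float(mean_rate),
    }


# A logistic curve stands in for a fitted curve: midpoint at day 60, rate 0.2.
t = np.linspace(0.0, 100.0, 1001)
p = 1.0 / (1.0 + np.exp(-0.2 * (t - 60.0)))
features = senescence_features(t, p)
print(features)
```

For this logistic stand-in, the inflection point is at day 60, the maximum rate is 0.2/4 = 0.05 per day, and the 5% level is reached near day 60 + ln(0.05/0.95)/0.2 ≈ 45.3.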
Mixed model association scans of multi-environmental trial data reveal major loci controlling yield and yield related traits in Hordeum vulgare in Mediterranean environments
An association panel consisting of 185 accessions representative of the barley germplasm cultivated in the Mediterranean basin was used to localise quantitative trait loci (QTL) controlling grain yield and yield-related traits. The germplasm set was genotyped with 1,536 SNP markers and tested for associations with phenotypic data gathered over 2 years for a total of 24 year × location combinations under a broad range of environmental conditions. Analysis of multi-environmental trial (MET) data by fitting a mixed model with kinship estimates detected from two to seven QTL for the major components of yield, including 1000-kernel weight, grains per spike and spikes per m², as well as heading date, harvest index and plant height. Several of the associations involved SNPs tightly linked to known major genes determining spike morphology in barley (vrs1 and int-c). Similarly, the largest QTL for heading date co-locates with SNPs linked with eam6, a major locus for heading date in barley under autumn-sown conditions. Co-localization of several QTL for yield component traits suggests that major developmental loci may be linked to most of the associations. This study highlights the potential of association genetics to identify genetic variants controlling complex traits.
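In a mixed model with kinship, each SNP is tested while a genotypic covariance V = σ²_g K + σ²_e I absorbs relatedness among accessions. A minimal sketch of one such test, assuming the kinship matrix K and both variance components are already estimated (as in approximate scans that fix them once per trait); REML estimation, the year × location structure of MET data, and multiple-testing control are omitted, and `gls_snp_test` and the example data are hypothetical.

```python
import math

import numpy as np


def gls_snp_test(y, snp, K, sigma_g2, sigma_e2):
    """Generalized least squares test of one SNP effect under V = sg2*K + se2*I.

    Returns the estimated SNP effect, its Wald z statistic and a two-sided p-value.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    V = sigma_g2 * np.asarray(K, dtype=float) + sigma_e2 * np.eye(n)
    X = np.column_stack([np.ones(n), np.asarray(snp, dtype=float)])
    Vinv_X = np.linalg.solve(V, X)  # V^-1 X without forming V^-1 explicitly
    XtVX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVX, Vinv_X.T @ y)  # (X'V^-1X)^-1 X'V^-1 y
    se = math.sqrt(np.linalg.inv(XtVX)[1, 1])
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[1], z, p


# Two hypothetical families of three related lines each (kinship 0.5 within family).
K = np.kron(np.eye(2), np.full((3, 3), 0.5)) + 0.5 * np.eye(6)
snp = np.array([0, 1, 0, 1, 1, 0], dtype=float)
y = 1.0 + 2.0 * snp + np.repeat([0.3, -0.3], 3) + np.array([0.05, -0.02, 0.01, -0.04, 0.03, 0.0])
effect, z, p = gls_snp_test(y, snp, K, sigma_g2=0.6, sigma_e2=0.4)
print(effect, z, p)
```

Setting sigma_g2 to zero turns the test into ordinary least squares, which is the kinship-free model that tends to overstate associations when relatives share both alleles and phenotypes.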
Gene Regulatory Networks from Multifactorial Perturbations Using Graphical Lasso: Application to the DREAM4 Challenge
A major challenge in systems biology is predicting gene regulatory networks from different types of training data. Within the DREAM4 initiative, we took part in the multifactorial sub-challenge, which aimed to predict gene regulatory networks of size 100 from training data consisting of steady-state levels obtained after applying multifactorial perturbations to the original in silico network.
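The graphical lasso estimates a sparse inverse covariance (precision) matrix; nonzero off-diagonal entries are then read as candidate regulatory links. A minimal sketch of the block coordinate descent of Friedman, Hastie and Tibshirani, assuming a sample covariance matrix of steady-state expression levels is given; the penalty `lam`, the edge threshold, and the function names are hypothetical, and the abstract does not describe the authors' actual DREAM4 pipeline.

```python
import numpy as np


def soft_threshold(x, t):
    return np.sign(x) * max(abs(x) - t, 0.0)


def graphical_lasso(S, lam, max_iter=100, tol=1e-8):
    """Sparse precision matrix via block coordinate descent (Friedman et al., 2008)."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    W = S + lam * np.eye(p)  # working covariance estimate
    B = np.zeros((p, p))  # column j holds the lasso coefficients for variable j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[idx, j]  # boolean indexing returns a copy (warm start)
            for _ in range(max_iter):  # coordinate descent for the lasso subproblem
                beta_old = beta.copy()
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, lam) / W11[k, k]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            B[idx, j] = beta
            W[idx, j] = W[j, idx] = W11 @ beta
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover the precision matrix column by column from W and B.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        Theta[j, j] = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[idx, j] = -B[idx, j] * Theta[j, j]
    return Theta


S = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
Theta = graphical_lasso(S, lam=0.1)
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if abs(Theta[i, j]) > 1e-6]
print(edges)
```

Larger `lam` removes more edges; with a penalty above every off-diagonal covariance, the estimate becomes diagonal and no links are predicted. For a 100-gene network the pure-Python loops are slow but still workable.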
Estimating maize genetic erosion in modernized smallholder agriculture
Replacement of crop landraces by modern varieties is thought to cause diversity loss. We studied genetic erosion in maize within a model system: modernized smallholder agriculture in southern Mexico. The local seed supply was described through interviews and in situ seed collection. Despite the dominance of commercial seed, the informal seed system was found to persist. True landraces were rare, and most informal seed was derived from modern varieties (creolized). Seed lots were characterized for agronomic traits and molecular markers. We avoided the problem of inconsistent nomenclature by taking individual seed lots as the basis for diversity inference. We defined diversity as the weighted average distance between seed lots. Diversity was calculated for subsets of the seed supply to assess the impact of replacing traditional landraces with any of these subsets. Results differed for molecular markers, ear traits, and vegetative/flowering traits. Nonetheless, creolized varieties showed low diversity for all traits. These varieties were distinct from traditional landraces and little differentiated from their ancestral stocks. Although adoption of creolized maize into the informal seed system has lowered diversity compared with traditional landraces, genetic erosion was moderated by the distinct features offered by modern varieties.
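The abstract above defines diversity as a weighted average distance between seed lots. A minimal sketch, assuming a symmetric distance matrix between seed lots (from markers or traits) and one weight per lot (for example its share of the seed supply), with each pair weighted by the product of the two lot weights; the source does not state its exact weighting scheme, so this choice and the name `weighted_diversity` are hypothetical.

```python
import numpy as np


def weighted_diversity(dist, weights):
    """Weighted mean pairwise distance; pair (i, j) gets weight w_i * w_j."""
    dist = np.asarray(dist, dtype=float)
    w = np.asarray(weights, dtype=float)
    i, j = np.triu_indices(len(w), k=1)  # each unordered pair of distinct lots once
    pair_w = w[i] * w[j]
    return np.sum(pair_w * dist[i, j]) / np.sum(pair_w)


D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
print(weighted_diversity(D, [1, 1, 1]))  # equal weights: plain mean of 1, 2 and 3
print(weighted_diversity(D, [2, 1, 1]))  # lot 0 counts double
```

Diversity of a subset of the seed supply (for example only creolized lots) follows by passing the matching rows and columns, `D[np.ix_(sel, sel)]`, together with their weights.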
Gene and QTL detection in a three-way barley cross under selection by a mixed model with kinship information using SNPs
Quantitative trait locus (QTL) detection is commonly performed by analyzing designed segregating populations derived from two inbred parental lines, where absence of selection, mutation, and genetic drift is assumed. Even in designed populations, selection cannot always be avoided; as a consequence, correlations between genotypes vary instead of being uniform. As in linkage disequilibrium mapping, ignoring this type of genetic relatedness increases the rate of false positives. In this paper, we advocate using mixed models that include genetic relatedness, or ‘kinship’, information for QTL detection in populations where selection has operated. We demonstrate our case with a three-way barley cross, designed to segregate for dwarfing, vernalization, and spike morphology genes, in which selection occurred. The population of 161 inbred lines was screened with 1,536 single nucleotide polymorphisms (SNPs) and used for gene and QTL detection. The coefficient of coancestry matrix was estimated from the SNPs and imposed to structure the distribution of random genotypic effects. The model incorporating kinship (coancestry) information was consistently superior to the model without kinship according to the Akaike information criterion. We show, for three traits, that ignoring the coancestry information results in an unrealistically high number of marker–trait associations without providing clear conclusions about QTL locations. We used a number of widely recognized dwarfing and vernalization genes known to segregate in the studied population as landmarks or references to assess the agreement of the mapping results with a priori candidate gene expectations. QTLs in addition to the major genes were also detected for all traits.
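For inbred lines, a simple marker-based stand-in for the coefficient of coancestry is the proportion of SNPs at which two lines carry the same allele (identity by state, IBS). A minimal sketch, assuming homozygous lines coded 0/1 with no missing calls; the abstract does not specify the estimator actually used, and IBS similarity does not correct for alleles shared by chance, so it is only an approximation. `ibs_coancestry` and the genotype matrix are hypothetical.

```python
import numpy as np


def ibs_coancestry(G):
    """Proportion of SNPs at which each pair of inbred lines shares the allele.

    G: lines x SNPs array coded 0/1 (homozygous inbred lines, no missing data).
    """
    G = np.asarray(G, dtype=float)
    n_snp = G.shape[1]
    # Matches = loci where both lines carry 1 plus loci where both carry 0.
    both_one = G @ G.T
    both_zero = (1.0 - G) @ (1.0 - G).T
    return (both_one + both_zero) / n_snp


G = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0]])
K = ibs_coancestry(G)
print(K)
```

A matrix of this kind could then structure the random genotypic effects, with covariance σ²_g K, in the mixed model that the abstract compares against a model without kinship.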
Constraint-based probabilistic learning of metabolic pathways from tomato volatiles
Clustering and correlation analysis techniques have become popular tools for analyzing data produced by metabolomics experiments. The results obtained from these approaches provide an overview of the interactions between objects of interest. Often in these experiments, one is more interested in the nature of these relationships, e.g., cause-effect relationships, than in the actual strength of the interactions. Finding such relationships is of crucial importance, as most biological processes can only be understood in this way. Bayesian networks represent these cause-effect relationships among variables of interest in terms of whether and how they influence each other, given that a third, possibly empty, group of variables is known. This technique also allows the incorporation of prior knowledge established from the literature or from biologists. Representing these relationships as a directed graph is highly intuitive and helps in understanding these processes. This paper describes how constraint-based Bayesian networks can be applied to metabolomics data to uncover the important pathways that play a significant role in the ripening of fresh tomatoes. We also show that this method of reconstructing pathways is intuitive and performs better than classical techniques. Methods for learning Bayesian network models are powerful tools for analyzing data of the magnitude generated by metabolomics experiments. They allow one to model cause-effect relationships and help in understanding the underlying processes.
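Constraint-based structure learning, as described above, starts by deleting edges between variables that are conditionally independent given some set of other variables. A minimal sketch of that skeleton phase in the style of the PC algorithm, assuming roughly Gaussian (e.g., log-transformed) volatile levels so that conditional independence can be tested with partial correlations and Fisher's z. Edge orientation, prior knowledge, and the specific algorithm used in the paper are not covered, and `pc_skeleton` is a hypothetical name.

```python
import itertools
import math

import numpy as np


def partial_corr(C, i, j, cond):
    """Partial correlation of variables i and j given cond, from a correlation matrix."""
    idx = [i, j, *cond]
    P = np.linalg.inv(C[np.ix_(idx, idx)])  # precision matrix of the subset
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])


def pc_skeleton(C, n, z_crit=1.96):
    """Undirected skeleton: drop edge i-j if some conditioning set makes i, j independent.

    C: correlation matrix of p variables, n: number of samples.
    z_crit=1.96 corresponds to a two-sided test at roughly the 5% level.
    """
    C = np.asarray(C, dtype=float)
    p = C.shape[0]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                for cond in itertools.combinations(others, level):
                    r = min(max(partial_corr(C, i, j, cond), -0.999999), 0.999999)
                    z = math.sqrt(n - len(cond) - 3) * math.atanh(r)  # Fisher's z
                    if abs(z) < z_crit:  # independence not rejected: remove edge
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return sorted((i, j) for i in range(p) for j in adj[i] if i < j)


# Population correlations of a chain X0 -> X1 -> X2 (corr 0.6 per link).
C_chain = np.array([[1.0, 0.6, 0.36], [0.6, 1.0, 0.6], [0.36, 0.6, 1.0]])
print(pc_skeleton(C_chain, n=100))
```

The direct X0-X2 correlation (0.36) disappears once X1 is conditioned on, so the chain's two links are kept and the spurious edge is dropped. This is the distinction between correlation and direct dependence that motivates the approach over plain correlation analysis.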
- …