6 research outputs found
Benchmarking algorithms for genomic prediction of complex traits
The usefulness of Genomic Prediction (GP) in crop and livestock breeding programs has led to efforts to develop new and improved GP approaches, including non-linear algorithms such as artificial neural networks (ANNs; i.e. deep learning) and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of GP datasets and models. Using data of 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and five non-linear algorithms, including ANNs. First, we found that hyperparameter selection was critical for all non-linear algorithms and that feature selection prior to model training was necessary for ANNs when the markers greatly outnumbered the training lines. Across all species and trait combinations, no one algorithm performed best; however, predictions based on a combination of results from multiple GP algorithms (i.e. ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits than that of linear algorithms. Although ANNs did not perform best for any trait, we identified strategies (i.e. feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. These results, together with the fact that even small improvements in GP performance could accumulate into large genetic gains over the course of a breeding program, highlight the importance of algorithm selection for the prediction of trait values.
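The ensemble idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible combination rule, an unweighted mean of each algorithm's predictions; the abstract does not state which rule the authors used. The algorithm labels and all numbers are hypothetical placeholders, not results from the study.

```python
from math import sqrt


def ensemble_mean(predictions):
    """Average the predicted trait value of each line across algorithms.

    `predictions` maps an algorithm label to a list of predicted values,
    one per line, in the same order for every algorithm.
    """
    columns = zip(*predictions.values())
    return [sum(col) / len(col) for col in columns]


def pearson_r(x, y):
    """Pearson correlation, a common measure of GP accuracy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical predictions for four test lines (illustrative values only).
predictions = {
    "linear_model": [1.0, 2.0, 3.0, 4.0],
    "tree_boosting": [2.0, 2.0, 4.0, 4.0],
    "ann": [0.0, 2.0, 2.0, 4.0],
}
observed = [1.0, 2.0, 3.0, 4.0]

ensemble = ensemble_mean(predictions)
accuracy = pearson_r(ensemble, observed)
```

In this toy case the errors of the individual algorithms cancel in the average, which is the intuition behind ensembles performing consistently well; real data would not cancel so neatly.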
Perceptions of emerging biotechnologies
Research on public views of biotechnology has centered on genetically modified (GM) foods. However, as the breadth of biotechnology applications grows, a better understanding of public concerns about non-agricultural biotechnology products is needed in order to develop proactive strategies to address these concerns. Here, we explore the perceived benefits and risks associated with five biotechnology products and how those perceptions translate into public opinion about the use and regulation of biotechnology in the United States. While we found greater support for non-agricultural biotechnology products, 70% of individuals surveyed showed little or no variation in their support across the products, indicating that opinions about early GM products may be influencing the acceptance of emerging biotechnologies. We identified five common patterns of opinions about biotechnology and used machine learning models to integrate a wide range of factors and predict a respondent's opinion group. While the model was particularly good at identifying individuals supportive of biotechnology, differentiating between individuals from the non-supportive and conditionally supportive opinion groups was more challenging, emphasizing the complexity of public opinions of emerging biotechnology products.
Putative cis-Regulatory Elements Predict Iron Deficiency Responses in Arabidopsis Roots
Plant iron deficiency (-Fe) activates a complex regulatory network that coordinates root Fe uptake and distribution to sink tissues. In Arabidopsis (Arabidopsis thaliana), FER-LIKE FE DEFICIENCY-INDUCED TRANSCRIPTION FACTOR (FIT), a basic helix-loop-helix (bHLH) transcription factor (TF), regulates root Fe acquisition genes. Many other -Fe-induced genes are FIT independent and are instead regulated by other bHLH TFs and by as yet unknown TFs. The cis-regulatory code, that is, the cis-regulatory elements (CREs) and their combinations that regulate plant -Fe responses, remains largely elusive. Using Arabidopsis root transcriptome data and coexpression clustering, we identified over 100 putative CREs (pCREs) that predicted -Fe-induced gene expression in computational models. To assess pCRE properties and possible functions, we used large-scale in vitro TF binding data, positional bias, and evolutionary conservation. As one example, our approach uncovered pCREs resembling IDE1 (iron deficiency-responsive element 1), a known grass -Fe response CRE. Arabidopsis IDE1-likes were associated with FIT-dependent gene expression, more specifically with biosynthesis of Fe-chelating compounds. Thus, IDE1 seems to be conserved in grass and nongrass species. Our pCREs matched, among others, in vitro binding sites of B3, NAC, bZIP, and TCP TFs, which might be regulators of -Fe responses. Altogether, our findings provide a comprehensive source of cis-regulatory information for -Fe-responsive genes that advances our mechanistic understanding and informs future efforts in engineering plants with more efficient Fe uptake or transport systems. More than 100 putative cis-regulatory elements robustly predict Arabidopsis root iron deficiency responses in computational models and shed light on the mechanisms of transcriptional regulation.
Expression and regulatory asymmetry of retained Arabidopsis thaliana transcription factor genes derived from whole genome duplication
Background: Transcription factors (TFs) play a key role in regulating plant development and response to environmental stimuli. While most genes revert to single copy after a whole genome duplication (WGD) event, transcription factors are retained at a significantly higher rate. Little is known about how TF duplicates have diverged in their expression and regulation, the answer to which may contribute to a better understanding of the elevated retention rate among TFs. Results: Here we assessed what features may explain differences in the retention of TF duplicates and other genes using Arabidopsis thaliana as a model. We integrated 34 expression, sequence, and conservation features to build a linear model for predicting the extent of duplicate retention following WGD events among TFs and 19 groups of genes with other functions. We found that TFs were the least well predicted, demonstrating that the features of TFs deviate substantially from those of duplicate genes in other function groups. Consistent with this, the evolution of TF expression patterns and cis-regulatory sites favors the partitioning of ancestral states among the resulting duplicates: one "ancestral" TF duplicate retains most ancestral expression and cis-regulatory sites, while the "non-ancestral" duplicate is enriched for novel regulatory sites. By modeling the retention of ancestral expression and cis-regulatory states in duplicate pairs using a system of differential equations, we found that TF duplicate pairs in a partitioned state are preferentially maintained. Conclusions: These TF duplicates with asymmetrically partitioned ancestral states are likely maintained because one copy retains ancestral functions while the other, at least in some cases, acquires novel cis-regulatory sites that may be important for novel, adaptive traits.
Genetic models reveal historical patterns of sea lamprey population fluctuations within Lake Champlain
The origin of sea lamprey (Petromyzon marinus) in Lake Champlain has been heavily debated over the past decade. Given the lack of historical documentation, two competing hypotheses have emerged in the literature. First, it has been argued that the relatively recent population size increase and concomitant rise in wounding rates on prey populations are indicative of an invasive population that entered the lake through the Champlain Canal. Second, recent genetic evidence suggests a post-glacial colonization at the end of the Pleistocene, approximately 11,000 years ago. One limitation to resolving the origin of sea lamprey in Lake Champlain is a lack of historical and current measures of population size. In this study, the issue of population size was explicitly addressed using nuclear (nDNA) and mitochondrial DNA (mtDNA) markers to estimate historical demography with genetic models. Haplotype network analysis, mismatch analysis, and summary statistics based on mtDNA noncoding sequences for NCI (479 bp) and NCII (173 bp) all indicate a recent population expansion. Coalescent models based on mtDNA and nDNA identified two potential demographic events: a population decline followed by a very recent population expansion. The decline in effective population size may correlate with land-use and fishing pressure changes post-European settlement, while the recent expansion may be associated with the implementation of the salmonid stocking program in the 1970s. These results are most consistent with the hypothesis that sea lamprey are native to Lake Champlain; however, the credibility intervals around parameter estimates demonstrate that there is uncertainty regarding the magnitude and timing of past demographic events
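As background for the mismatch analysis mentioned in this abstract, the sketch below tabulates a mismatch distribution, that is, the number of nucleotide differences between every pair of aligned haplotype sequences. A smooth, unimodal distribution is commonly read as a signal of past population expansion. The sequences are made-up placeholders, not the NCI or NCII data, and the function names are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mismatch_distribution(sequences):
    """Map each difference count to the number of sequence pairs showing it."""
    return Counter(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )


# Hypothetical aligned haplotypes (illustrative only).
haplotypes = ["ACGTAC", "ACGTAT", "ACGAAT", "ACGTAC"]
distribution = mismatch_distribution(haplotypes)

for n_diff in sorted(distribution):
    print(n_diff, distribution[n_diff])
```

With four sequences there are six pairs; the printed table lists how many of those pairs differ at 0, 1, or 2 sites.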