29 research outputs found

    Improved Evidence-based Genome-scale Metabolic Models for Maize Leaf, Embryo, and Endosperm

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possibly to unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low-expression genes to enable activity of a maximum number of reactions associated with high-expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable to the analysis of transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.

    High-throughput Comparison, Functional Annotation, and Metabolic Modeling of Plant Genomes using the PlantSEED Resource


    KBase: The United States Department of Energy Systems Biology Knowledgebase.

    Model selection.

    We consider logistic models with different numbers of pathways P and of pairs of pathways z (see text and Methods, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002762#s4). A, Model accuracy. We calculate the true positive (TP) and true negative (TN) rates for the different models. TP reflects whether the model correctly predicts G nutrients, whilst TN reflects whether the model correctly predicts NG nutrients. B, Area under the ROC curve (AUC) for the 10 models. The higher the AUC, the better the model is at separating G nutrients from NG nutrients. C, Akaike information criterion (AIC) and Bayesian information criterion (BIC) of the 10 models. The lower the information criterion, the more parsimonious the model. We could not identify any additional pathways and/or pathway pairs that improved the AIC and BIC of the model with P = 8, z = 2 (pathways and pathway pairs are listed in the upper right panel of the figure). In the case of TP, TN, and AUC, we apply our complete model, including both nutrient classes and KEGG pathways, to the training set of organisms and to the test set of organisms (see text and Methods). When P = 0, z = 0, there are more NG nutrients than G nutrients that are not included in a nutrient class, therefore all of these nutrients are considered i ∈ NG; hence, the initially low TN rate. When P ≥ 4, the TP in the test set is similar to the TP in the training set. This means that our model is successful at identifying G nutrients. However, the TN for the test set is slightly lower than the TN for the training set. This occurs because there are more NG nutrients in the test set that are also found in the Sugar and Sugar derivative classes, or in G pathways in the linear model, which we could not account for because of the small sample size of the training set. The difference between the TN rates of the two test sets has an impact on the overall accuracy of the model for the training and test sets.
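The quantities named in this caption (TP and TN rates, AIC, BIC) follow standard definitions, which can be sketched in Python as below. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, labels are encoded as True for G and False for NG, and the parameter count k is left to the caller because the caption does not say how the model's parameters were counted.

```python
import math

def classification_rates(y_true, y_pred):
    """Return (TP rate, TN rate); True encodes a G nutrient, False an NG nutrient."""
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one G and one NG nutrient")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp / n_pos, tn / n_neg

def aic(log_likelihood, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood
```

Both criteria penalize extra parameters, so adding a pathway term is only worthwhile when the gain in log-likelihood outweighs the penalty; BIC applies the heavier penalty once n exceeds about 7 observations.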

    Schematic representation of the development of a model for maximum biomass production in complex media of microbial organisms.

    We aim to develop a phenomenological model to predict the maximum biomass production B_m of species s when growing in a medium containing a set of nutrients {i} acting as a carbon source under aerobic conditions. That is, we want to express B_m as a function f({i}, s) that only takes into account data related to: i) the set of nutrients {i} available, namely, nutrient type, the set of pathways {p(i)} that can catabolize each nutrient, and the carbon content C_i of each nutrient; and ii) the species s, specifically the presence or absence of certain enzymes that allow the species to catabolize specific types of nutrients (enzymes EC: 1.1.1.35, EC: 2.3.1.16, and EC: 3.5.2.17; see text). To achieve our goal, there are three questions we need to answer: i) Does nutrient i produce growth or not in species s when acting as the sole source of carbon? We find that whether nutrient i produces growth (G) or not (NG) is a function of the nutrient type (see text) and its pathway membership. ii) If a nutrient produces growth, what is the maximal biomass it can produce in species s when acting as the sole source of carbon? We find that it is proportional to C_i, the number of carbons in nutrient i, and that the proportionality constant y_s depends on the species s. iii) What is the maximal biomass production B_m(m) when growing on a complex medium m? We find that B_m(m) can be well approximated by adding up the individual contributions of the nutrients i present in medium m.
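The three answers in this caption combine into a simple additive rule: each growth-producing nutrient contributes y_s times its carbon count, and the contributions are summed over the medium. The Python sketch below illustrates that reading under stated assumptions: the function and argument names are hypothetical, the yield constant is a made-up illustrative value, and every nutrient is assumed to be present in the same unit amount, since the caption does not describe how concentrations enter.

```python
def predict_max_biomass(medium, yield_per_carbon, produces_growth):
    """Approximate B_m(m) as the sum of y_s * C_i over growth (G) nutrients.

    medium           -- dict mapping nutrient name to its carbon count C_i
    yield_per_carbon -- species-specific constant y_s (illustrative value)
    produces_growth  -- callable returning True if the nutrient is classed G
    """
    total = 0.0
    for nutrient, carbons in medium.items():
        if produces_growth(nutrient):  # NG nutrients contribute nothing
            total += yield_per_carbon * carbons
    return total

# Hypothetical example: carbon counts are real, the G set and y_s are not.
medium = {"glucose": 6, "acetate": 2, "urea": 1}
growth_nutrients = {"glucose", "acetate"}
biomass = predict_max_biomass(medium, 0.5, lambda n: n in growth_nutrients)
```

In this example the two G nutrients contribute 0.5 × 6 and 0.5 × 2, so the predicted value is 4.0 in whatever units y_s carries.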

    Adjusting for the effective number of carbons in complex nutrients and purines in the training set.

    We show the normalized biomass yield (see Fig. 7, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002762#pcbi-1002762-g007) for purines and complex nutrients (full colored symbols) for species in the training set. In the left column, we show the normalized biomass yield considering the number of carbons in each nutrient C_i. In the right column, we show the normalized biomass yield using the effective number of carbons (see text and Methods, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002762#s4). Additionally, for each nutrient class that contains these nutrients, we show the mean and variance.

    Biomass production is related to the number of carbons in a nutrient.

    We show the optimized biomass production of each species on G nutrients, for species in the training set (left) and the test set (right). For all species there is a positive correlation between biomass production and the number of carbons in the nutrient. The blue line represents (see text) for all the sugars taken up by species s. S. aureus exhibits a reduced biomass production; the biomass defined in the in silico organisms demands approximately ten times more moles relative to the other species. In all the plots, the position of the nutrients on the X axis is slightly staggered so that all data points are visible. Note that the symbols for the complex nutrients are enlarged.

    Predictions for four organisms lacking a metabolic reconstruction.

    A, The number of nutrients found to be taken up by four organisms for which we lack a metabolic reconstruction: Rhodopseudomonas palustris (gram-negative bacterium), Listeria monocytogenes (gram-positive bacterium), Dictyostelium discoideum (eukaryote), and Thermoplasma acidophilum (archaeon). The nutrients were determined using predictions found in TransportDB (http://www.transportdb.org). B, Prediction of whether a nutrient is a source of carbon according to class. Bars in the top panel represent predictions of G nutrients, whereas bars in the bottom panel represent predictions of NG nutrients. None of the species had fatty acids listed as nutrients, but since fatty acids can be taken up by diffusing through the cell membrane, we show here the predictions for fatty acids as well. The predictions for nutrients in the Organic compounds class are based on our logistic regression using the KEGG pathway terms. Thus some nutrients are predicted to be G while others are predicted to be NG.
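A logistic regression on pathway terms, as mentioned for the Organic compounds class, can be pictured as a weighted sum of pathway-membership indicators passed through a sigmoid. The Python sketch below is a generic illustration and not the fitted model from the paper: the pathway names, weights, intercept, and the 0.5 decision threshold are all hypothetical placeholders.

```python
import math

def growth_probability(pathways, weights, pair_weights, intercept=0.0):
    """Logistic probability that a nutrient is G, given its pathway membership.

    pathways     -- set of pathway names the nutrient belongs to
    weights      -- dict: pathway name -> coefficient (single-pathway terms)
    pair_weights -- dict: (pathway_a, pathway_b) -> coefficient (pair terms)
    """
    score = intercept
    for name in pathways:
        score += weights.get(name, 0.0)
    for (a, b), w in pair_weights.items():
        if a in pathways and b in pathways:  # pair term needs both pathways
            score += w
    return 1.0 / (1.0 + math.exp(-score))

def predict_class(probability, threshold=0.5):
    """Map a probability to the G / NG labels used in the captions."""
    return "G" if probability >= threshold else "NG"

# Hypothetical coefficients, for illustration only.
weights = {"pathway_a": 2.0, "pathway_b": -0.5}
pair_weights = {("pathway_a", "pathway_b"): 1.0}
p_in_a = growth_probability({"pathway_a"}, weights, pair_weights, intercept=-1.0)
p_in_none = growth_probability(set(), weights, pair_weights, intercept=-1.0)
```

With these placeholder numbers, a nutrient in `pathway_a` scores 1.0 and is labeled G, while a nutrient in no listed pathway scores -1.0 and is labeled NG, which mirrors how nutrients within one class can receive different predictions.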