Over the last 15 years, several genome-scale metabolic models (GSMMs) of
Saccharomyces cerevisiae have been reconstructed and published. Such in silico
representations of the interaction network between all system components are used to
predict the physiological behavior of a microorganism under different environmental and
genetic perturbations. However, gene knockout predictions are usually assessed and validated
using gene essentiality data alone. The Saccharomyces Genome Database (SGD) [1] is a
powerful web-accessible resource that provides structured functional information on
budding yeast genes. SGD records more than 180 different types of observed phenotypes,
of which nearly 10% can be predicted using GSMMs. These data can provide
an additional layer for the curation and validation of metabolic models, and can contribute
to model improvements and to new insights into yeast physiology.
In this study, we assessed the predictive accuracy of GSMMs for single-gene
deletions by comparing experimental data present in SGD with computational
simulations. Since the phenotypic behavior upon a gene deletion depends on the strain
background, media and other environmental conditions, we performed a thorough
characterization and (re)curation of the in vivo experiments so that these conditions
could be closely mimicked in silico. Nearly 3000 different reported phenotypic cases were evaluated using
two different constraint-based approaches (pFBA [2] and LMOMA [3]), which allow a
direct association between genetic data and metabolic fluxes. In parallel, a Jupyter
Notebook platform was developed, intended to serve as a validation tool for
new yeast GSMMs based on the curated SGD-derived dataset.
We observed that, despite all the recent efforts and advances in the reconstruction
and annotation of GSMMs, there is still considerable room for improvement in the
models' predictive ability. Most of the observed mismatches result from structural issues in
network reconstructions or from the lack of regulatory information. To address these
issues, several strategies were investigated, including changes to gene-protein-reaction
associations and to the reversibility of reactions in the network, as well as the formulation
of a new biomass equation based on the experimental determination of its macromolecular
composition. Several cofactors that, surprisingly, had not been represented in the
original biomass reaction were also added to this equation. This last modification, for
example, led to significant improvements in the prediction of auxotrophy-inducing mutations
and lethal knockouts, which should enable us to engineer yeast as a cell factory more effectively.
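To illustrate why changes in gene-protein-reaction (GPR) associations alter knockout predictions, the following Python sketch evaluates a Boolean GPR rule under a set of deleted genes. It is an illustrative assumption rather than the procedure used in this work: the function name and the gene identifiers (g1, g2, g3) are hypothetical, and the parser only handles identifiers that are valid Python names combined with "and"/"or".

```python
import ast

def reaction_is_active(gpr_rule, deleted_genes):
    """Return True if the reaction can still be catalysed after the deletions.

    'and' is read as an enzyme complex (all subunits required) and
    'or' as isozymes (any one suffices). An empty rule means the
    reaction has no gene association and is left active.
    """
    if not gpr_rule.strip():
        return True

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            # A gene is functional unless it has been deleted.
            return node.id not in deleted_genes
        raise ValueError("unsupported element in GPR rule: " + ast.dump(node))

    return evaluate(ast.parse(gpr_rule, mode="eval"))

# Hypothetical rules: the same single deletion is lethal for the reaction
# when g1 is annotated as a complex subunit, but not when it is an isozyme.
print(reaction_is_active("g1 and g2", {"g1"}))
print(reaction_is_active("g1 or g2", {"g1"}))
print(reaction_is_active("(g1 and g2) or g3", {"g1"}))
```

Under this reading, re-annotating a rule from "or" to "and" (or the reverse) changes which single-gene deletions remove a reaction from the network, and therefore which knockouts a model predicts as lethal.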