21 research outputs found

    Revelation of Yin-Yang Balance in Microbial Cell Factories by Data Mining, Flux Modeling, and Metabolic Engineering

    The long-held assumption of never-ending rapid growth in biotechnology, and especially in synthetic biology, has recently been questioned due to a lack of substantial return on investment. One of the main reasons for failures in synthetic biology and metabolic engineering is the metabolic burden that results in resource losses. Metabolic burden is defined as the portion of a host cell's resources, either energy molecules (e.g., NADH, NADPH, and ATP) or carbon building blocks (e.g., amino acids), that is used to maintain the engineered components (e.g., pathways). As a result, the effectiveness of synthetic biology tools depends heavily on the cell's capability to carry the metabolic burden. Although genetic modifications can effectively engineer cells and redirect carbon fluxes toward diverse products, the cell's ATP supply is often insufficient to support diverse microbial activities, including product synthesis. Here, I employ an ancient Chinese philosophy (Yin-Yang) to describe two contrary forces that are interconnected and interdependent, where Yin represents energy metabolism in the form of ATP, and Yang represents carbon metabolism. To decipher Yin-Yang balance and its implications for microbial cell factories, this dissertation applied metabolic engineering, flux analysis, and data mining tools to reveal cell physiological responses under different genetic and environmental conditions. First, a combined approach of flux balance analysis (FBA) and 13C metabolic flux analysis (13C-MFA) was employed to investigate several engineered isobutanol-producing strains and examine their carbon and energy metabolism. The results indicated that isobutanol overproduction strongly competed for biomass building blocks, and thus the addition of nutrients (yeast extract) to support cell growth is essential for a high yield of isobutanol. Based on the analysis of isobutanol production, the 'Yin-Yang' theory has been proposed to illustrate the importance of carbon and energy balance in engineered strains.
The effects of metabolic burden and respiration efficiency (P/O ratio) on biofuel production were determined by FBA simulation. The discovery of an energy cliff explained failures in bioprocess scale-ups. The simulation also predicted that fatty acid production is more sensitive to changes in P/O ratio than alcohol production. Based on that prediction, fatty acid-producing strains were engineered with the insertion of Vitreoscilla hemoglobin (VHb) to overcome the intracellular energy limitation by improving oxygen uptake and respiration efficiency. The results confirmed our hypothesis and revealed different levels of trade-off between the burden and the benefit of the various introduced genetic components. In addition, a series of computational tools have been developed to accelerate the application of fluxomics research. Microbesflux has been rebuilt, upgraded, and moved to a commercial server. A platform for fluxomics study as well as an open-source 13C-MFA tool (WUFlux) has been developed. Further, a computational platform that integrates machine learning, logic programming, and constraint programming has been developed. This platform gives fast predictions of microbial central metabolism with decent accuracy. Lastly, a framework has been built to integrate Big Data technology and text mining to interpret concepts and technology trends based on a literature survey. Case studies have been performed, and informative results have been obtained through this Big Data framework within five minutes. In summary, 13C-MFA and flux balance analysis are only tools to quantify cell energy and carbon metabolism (i.e., Yin-Yang balance), leading to the rational design of robust high-producing microbial cell factories. Developing advanced computational tools will facilitate the application of fluxomics research and literature analysis.
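To make the idea of an FBA simulation over P/O ratio and metabolic burden concrete, here is a minimal sketch of a toy flux balance problem solved as a linear program with `scipy.optimize.linprog`. It is not the dissertation's model: the four-reaction network, the function name `max_growth`, and every stoichiometric number (4 + 10 x P/O ATP per substrate respired, 20 ATP per unit biomass, an uptake cap of 10) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def max_growth(po_ratio, atp_burden, uptake_max=10.0):
    """Maximal biomass flux of a hypothetical two-metabolite network.

    Reactions (columns): uptake, respiration, biomass, ATP maintenance.
    All coefficients are made-up illustrative values, not measured data.
    """
    atp_per_substrate = 4.0 + 10.0 * po_ratio  # assumed ATP yield of respiration
    atp_per_biomass = 20.0                     # assumed ATP cost of biomass

    # Steady-state mass balances S @ v = 0 (rows: substrate, ATP).
    S = np.array([
        [1.0, -1.0, -1.0, 0.0],
        [0.0, atp_per_substrate, -atp_per_biomass, -1.0],
    ])

    # linprog minimizes, so maximize biomass by minimizing its negative.
    c = np.array([0.0, 0.0, -1.0, 0.0])

    # The metabolic burden enters as a lower bound on the ATP drain.
    bounds = [(0.0, uptake_max), (0.0, None), (0.0, None), (atp_burden, None)]

    res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
    # An infeasible problem means the cell cannot pay the ATP burden at all.
    return float(res.x[2]) if res.success else 0.0


if __name__ == "__main__":
    for po in (1.0, 2.0, 3.0):
        print(po, [round(max_growth(po, m), 3) for m in (0.0, 20.0, 40.0)])
```

In this toy setting, `max_growth(2.0, 20.0)` solves to a biomass flux of about 5 (arbitrary units). Lowering the P/O ratio or raising the burden shrinks the optimum, and once the burden exceeds the ATP the network can supply the problem becomes infeasible, which is one simple way to picture a sharp drop in performance.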

    Decoding Complexity in Metabolic Networks using Integrated Mechanistic and Machine Learning Approaches

    How can we get living cells to do what we want? What do they actually ‘want’? What ‘rules’ do they observe? How can we better understand and manipulate them? Answers to fundamental research questions like these are critical to overcoming bottlenecks in metabolic engineering and optimizing heterologous pathways for synthetic biology applications. Unfortunately, biological systems are too complex to be completely described by physicochemical modeling alone. In this research, I developed and applied integrated mechanistic and data-driven frameworks to help uncover the mysteries of cellular regulation and control. These tools provide a computational framework for seeking answers to pertinent biological questions. Four major tasks were accomplished. First, I developed innovative tools for key areas in the genome-to-phenome mapping pipeline. An efficient gap filling algorithm (called BoostGAPFILL) that integrates mechanistic and machine learning techniques was developed for the refinement of genome-scale metabolic network reconstructions. Genome-scale metabolic network reconstructions are finding ever increasing applications in metabolic engineering for industrial, medical and environmental purposes. Second, I designed a thermodynamics-based framework (called REMEP) for mutant phenotype prediction (integrating metabolomics, fluxomics and thermodynamics data). These tools will go a long way in improving the fidelity of model predictions of microbial cell factories. Third, I designed a data-driven framework for characterizing and predicting the effectiveness of metabolic engineering strategies. This involved building a knowledgebase of historical microbial cell factory performance from published literature. Advanced machine learning concepts, such as ensemble learning and data augmentation, were employed in combination with standard mechanistic models to develop a predictive platform for important industrial biotechnology metrics such as yield, titer, and productivity. 
Fourth, my modeling tools and skills have been used for case studies on fungal lipid metabolism analyses, E. coli resource allocation balances, reconstruction of the genome-scale metabolic network for a non-model species, R. opacus, and the rapid prediction of bacterial heterotrophic fluxomics. In the long run, this integrated modeling approach will significantly shorten the “design-build-test-learn” cycle of metabolic engineering and provide a platform for biological discovery.

    Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.

    Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers are sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.
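For orientation, proteome-limited models generally tie each flux to the abundance of its enzyme through the turnover number. A generic form of that capacity constraint is sketched below; it is not necessarily the exact formulation of either framework used in the paper, and the symbols $v_j$, $e_j$, $M_j$, and $P$ are notation assumed here.

```latex
\begin{align}
  v_j &\le k_{\mathrm{cat},j}\, e_j
      && \text{for each enzyme-catalyzed reaction } j, \\
  \sum_j M_j\, e_j &\le P
      && \text{(total enzyme mass budget)}.
\end{align}
```

Here $v_j$ is the flux through reaction $j$, $e_j$ the concentration of its enzyme, $M_j$ the enzyme's molecular weight, and $P$ the proteome mass available to metabolism. Under this reading, a better estimate of $k_{\mathrm{cat},j}$ changes how much enzyme a given flux is predicted to require.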

    Machine and deep learning meet genome-scale metabolic modeling

    Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically-agnostic learning process.

    The era of big data: Genome-scale modelling meets machine learning

    With omics data being generated at an unprecedented rate, genome-scale modelling has become pivotal in its organisation and analysis. However, machine learning methods have been gaining ground in cases where knowledge is insufficient to represent the mechanisms underlying such data or as a means for data curation prior to attempting mechanistic modelling. We discuss the latest advances in genome-scale modelling and the development of optimisation algorithms for network and error reduction, intracellular constraining and applications to strain design. We further review applications of supervised and unsupervised machine learning methods to omics datasets from microbial and mammalian cell systems and present efforts to harness the potential of both modelling approaches through hybrid modelling.

    Machine learning in bioprocess development: From promise to practice

    Fostered by novel analytical techniques, digitalization and automation, modern bioprocess development provides large amounts of heterogeneous experimental data containing valuable process information. In this context, data-driven methods like machine learning (ML) approaches have a high potential to rationally explore large design spaces while exploiting experimental facilities most efficiently. The aim of this review is to demonstrate how ML methods have been applied so far in bioprocess development, especially in strain engineering and selection, bioprocess optimization, scale-up, monitoring and control of bioprocesses. For each topic, we highlight successful application cases and current challenges, and point out domains that can potentially benefit from technology transfer and further progress in the field of ML.

    Application of metabolic modeling and machine learning for investigating microbial systems

    Metabolic modeling is an important tool for interpreting overall cell metabolism and the dynamic relationship between substrates and biomass/bioproducts. Genome-scale flux balance models and 13C-metabolic flux analysis are metabolic modeling approaches that can reveal the theoretical yield and central carbon metabolism under various environmental conditions. Kinetic models can capture the complex relationships between biomass growth and bioproduct accumulation over time. Machine learning models offer a data-driven approach to reveal fermentation behavior and predict cell performance under complex circumstances. In my PhD study, modeling analysis and machine learning methods were used to examine non-conventional microbial systems in order to (1) decode the functional pathways and carbon flux distribution in Cyanobacteria and Clostridium species for bioproduction, (2) characterize biofilm physiologies and biodiesel fermentations (engineered E. coli) under mass transfer limitations, and (3) optimize syngas fermentations by deciphering and overcoming rate-limiting process factors.

    A Pipeline to Generate Deep Learning Surrogates of Genome-Scale Metabolic Models

    Genome-Scale Metabolic Models (GEMMs) are powerful reconstructions of biological systems that help metabolic engineers understand and predict, purely in silico, how an organism under observation grows when its cellular metabolism is subjected to various environmental factors. Applications of metabolic engineering range from perturbation analysis and drug-target discovery to predicting growth rates of biotechnologically important metabolites and reaction objectives within different single-cell and multi-cellular organism types. GEMMs use mathematical frameworks for quantitative estimations of flux distributions within metabolic networks. The reasons why an organism activates, stuns, or fluctuates between alternative pathways for growth and survival, however, remain relatively unknown. GEMMs rely on manual intervention during their curation and annotation process, which can potentially induce substantial experimental bias. Also, solution spaces that cater to the flux distributions can be sensitive to the addition, update, and deletion of metabolites, reactions, and gene-enzyme-reaction rules within the model. Therefore, the quest for optimality can often be lost due to the number of hyper dimensions represented by these networks. Recently, Deep Learning (DL) has played a significant role in building function approximators for highly complex input datasets correlating in extremely large hyper dimensions. In this thesis, to address the computational costs associated with the simulations of GEMMs, we use an interpretable learning-driven approach to build surrogate GEMM models that act as alternatives to existing Flux Balance Analysis (FBA)-based approaches for predicting intracellular fluxes of reactions. We exploit the network characteristics of a well-curated input organism and build a synthetic subset of the flux cone containing thermodynamically feasible reaction growth rates.
We then feed this dataset into a deep generative model capable of reconstructing intracellular flux values of the input organism. We evaluate its efficiency based on time-to-construct, accuracy, and ease of use. To provide a fair comparative analysis, we compare our learning approach with other traditional regression-based models and test our pipeline on three different input organisms subjected to network reduction techniques and different hyperparameters. Advisers: Tomáš Helikar & Massimiliano Pierobo
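The surrogate idea described above (sample solutions from an FBA model, then train a cheaper approximator on them) can be sketched on a toy problem. This is only an illustration under stated assumptions: the two-metabolite network and its coefficients are hypothetical, and a polynomial least-squares fit stands in for the thesis's deep generative model, which is not reproduced here. The names `fba_growth`, `features`, and `surrogate` are invented for this sketch.

```python
import numpy as np
from scipy.optimize import linprog


def fba_growth(po_ratio, atp_burden):
    """Toy FBA 'simulator': maximal biomass flux of a hypothetical network."""
    atp_yield = 4.0 + 10.0 * po_ratio           # assumed ATP per substrate
    S = np.array([[1.0, -1.0, -1.0, 0.0],       # substrate balance
                  [0.0, atp_yield, -20.0, -1.0]])  # ATP balance
    bounds = [(0.0, 10.0), (0.0, None), (0.0, None), (atp_burden, None)]
    res = linprog([0.0, 0.0, -1.0, 0.0], A_eq=S, b_eq=[0.0, 0.0],
                  bounds=bounds, method="highs")
    return float(res.x[2]) if res.success else 0.0


def features(X):
    """Polynomial features of (P/O ratio, ATP burden) for the surrogate."""
    po, m = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(po), po, m, po * m, po**2, po**2 * m])


# 1) Sample simulator inputs and label them by solving the LP each time.
rng = np.random.default_rng(0)
low, high = [1.0, 0.0], [3.0, 40.0]
X_train = rng.uniform(low, high, size=(200, 2))
y_train = np.array([fba_growth(po, m) for po, m in X_train])

# 2) Fit the surrogate (ordinary least squares on the polynomial features).
coef, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)


def surrogate(X):
    """Predict growth without solving an LP."""
    return features(np.asarray(X, dtype=float)) @ coef


# 3) Check the surrogate against fresh simulator runs.
X_test = rng.uniform(low, high, size=(50, 2))
y_test = np.array([fba_growth(po, m) for po, m in X_test])
max_err = float(np.max(np.abs(surrogate(X_test) - y_test)))

if __name__ == "__main__":
    print("max abs error on held-out samples:", max_err)
```

The same three steps (sample, fit, validate against held-out simulator runs) would apply with a neural network in place of the least-squares fit; the trade-off is training cost against the flexibility needed for a genome-scale flux space.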