
    Gene regulatory networks: a coarse-grained, equation-free approach to multiscale computation

    We present computer-assisted methods for analyzing stochastic models of gene regulatory networks. The main idea underlying this equation-free analysis is the design and execution of appropriately initialized short bursts of stochastic simulations; the results of these are processed to estimate coarse-grained quantities of interest, such as mesoscopic transport coefficients. In particular, using a simple model of a genetic toggle switch, we illustrate the computation of an effective free energy and of a state-dependent effective diffusion coefficient that characterize an unavailable effective Fokker-Planck equation. Additionally, we illustrate the linking of equation-free techniques with continuation methods for performing a form of stochastic "bifurcation analysis"; estimation of mean switching times in the case of a bistable switch is also implemented in this equation-free context. The accuracy of our methods is tested by direct comparison with long-time stochastic simulations. This type of equation-free analysis appears to be a promising approach to computing features of the long-time, coarse-grained behavior of certain classes of complex stochastic models of gene regulatory networks, circumventing the need for long Monte Carlo simulations. Comment: 33 pages, submitted to The Journal of Chemical Physics.
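    A minimal sketch of the short-burst idea, assuming a toy one-dimensional double-well surrogate in place of the actual toggle-switch simulator: local drift and diffusion coefficients are estimated from the first two moments of ensembles of appropriately initialized short runs, then integrated into an effective free energy. Names such as `simulate_burst` and all parameter values are illustrative, not the paper's scheme.

```python
import numpy as np

def simulate_burst(x0, dt, rng):
    # Placeholder for the stochastic simulator: one Euler-Maruyama step of a
    # toy bistable SDE standing in for the (unavailable) coarse dynamics.
    drift = x0 - x0**3               # illustrative double-well drift
    return x0 + drift * dt + 0.5 * np.sqrt(dt) * rng.standard_normal()

def estimate_coefficients(x0, n_replicas=5000, dt=1e-3, seed=0):
    """Estimate local drift V(x0) and diffusion D(x0) from an ensemble of
    short bursts (the first two Kramers-Moyal moments)."""
    rng = np.random.default_rng(seed)
    x1 = np.array([simulate_burst(x0, dt, rng) for _ in range(n_replicas)])
    dx = x1 - x0
    return dx.mean() / dt, dx.var() / (2.0 * dt)

# Effective free energy on a grid via F'(x) ~ -V(x)/D(x); correction terms for
# strongly state-dependent D are omitted in this sketch.
grid = np.linspace(-2.0, 2.0, 41)
V, D = map(np.array, zip(*(estimate_coefficients(x) for x in grid)))
ratio = -V / D
F = np.concatenate([[0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * np.diff(grid))])
```

    The same estimated drift and diffusion fields could then feed a continuation wrapper or a mean-first-passage-time calculation, mirroring the coarse bifurcation and switching-time analyses described above.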

    Dynamic surrogate modelling for multistep-ahead prediction of multivariate nonlinear chemical processes

    This work proposes a methodology for multivariate dynamic modeling and multistep-ahead prediction of nonlinear systems using surrogate models, for application to nonlinear chemical processes. The methodology provides a systematic and robust procedure for the development of data-driven dynamic models capable of predicting the process outputs over long time horizons. It is based on using surrogate models to construct several nonlinear autoregressive exogenous models (NARX), each one approximating the future behavior of one process output as a function of the current and previous process inputs and outputs. The developed dynamic models are employed in a recursive scheme to predict the future process outputs over several time steps (multistep-ahead prediction). The methodology is able to manage two different scenarios: (1) one in which only a set of input–output signals collected from the process is available for training and (2) another in which a mathematical model of the process is available and can be used to generate specific datasets for training. With respect to the latter, the proposed methodology includes a specific procedure for the selection of training data in dynamic modeling based on design of computer experiments (DOCE) techniques. The proposed methodology is applied to case studies from the process industry presented in the literature. The results show very high prediction accuracies over long time horizons. Also, owing to the flexibility, robustness, and computational efficiency of surrogate modeling, the methodology allows dealing with a wide range of situations which would be difficult to address using first-principles models.
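    A minimal sketch of the recursive multistep-ahead scheme, assuming a scalar toy process and a random-forest surrogate in place of the paper's surrogate models; the lag structure and data generation are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: y_k depends nonlinearly on two output lags and one input lag.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 500)
y = np.zeros(500)
for k in range(2, 500):
    y[k] = 0.7 * y[k-1] - 0.2 * y[k-2] + 0.3 * np.tanh(u[k-1]) + 0.01 * rng.standard_normal()

# One-step NARX surrogate: predict y_k from [y_{k-1}, y_{k-2}, u_{k-1}].
X = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y[2:])

def multistep_predict(model, y_hist, u_future):
    """Recursive scheme: each one-step prediction is fed back as a lagged
    output for the next step, extending the horizon arbitrarily far."""
    y1, y2, preds = y_hist[-1], y_hist[-2], []
    for uk in u_future:
        yk = model.predict(np.array([[y1, y2, uk]]))[0]
        preds.append(yk)
        y1, y2 = yk, y1
    return np.array(preds)

horizon = multistep_predict(model, y[:100], u[100:150])  # 50-step-ahead forecast
```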

    Recipes for calibration and validation of agent-based models in cancer biomedicine

    Computational models and simulations are appealing not only for their intrinsic characteristics, such as scalability across spatiotemporal scales and predictive power, but also because the set of problems in cancer biomedicine that can be addressed computationally exceeds the set of those amenable to analytical solutions. Agent-based models and simulations are especially interesting candidates among computational modelling strategies in cancer research due to their capabilities to replicate realistic local and global interaction dynamics at a convenient and relevant scale. Yet, the absence of methods to validate the consistency of the results across scales can hinder adoption by turning fine-tuned models into black boxes. This review compiles relevant literature to explore strategies to leverage high-fidelity simulations of multi-scale, or multi-level, cancer models with a focus on validation approached as simulation calibration. We argue that simulation calibration goes beyond parameter optimization by embedding informative priors to generate plausible parameter configurations across multiple dimensions.
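    As a concrete, hedged illustration of calibration with priors, the sketch below runs rejection approximate Bayesian computation (ABC) against a stand-in stochastic agent-based model; the simulator, priors, summary statistic, and tolerance are invented for illustration and do not come from the reviewed literature.

```python
import numpy as np

rng = np.random.default_rng(42)

def run_abm(division_prob, death_prob, steps=50, n0=100):
    """Stand-in agent-based tumour model: at each step every cell divides
    or dies with the given per-step probabilities."""
    n = n0
    for _ in range(steps):
        n = max(n + rng.binomial(n, division_prob) - rng.binomial(n, death_prob), 0)
    return n

observed = 900  # hypothetical observed final cell count

# Rejection ABC: sample parameters from the priors and keep configurations
# whose simulated summary statistic lands near the observation, yielding a
# set of plausible configurations rather than a single best fit.
accepted = []
for _ in range(5000):
    g = rng.uniform(0.01, 0.10)   # prior on division probability
    d = rng.uniform(0.01, 0.10)   # prior on death probability
    if abs(run_abm(g, d) - observed) < 50:
        accepted.append((g, d))

posterior = np.array(accepted)
```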

    Application of machine learning in systems biology

    Biological systems are composed of a large number of molecular components. Understanding their behavior as a result of the interactions between the individual components is one of the aims of systems biology. Computational modelling is a powerful tool commonly used in systems biology, which relies on mathematical models that capture the properties of and interactions between molecular components to simulate the behavior of the whole system. However, in many biological systems it is challenging to build reliable mathematical models due to the complexity and poor understanding of the underlying mechanisms. With the breakthrough of big data technologies in biology, data-driven machine learning (ML) approaches offer a promising complement to traditional theory-based models in systems biology. Firstly, ML can be used to model systems in which the relationships between the components and the system are too complex to be captured by theory-based models. Two such examples of using ML to resolve genotype-phenotype relationships are presented in this thesis: (i) predicting yeast phenotypes from genomic features and (ii) predicting the thermal niche of microorganisms from proteome features. Secondly, ML naturally complements theory-based models. By applying ML, I improved the performance of a genome-scale metabolic model in describing yeast thermotolerance. In this application, ML was used to estimate the thermal parameters with a Bayesian statistical learning approach that trains regression models and performs uncertainty quantification and reduction. The predicted bottleneck genes were further validated by experiments on improving yeast thermotolerance. In such applications, regression models are frequently used, and their performance relies on many factors, including but not limited to feature engineering and the quality of response values. Manually engineering sufficient relevant features is particularly challenging in biology due to the lack of knowledge in certain areas. With the increasing volume of big data, deep transfer learning enables us to learn a statistical summary of the samples in a large dataset, which can then be used as input to train other ML models. In the present thesis, I applied this approach to first learn a deep representation of enzyme thermal adaptation and then use it to develop regression models for predicting enzyme optimal and protein melting temperatures. It was demonstrated that the transfer-learning-based regression models outperform classical ones trained on rationally engineered features in both cases. On the other hand, noisy response values are very common in biological datasets due to variation in experimental measurements, and they fundamentally restrict the performance attainable with regression models. I thereby addressed this challenge by deriving a theoretical upper bound for the coefficient of determination (R²) of regression models. This upper bound depends on the noise associated with the response variable and its variance for a given dataset. It can thus be used to test whether the maximal performance has been reached on a particular dataset, or whether further model improvement is possible.
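    One common form of such a noise-dependent bound, assuming additive i.i.d. measurement noise, is R²_max = σ²_signal / (σ²_signal + σ²_noise); the sketch below estimates it from replicate measurements. The formula and the hypothetical melting-temperature data are illustrative, not necessarily the thesis's exact derivation.

```python
import numpy as np

def r2_upper_bound(replicates):
    """Upper bound on attainable R^2 from replicate measurements,
    assuming additive i.i.d. noise on the response variable."""
    replicates = np.asarray(replicates)              # shape (n_samples, n_reps)
    n_reps = replicates.shape[1]
    var_noise = replicates.var(axis=1, ddof=1).mean()
    # The variance of replicate means overestimates the signal variance by
    # var_noise / n_reps, so subtract that term back out.
    var_signal = replicates.mean(axis=1).var(ddof=1) - var_noise / n_reps
    return var_signal / (var_signal + var_noise)

# Hypothetical dataset: 200 proteins, 3 replicate melting temperatures each.
rng = np.random.default_rng(0)
true_tm = rng.normal(55.0, 8.0, size=(200, 1))
measured = true_tm + rng.normal(0.0, 3.0, size=(200, 3))
print(r2_upper_bound(measured))   # about 64 / (64 + 9) ~= 0.88
```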

    Role of Proteome Physical Chemistry in Cell Behavior.

    We review how major cell behaviors, such as bacterial growth laws, are derived from the physical chemistry of the cell's proteins. On one hand, cell actions depend on the individual biological functionalities of their many genes and proteins. On the other hand, the common physics among proteins can be as important as the unique biology that distinguishes them. For example, bacterial growth rates depend strongly on temperature. This dependence can be explained by the folding stabilities across a cell's proteome. Such modeling explains how thermophilic and mesophilic organisms differ, and how oxidative damage of highly charged proteins can lead to unfolding and aggregation in aging cells. Cells have characteristic time scales. For example, E. coli can duplicate as fast as 2-3 times per hour. These time scales can be explained by protein dynamics (the rates of synthesis and degradation, folding, and diffusional transport). Such modeling also rationalizes how bacterial growth is slowed down by added salt. In the same way that the behaviors of inanimate materials can be expressed in terms of the statistical distributions of atoms and molecules, some cell behaviors can be expressed in terms of distributions of protein properties, giving insights into the microscopic basis of growth laws in simple cells.
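    The sketch below illustrates the style of proteome-level reasoning described here: a two-state folding stability curve per protein and a growth proxy that requires the entire proteome to remain folded. All parameter values and the fitness proxy are illustrative assumptions, not the review's fitted model.

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal/(mol K)

def delta_g(T, dH, dS, dCp, T0=310.0):
    """Two-state folding free energy with heat-capacity curvature:
    dG(T) = dH + dCp*(T - T0) - T*dS - T*dCp*ln(T/T0), with dH, dS at T0."""
    return dH + dCp * (T - T0) - T * dS - T * dCp * np.log(T / T0)

def folded_fraction(T, dH, dS, dCp):
    dG = delta_g(T, dH, dS, dCp)
    return 1.0 / (1.0 + np.exp(-dG / (R * T)))

# Illustrative 'proteome': protein-to-protein spread in stability parameters.
rng = np.random.default_rng(0)
n = 2000
dH = rng.normal(120.0, 20.0, n)        # kcal/mol
dS = dH / rng.normal(325.0, 5.0, n)    # couples dH and dS so dG stays modest
dCp = rng.normal(1.5, 0.3, n)          # kcal/(mol K)

# Growth proxy: every protein must stay folded, so take the product of folded
# fractions; it collapses where the least stable proteins begin to unfold.
T = np.linspace(290.0, 330.0, 81)
fitness = np.array([np.prod(folded_fraction(t, dH, dS, dCp)) for t in T])
```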

    Development of software sensors for on-line monitoring of baker's yeast fermentation process

    Software sensors and bioprocesses are well-established research areas which have much to offer each other. From the perspective of the software sensor area, bioprocesses can be considered a broad application field with a growing number of complex and challenging tasks, whose solutions can contribute to achieving high productivity and high-quality products. Although progress in the field of software sensors and bioprocesses has been quick and highly successful in recent years, there is still a lack of inexpensive and reliable sensors for on-line state and parameter estimation. Therefore, the primary objective of this research was to design an inexpensive measurement system for on-line monitoring of ethanol production during the baker's yeast cultivation process. The measurement system is based on commercially available metal oxide semiconductor gas sensors. Samples from the bioreactor headspace are pumped past the gas sensor array for 10 s every five minutes and the voltage changes of the sensors are measured. The signals from the gas sensor array showed a high correlation with the ethanol concentration during the cultivation process. In order to predict ethanol concentrations from the data of the gas sensor array, a principal component regression (PCR) model was developed. No off-line sampling was used for the calibration procedure. Instead, a theoretical model of the process is applied to simulate the ethanol production at any given time, with the kinetic parameters of the model determined as part of the calibration. The simulated ethanol concentrations were used as reference data for calibrating the response of the gas sensor array. The obtained results indicate that the model-based calibrated gas sensor array is able to predict ethanol concentrations during the cultivation process with high accuracy (the root mean square error of calibration and the percentage error for the validation sets were below 0.2 g L-1 and 7 %, respectively). However, the predicted values are only available every five minutes. Therefore, the next goal of the research was to implement an estimation method for continuous prediction of ethanol as well as glucose, biomass, and the growth rates. To this end, two nonlinear extensions of the Kalman filter, namely the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), were implemented separately for state and parameter estimation. Both prediction methods were validated on three different cultivations with varying initial substrate concentrations. The obtained results showed that both estimation algorithms yield satisfactory estimates of the substrate and biomass concentrations as well as the growth rate parameters during the cultivation. Moreover, despite its easier implementation procedure, the UKF shows more accurate prediction results than the EKF. Another focus of this study was to design and implement an on-line monitoring and control system for evaluating the volume of dough pieces during the proofing stage of bread making. For this purpose, a software sensor based on image processing was designed and implemented for measuring the dough volume. The control system consists of a fuzzy logic controller which takes the estimated volume into account. The controller is designed to keep the volume expansion of the dough pieces close to that of a dough piece proofed under standard conditions by manipulating the temperature of the proofing chamber. Dough pieces with different amounts of baker's yeast added to the ingredients and with different starting temperatures were prepared and proofed under the supervision of the software sensor and the fuzzy controller. The controller was evaluated by means of performance criteria and the final volume of the dough samples. The obtained results indicate that the performance of the system is very satisfactory with respect to volume control and set-point deviation of the dough pieces.
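    As a hedged sketch of the principal component regression step, the code below maps simulated gas-sensor voltages to model-generated ethanol reference concentrations with a PCA-plus-least-squares pipeline; the sensor response model and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Illustrative stand-ins: 8 MOS sensor voltages sampled every 5 min, and
# ethanol reference values produced by a theoretical process model.
rng = np.random.default_rng(0)
t = np.arange(0, 600, 5)                                # minutes
ethanol_ref = 2.0 / (1.0 + np.exp(-(t - 300) / 60.0))   # simulated ethanol, g/L
sensitivities = rng.uniform(0.5, 1.5, 8)                # per-sensor gains
voltages = ethanol_ref[:, None] * sensitivities + 0.05 * rng.standard_normal((t.size, 8))

# Principal component regression: PCA compresses the highly collinear sensor
# channels, then ordinary least squares maps the scores to concentration.
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(voltages, ethanol_ref)

rmse = np.sqrt(np.mean((pcr.predict(voltages) - ethanol_ref) ** 2))
```

    Between the five-minute sensor readings, a state estimator such as the EKF or UKF described above would propagate a process model to provide continuous predictions.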

    Epigenetic Reprogramming, Apoptosis, and Developmental Competence in Cloned Embryos

    Cloning through somatic cell nuclear transfer (SCNT) remains highly inefficient twenty years after the first demonstration of the technology with the birth of Dolly. By selecting, early in development, the embryos most likely to succeed following transfer into a surrogate mother, efficiency could be increased and the technology more routinely utilized to enhance animal agriculture production. SCNT is believed to be highly inefficient as a result of incorrect DNA methylation and gene expression that accumulate because of the SCNT technique. We proposed the use of a non-toxic, non-invasive detector of cell death to quantitatively assess embryo competency prior to embryo transfer. We believed we could use SR-FLICA to identify the embryos with low levels of cell death as a result of proper DNA methylation and gene expression. By analyzing the whole embryo, differences in gene expression and DNA methylation were identified in embryos with high and low levels of cell death. However, the level of cell death did not prove to be a reliable indicator of embryo quality in predicting pregnancy outcome. These data support the commonly held hypothesis that DNA methylation and gene expression after SCNT have random defects as a result of the random nature of resetting the DNA for embryo development. More research is required to identify the embryos which will prove successful following SCNT and embryo transfer.

    Data-assisted modeling of complex chemical and biological systems

    Complex systems are abundant in chemistry and biology; they can be multiscale, possibly high-dimensional or stochastic, with nonlinear dynamics and interacting components. It is often nontrivial (and sometimes impossible) to determine and study the macroscopic quantities of interest and the equations they obey. One can only (judiciously or randomly) probe the system, gather observations, and study trends. In this thesis, Machine Learning is used as a complement to traditional modeling and numerical methods to enable data-assisted (or data-driven) dynamical systems. As case studies, three complex systems are sourced from diverse fields. The first is a high-dimensional computational neuroscience model of the Suprachiasmatic Nucleus of the human brain, where bifurcation analysis is performed by simply probing the system; manifold learning is then employed to discover a latent space of neuronal heterogeneity. Second, Machine Learning surrogate models are used to optimize dynamically operated catalytic reactors; an algorithmic pipeline is presented through which it is possible to program catalysts with active learning. Third, Machine Learning is employed to extract laws of Partial Differential Equations describing bacterial chemotaxis. It is demonstrated how Machine Learning manages to capture the rules of bacterial motility at the macroscopic level, starting from diverse data sources (including real-world experimental data). More importantly, a framework is constructed through which already existing, partial knowledge of the system can be exploited. These applications showcase how Machine Learning can be used synergistically with traditional simulations in different scenarios: (i) equations are available but the overall system is so high-dimensional that efficiency and explainability suffer; (ii) equations are available but lead to highly nonlinear black-box responses; (iii) only data are available (of varying source and quality) and equations need to be discovered. For such data-assisted dynamical systems, we can perform fundamental tasks, such as integration, steady-state location, continuation, and optimization. This work aims to unify traditional scientific computing and Machine Learning in an efficient, data-economical, generalizable way, where both the physical system and the algorithm matter.
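    The PDE-law-extraction step can be illustrated with a minimal SINDy-style regression: build a library of candidate spatial-derivative terms from snapshot data and regress the time derivative onto it. The sketch below uses synthetic data from a known diffusion law so the recovered coefficient can be checked; it illustrates the general idea, not the thesis's specific framework.

```python
import numpy as np

# Synthetic snapshots of u_t = D u_xx with D = 0.1 (periodic domain),
# generated by explicit Euler so the ground truth is known.
D_true, dx, dt = 0.1, 0.1, 1e-3
x = np.arange(0.0, 10.0, dx)
u = np.exp(-(x - 5.0) ** 2)
snapshots = []
for _ in range(2000):
    snapshots.append(u.copy())
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u = u + dt * D_true * u_xx
U = np.array(snapshots)

# Candidate library {u, u_x, u_xx} via finite differences; regress u_t on it.
u_t = (U[1:] - U[:-1]) / dt
u_x = (np.roll(U[:-1], -1, axis=1) - np.roll(U[:-1], 1, axis=1)) / (2 * dx)
u_xx = (np.roll(U[:-1], -1, axis=1) - 2 * U[:-1] + np.roll(U[:-1], 1, axis=1)) / dx**2
library = np.column_stack([U[:-1].ravel(), u_x.ravel(), u_xx.ravel()])
coeffs, *_ = np.linalg.lstsq(library, u_t.ravel(), rcond=None)
print(coeffs)  # expected close to [0, 0, 0.1]: only the u_xx term survives
```

    Known terms could be fixed in the library and only the residual learned, which is the sense in which existing partial knowledge of the system can be exploited.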