Novel strategies for process control based on hybrid semi-parametric mathematical systems
Doctoral thesis. Chemical Engineering. Universidade do Porto, Faculdade de Engenharia. 201
Boosting functional regression models
In functional data analysis, the data consist of functions that are defined on a continuous domain. In practice, functional variables are observed on some discrete grid. Regression models are important tools to capture the impact of explanatory variables on the response and are challenging in the case of functional data. In this thesis, a generic framework is proposed that includes scalar-on-function, function-on-scalar and function-on-function regression models. Within this framework, quantile regression models, generalized additive models and generalized additive models for location, scale and shape can be derived by optimizing the corresponding loss functions. The additive predictors can contain a variety of covariate effects, for example linear, smooth and interaction effects of scalar and functional covariates.
In the first part, the functional linear array model is introduced. This model is suited for responses observed on a common grid and covariates that do not vary over the domain of the response. Array models achieve computational efficiency by taking advantage of the Kronecker product structure of the design matrix. In the second part, the focus is on models without array structure, which can capture situations with responses observed on irregular grids and/or time-varying covariates. This includes in particular models with historical functional effects. For situations in which the functional response and covariate are both observed over the same time domain, a historical functional effect induces an association between response and covariate such that only past values of the covariate influence the current value of the response. In this model class, effects with more general integration limits, such as lag and lead effects, can be specified. In the third part, the framework is extended to generalized additive models for location, scale and shape, where all parameters of the conditional response distribution can depend on covariate effects. The conditional response distribution can be modeled very flexibly by relating each distribution parameter to a linear predictor via a link function.
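On a common grid, the historical functional effect described above reduces to a lower-triangular weighted sum: the fitted value at time t integrates the covariate only over past time points. A minimal numerical sketch (illustrative only; the thesis implements this with penalized base-learners in FDboost):

```python
import numpy as np

# Minimal sketch of a historical functional effect on a common grid:
#   E[y(t)] = integral_0^t x(s) * beta(s, t) ds,
# discretized as a lower-triangular weighted sum (s <= t only).
def historical_effect(x, beta, grid):
    """x: covariate values on `grid`; beta: (len(grid), len(grid)) coefficient surface."""
    dt = np.diff(grid, prepend=grid[0])               # quadrature weights
    mask = np.tril(np.ones((len(grid), len(grid))))   # keep only past values s <= t
    # entry [t] = sum over s <= t of x[s] * beta[s, t] * dt[s]
    return (mask * beta.T) @ (x * dt)

grid = np.linspace(0, 1, 5)
x = np.ones(5)
beta = np.ones((5, 5))
yhat = historical_effect(x, beta, grid)   # accumulates the past of x at each t
```

Replacing the triangular mask with other index sets gives the more general integration limits (lag and lead effects) mentioned above.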
For all parts, estimation is conducted by a component-wise gradient boosting algorithm. Boosting is an ensemble method that pursues a divide-and-conquer strategy for optimizing an expected loss criterion. This provides great flexibility for the regression models: for example, minimizing the check function yields quantile regression, and minimizing the negative log-likelihood yields generalized additive models for location, scale and shape. The estimator is updated iteratively along the direction of steepest descent to minimize the loss criterion. The model is represented as a sum of simple (penalized) regression models, the so-called base-learners, which separately fit the negative gradient in each step; only the best-fitting base-learner is updated. Component-wise boosting allows for high-dimensional data settings and for automatic, data-driven variable selection. To adapt boosting for regression with functional data, the loss is integrated over the domain of the response and base-learners suited to functional effects are implemented. To make functional regression models more readily available to practitioners, a comprehensive implementation of the methods is provided in the R add-on package FDboost.
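The core component-wise update can be sketched in a few lines. This is a simplified illustration with squared-error loss and one univariate linear base-learner per covariate, not the thesis implementation (which uses penalized regression base-learners and functional losses):

```python
import numpy as np

# Component-wise gradient boosting, minimal sketch: in each iteration every
# base-learner is fitted to the negative gradient, but only the best-fitting
# one is updated (with step length nu), which performs variable selection.
def componentwise_boost(X, y, n_iter=100, nu=0.1):
    n, p = X.shape
    coef = np.zeros(p)
    f = np.zeros(n)                          # current model fit
    for _ in range(n_iter):
        u = y - f                            # negative gradient of 0.5*(y-f)^2
        b = X.T @ u / np.sum(X**2, axis=0)   # fit each base-learner separately
        rss = np.sum((u[:, None] - X * b) ** 2, axis=0)
        j = np.argmin(rss)                   # best-fitting base-learner
        coef[j] += nu * b[j]                 # update only that component
        f += nu * b[j] * X[:, j]
    return coef

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0]                            # only the first covariate matters
coef = componentwise_boost(X, y)             # selects and estimates that effect
```

Swapping the negative-gradient computation for the gradient of the check function or of a negative log-likelihood gives the quantile and distributional variants described above.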
The flexibility of the regression framework is highlighted by several applications from different fields. Some features of the functional linear array model are illustrated using data on curing resin for car production, heat values of fossil fuels and Canadian climate data. These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively. The methodological developments for non-array models are motivated by biotechnological data on fermentations, modeling a key process variable by a historical functional model. The motivating application for functional generalized additive models for location, scale and shape is a time series of stock returns, in which the expectation and standard deviation are modeled depending on scalar and functional covariates.
From Metabolite Concentration to Flux – A Systematic Assessment of Error in Cell Culture Metabolomics
The growing availability of genomic, transcriptomic, and metabolomic data has opened the door to the synthesis of multiple levels of information in biological research. As a consequence, there has been a push to analyze biological systems in a comprehensive manner through the integration of their interactions into mathematical models, with the process frequently referred to as “systems biology”. Despite the potential for this approach to greatly improve our knowledge of biological systems, the definition of mathematical relationships between different levels of information opens the door to diverse sources of error, requiring precise, unbiased quantification as well as robust validation methods. Failure to account for differences in uncertainty across multiple levels of data analysis may cause errors to drown out any useful outcomes of the synthesis. The application of a systems biology approach has been particularly important in metabolic modeling. There has been a concentrated effort to build models directly from genomic data and to incorporate as much of the metabolome as possible in the analysis. Metabolomic data collection has been expanded through the recent use of hydrogen Nuclear Magnetic Resonance (1H-NMR) spectroscopy for cell culture monitoring. However, the combination of uncertainty from model construction and measurement error from NMR (or other means of metabolomic) analysis complicates data interpretation. This thesis establishes the precision and accuracy of NMR spectroscopy in the context of cell cultivation while developing a methodology for assessing model error in Metabolic Flux Analysis (MFA).
The analysis of cell culture media via NMR has been made possible by the development of specialized software for the "deconvolution" of complex spectra; however, the process is semi-qualitative. A human "profiler" is required to manually fit idealized peaks from a compound library to an observed spectrum, and the quality of fit is often subject to considerable interpretation. Work presented in this thesis establishes baseline accuracy as approximately 2%-10% of the theoretical mean, with a relative standard deviation of 1.5% to 3%. Higher variabilities were associated primarily with profiling error, while lower variabilities were due in part to tube insertion (and the steps leading up to spectral acquisition). Although a human profiler contributed to overall uncertainty, the net impact did not make the deconvolution process prohibitively imprecise.

Analysis was then expanded to consider solutions that are more representative of cell culture supernatant. The combination of metabolites at different concentration levels was efficiently represented by a Plackett-Burman experiment. The orthogonality of this design ensured that every level of metabolite concentration was combined with an equal number of high and low concentrations of all other variable metabolites, providing a worst-case scenario for variance estimation. Analysis of media-like mixtures revealed median error and standard deviation of approximately 10%, although estimating low metabolite concentrations resulted in a considerable loss of accuracy and precision in the presence of resonance overlap. Furthermore, an iterative regression process identified a number of cases where an increase in the concentration of one metabolite resulted in increased quantification error of another. More importantly, the analysis established a general methodology for estimating the quantification variability of media-specific metabolite concentrations.
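The orthogonality property of a Plackett-Burman design can be checked directly. The sketch below constructs the classical 12-run design for up to 11 two-level factors from its cyclic generator (a standard construction, shown here only to illustrate the design used in the screening experiment):

```python
import numpy as np

# 12-run Plackett-Burman design: 11 cyclic shifts of the classical generator
# plus a closing row of all low levels. Columns are balanced (6 high, 6 low)
# and pairwise orthogonal, so each factor level meets an equal number of high
# and low levels of every other factor.
def plackett_burman_12():
    gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(gen, k) for k in range(11)]  # cyclic shifts of the generator
    rows.append(-np.ones(11, dtype=int))         # closing row of all low levels
    return np.array(rows)

D = plackett_burman_12()
# orthogonality: D.T @ D equals 12 times the identity matrix
```

With 12 runs this screens up to 11 variable metabolites at two concentration levels each, which is what makes the worst-case variance estimation described above so efficient.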
Subsequent application of NMR analysis to time-course data from cell cultivation revealed correlated deviations from calculated trends. Similar deviations were observed for multiple (chemically) unrelated metabolites, amounting to approximately 1%-10% of each metabolite's concentration. The nature of these deviations suggested the cause to be inaccuracies in internal standard addition or quantification, skewing all quantified metabolite concentrations within a sample by the same relative amount. Error magnitude was estimated by calculating the median relative deviation from a smoothing fit for all compounds at a given timepoint. A metabolite time-course simulation was developed to determine the frequency and magnitude of such deviations arising from typical measurement error (without added bias from incorrect internal standard addition). Multiple smoothing functions were tested on simulated time-courses, and cubic spline regression was found to minimize the median relative deviation from measurement noise to approximately 2.5%. Based on these results, an iterative smoothing correction method was implemented to identify and correct median deviations greater than 2.5%, with both the simulation and correction code released as the "metcourse" package for the R programming language.
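The correction idea can be sketched as follows. This is a hedged illustration of the approach described above, not the released R implementation ("metcourse"); the function name and details are assumptions:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sketch of the smoothing correction: a systematic internal-standard error
# skews every metabolite in one sample by the same relative amount, so the
# median relative deviation from a smooth trend, taken across metabolites at
# a single timepoint, estimates that sample's common bias.
def correct_timecourse(t, C, threshold=0.025):
    """t: timepoints; C: (n_timepoints, n_metabolites) concentrations."""
    C = C.astype(float).copy()
    fits = np.column_stack([UnivariateSpline(t, C[:, j], k=3)(t)
                            for j in range(C.shape[1])])
    rel_dev = (C - fits) / fits              # relative deviation per point
    bias = np.median(rel_dev, axis=1)        # per-sample median across metabolites
    skewed = np.abs(bias) > threshold        # correct only deviations > 2.5%
    C[skewed] /= (1.0 + bias[skewed, None])  # undo the common relative skew
    return C
```

For example, if one sample's concentrations are all inflated by the same relative factor, the per-sample median deviation detects it and the division restores the values toward the smooth trend.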
Finally, a t-test validation method was developed to assess the impact of measurement and model error on MFA, with a Chinese hamster ovary (CHO) cell model chosen as a case study. The standard MFA formulation was recast as a generalized least squares (GLS) problem, with calculated fluxes subject to a t-significance test. NMR data were collected from a CHO cell bioreactor run, with another set of data simulated directly from the model and perturbed by observed measurement error. The frequency of rejected fluxes in the simulated data (free of model error) was attributed to measurement uncertainty alone. Fluxes calculated from the observed data that were rejected as non-significant, but not rejected in the simulated data, were attributed to a lack of model fit, i.e., model error. Applying this method to the observed data revealed a considerable level of error that was not identified by traditional χ² validation. Further simulation was carried out to assess the impact of measurement error and model structure, both of which were found to have a dramatic impact on statistical significance and calculation error, an impact that has yet to be addressed in the context of MFA.
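The GLS recasting with a per-flux t-test can be sketched generically. This is an illustrative outline under the linear-model assumption m = Av + e with known error covariance, not the thesis's CHO-specific formulation:

```python
import numpy as np
from scipy import stats

# GLS flux estimation with a t-significance test per flux.
# Measurements m relate to unknown fluxes v through a linear map A
# (derived from the stoichiometry) with error covariance Sigma:
#   m = A v + e,  e ~ N(0, Sigma)
def gls_flux_ttest(A, m, Sigma):
    W = np.linalg.inv(Sigma)
    cov_v = np.linalg.inv(A.T @ W @ A)   # covariance of the GLS estimate
    v_hat = cov_v @ A.T @ W @ m          # GLS flux estimate
    se = np.sqrt(np.diag(cov_v))
    t = v_hat / se                       # t-statistic per flux
    dof = len(m) - len(v_hat)            # residual degrees of freedom
    p = 2 * stats.t.sf(np.abs(t), dof)   # two-sided p-values
    return v_hat, t, p
```

A flux whose p-value exceeds the chosen significance level is rejected as non-significant; comparing rejection patterns between observed and model-simulated data separates measurement uncertainty from lack of model fit, as described above.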
Radio frequency additive manufacturing: a volumetric approach to polymer powder bed fusion
Polymer powder bed fusion (PBF) additive manufacturing offers a number of advantages over conventional manufacturing techniques, particularly in the areas of reduced tooling costs and the added geometric complexity available to designers. However, existing methods require heat to be applied to the powder bed at each layer to fuse the powders and form parts. The layer-wise heating strategies used in current PBF processes contribute to a reduction in the mechanical performance of the parts and increase the time required to fabricate them.
To address these issues, a volumetric heating strategy is implemented through a novel radio frequency additive manufacturing (RFAM) process. Radio frequency (RF) radiation is a heating mechanism capable of penetrating into materials to cause a simultaneous temperature rise throughout the material volume. Given the insulating nature of most polymers, electrically conductive dopants can be added to a polymer powder bed such that the effective composite properties are suitable for RF heating. By selectively patterning the dopant in the powder bed and applying RF radiation, heat generation can be confined to the RF-absorbing doped region with little effect on the surrounding powder bed. With powder mixtures of nylon 12 as the polymer and graphite as the dopant, it is possible to fuse the host polymer using RF radiation as the sole energy source.
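The volumetric heat generation underlying this mechanism follows the standard dielectric heating relation; a back-of-the-envelope sketch (the frequency, loss factor, and field strength below are illustrative values, not from the thesis):

```python
import math

# Volumetric power dissipated by dielectric (RF) heating:
#   P_v = 2 * pi * f * eps0 * eps_loss * E_rms^2   [W/m^3]
# Doping the powder raises the effective loss factor eps_loss, so heat is
# generated throughout the doped volume rather than applied layer by layer.
def dielectric_power_density(f_hz, eps_loss, e_rms):
    eps0 = 8.854e-12  # vacuum permittivity, F/m
    return 2 * math.pi * f_hz * eps0 * eps_loss * e_rms**2

# e.g. an assumed loss factor of 0.5 at 40 MHz and a 10 kV/m field
p = dielectric_power_density(40e6, 0.5, 1e4)  # ~1.1e5 W/m^3
```

The undoped powder's loss factor is orders of magnitude smaller, which is why heating stays confined to the patterned region.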
One consequence of creating parts with this method is the complex interaction between the part geometry and the applied RF field, which can cause non-uniform heating to develop within the part. Aided by computational design approaches, methods for improving heating uniformity are proposed, including a functional grading scheme to vary the dopant concentration throughout the powder bed. To validate the computational models and further develop the RFAM process, the design of a prototype machine capable of three-dimensional dopant patterning is presented. The prototype system is used to create RFAM parts and evaluate the effectiveness of the different strategies aimed at improving heating uniformity within the doped powder beds. As a result of this work, the feasibility of a volumetric, RF-assisted additive manufacturing process is demonstrated.
NASA Tech Briefs, January 1989
Topics include: Electronic Components & Circuits, Electronic Systems, Physical Sciences, Materials, Computer Programs, Mechanics, Machinery, Fabrication Technology, Mathematics and Information Sciences, and Life Sciences.
Shape constrained splines as transparent black-box models for bioprocess modeling
Empirical model identification for biological systems is a challenging task due to the combined effects of complex interactions, nonlinear effects, and lack of specific measurements. In this context, several researchers have provided tools for experimental design, model structure selection, and optimal parameter estimation, often packaged together in iterative model identification schemes. Still, one often has to rely on a limited number of candidate rate laws such as Contois, Haldane, Monod, Moser, and Tessier. In this work, we propose to use shape-constrained spline functions as a way to reduce the number of candidate rate laws to be considered in a model identification study, while retaining or even expanding the explanatory power in comparison to conventional sets of candidate rate laws. The shape-constrained rate laws exhibit the flexibility of typical black-box models, while offering a transparent interpretation akin to conventionally applied rate laws such as Monod and Haldane. In addition, the shape-constrained spline models lead to limited extrapolation errors despite the large number of parameters. (C) 2017 Elsevier Ltd. All rights reserved.
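The shape-constraint idea can be illustrated with a simple construction (the basis, knots, and fitting routine below are assumptions for illustration, not the paper's exact spline formulation): a monotonically increasing rate law is represented as a non-negative combination of increasing basis functions, fitted by non-negative least squares.

```python
import numpy as np
from scipy.optimize import nnls

# Monotone "transparent black-box" rate law: each basis column min(s, knot)
# is increasing in s, so any non-negative combination (plus an intercept)
# is guaranteed to be an increasing function of substrate concentration.
def fit_monotone(s, mu, knots):
    B = np.column_stack([np.ones_like(s)] + [np.minimum(s, k) for k in knots])
    coef, _ = nnls(B, mu)           # non-negative least squares enforces shape
    return B @ coef, coef

# synthetic Monod kinetics: mu = mu_max * s / (K + s), mu_max = 0.5, K = 1
s = np.linspace(0.1, 10, 50)
mu = 0.5 * s / (1.0 + s)
knots = np.array([0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 8.0])
fit, coef = fit_monotone(s, mu, knots)
```

The fitted curve is increasing by construction, like Monod, yet no specific rate-law form was assumed; richer spline bases and additional constraints (e.g. concavity) follow the same pattern.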