48 research outputs found

    Novel strategies for process control based on hybrid semi-parametric mathematical systems

    Doctoral thesis. Chemical Engineering. Universidade do Porto, Faculdade de Engenharia. 201

    Boosting functional regression models

    In functional data analysis, the data consist of functions defined on a continuous domain; in practice, functional variables are observed on some discrete grid. Regression models are important tools for capturing the impact of explanatory variables on the response, and they are challenging in the case of functional data. In this thesis, a generic framework is proposed that includes scalar-on-function, function-on-scalar and function-on-function regression models. Within this framework, quantile regression models, generalized additive models and generalized additive models for location, scale and shape can be derived by optimizing the corresponding loss functions. The additive predictors can contain a variety of covariate effects, for example linear, smooth and interaction effects of scalar and functional covariates.

    In the first part, the functional linear array model is introduced. This model is suited for responses observed on a common grid with covariates that do not vary over the domain of the response. Array models achieve computational efficiency by taking advantage of the Kronecker product structure of the design matrix.

    In the second part, the focus is on models without array structure, which can capture responses observed on irregular grids and/or time-varying covariates. This includes in particular models with historical functional effects: when the functional response and covariate are both observed over the same time domain, a historical functional effect induces an association between response and covariate such that only past values of the covariate influence the current value of the response. In this model class, effects with more general integration limits, such as lag and lead effects, can also be specified.

    In the third part, the framework is extended to generalized additive models for location, scale and shape, in which all parameters of the conditional response distribution can depend on covariate effects. The conditional response distribution can be modeled very flexibly by relating each distribution parameter to a linear predictor via a link function.

    For all parts, estimation is conducted by a component-wise gradient boosting algorithm. Boosting is an ensemble method that pursues a divide-and-conquer strategy for optimizing an expected loss criterion, which provides great flexibility for the regression models: for example, minimizing the check function yields quantile regression, while minimizing the negative log-likelihood yields generalized additive models for location, scale and shape. The estimator is updated iteratively to minimize the loss criterion along the direction of steepest descent. The model is represented as a sum of simple (penalized) regression models, the so-called base-learners, which separately fit the negative gradient in each step; only the best-fitting base-learner is updated. Component-wise boosting accommodates high-dimensional data settings and performs automatic, data-driven variable selection. To adapt boosting to regression with functional data, the loss is integrated over the domain of the response, and base-learners suited to functional effects are implemented.

    To make functional regression models more readily available to practitioners, a comprehensive implementation of the methods is provided in the R add-on package FDboost. The flexibility of the regression framework is highlighted by several applications from different fields. Some features of the functional linear array model are illustrated using data on curing resin for car production, heat values of fossil fuels and Canadian climate data; these require function-on-scalar, scalar-on-function and function-on-function regression models, respectively. The methodological developments for non-array models are motivated by biotechnological data on fermentations, modeling a key process variable by a historical functional model. The motivating application for functional generalized additive models for location, scale and shape is a time series on stock returns, where the expectation and standard deviation are modeled depending on scalar and functional covariates.
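    To make the component-wise boosting idea above concrete, the following Python sketch shows L2-boosting for a scalar response with one simple linear base-learner per covariate; it is a minimal illustration of the selection-and-update loop, not the FDboost implementation (which uses penalized functional base-learners and integrates the loss over the response domain). The function name, the step length nu, and the choice of linear base-learners are illustrative assumptions.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2-boosting sketch: one univariate linear
    base-learner per column of X; only the best-fitting learner
    is updated in each iteration (illustration only)."""
    n, p = X.shape
    coef = np.zeros(p)
    offset = y.mean()                 # start from the mean response
    fit = np.full(n, offset)
    for _ in range(n_iter):
        resid = y - fit               # negative gradient of squared loss
        # Fit every base-learner separately to the current residuals.
        betas = np.array([X[:, j] @ resid / (X[:, j] @ X[:, j])
                          for j in range(p)])
        rss = np.array([np.sum((resid - betas[j] * X[:, j]) ** 2)
                        for j in range(p)])
        j_star = np.argmin(rss)       # best-fitting base-learner
        coef[j_star] += nu * betas[j_star]          # small step length
        fit += nu * betas[j_star] * X[:, j_star]
    return offset, coef
```

    Because only one coefficient is updated per iteration, covariates whose base-learners are never selected keep a zero coefficient; this is the automatic, data-driven variable selection the abstract refers to.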

    From Metabolite Concentration to Flux – A Systematic Assessment of Error in Cell Culture Metabolomics

    The growing availability of genomic, transcriptomic, and metabolomic data has opened the door to the synthesis of multiple levels of information in biological research. As a consequence, there has been a push to analyze biological systems in a comprehensive manner through the integration of their interactions into mathematical models, a process frequently referred to as "systems biology". Despite the potential of this approach to greatly improve our knowledge of biological systems, the definition of mathematical relationships between different levels of information introduces diverse sources of error, requiring precise, unbiased quantification as well as robust validation methods. Failure to account for differences in uncertainty across multiple levels of data analysis may cause errors to drown out any useful outcomes of the synthesis.

    The application of a systems biology approach has been particularly important in metabolic modeling. There has been a concentrated effort to build models directly from genomic data and to incorporate as much of the metabolome as possible in the analysis. Metabolomic data collection has been expanded through the recent use of proton Nuclear Magnetic Resonance (1H-NMR) spectroscopy for cell culture monitoring. However, the combination of uncertainty from model construction and measurement error from NMR (or other metabolomic) analysis complicates data interpretation. This thesis establishes the precision and accuracy of NMR spectroscopy in the context of cell cultivation while developing a methodology for assessing model error in Metabolic Flux Analysis (MFA).

    The analysis of cell culture media via NMR has been made possible by the development of specialized software for the "deconvolution" of complex spectra; however, the process is semi-qualitative. A human "profiler" is required to manually fit idealized peaks from a compound library to an observed spectrum, and the quality of fit is often subject to considerable interpretation. Work presented in this thesis establishes baseline accuracy as approximately 2%-10% of the theoretical mean, with a relative standard deviation of 1.5% to 3%. Higher variabilities were associated primarily with profiling error, while lower variabilities were due in part to tube insertion (and the steps leading up to spectral acquisition). Although a human profiler contributed to overall uncertainty, the net impact did not make the deconvolution process prohibitively imprecise.

    Analysis was then expanded to consider solutions more representative of cell culture supernatant. The combination of metabolites at different concentration levels was efficiently represented by a Plackett-Burman experiment. The orthogonality of this design ensured that every level of metabolite concentration was combined with an equal number of high and low concentrations of all other variable metabolites, providing a worst-case scenario for variance estimation. Analysis of media-like mixtures revealed the median error and standard deviation to be approximately 10%, although estimating low metabolite concentrations resulted in a considerable loss of accuracy and precision in the presence of resonance overlap. Furthermore, an iterative regression process identified a number of cases where an increase in the concentration of one metabolite resulted in increased quantification error of another.
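    For readers unfamiliar with the screening design just mentioned, one textbook construction of a Plackett-Burman design uses cyclic shifts of a generator row plus a final row of low levels. The sketch below builds the common 12-run, 11-factor design; the design size is an assumption for illustration, as the abstract does not state how many metabolites were varied.

```python
import numpy as np

# Textbook generator row for the 12-run Plackett-Burman design
# (+1 = high concentration level, -1 = low concentration level).
GENERATOR_12 = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

def plackett_burman_12():
    """12-run, 11-factor Plackett-Burman design: 11 cyclic shifts of
    the generator row, plus one row with every factor at its low level."""
    rows = [np.roll(GENERATOR_12, k) for k in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.vstack(rows)

design = plackett_burman_12()
# Orthogonality check: factor columns are mutually orthogonal, so each
# concentration level is paired equally often with high and low levels
# of every other factor, as described in the abstract.
assert np.all(design.T @ design == 12 * np.eye(11))
```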
    More importantly, the analysis established a general methodology for estimating the quantification variability of media-specific metabolite concentrations. Subsequent application of NMR analysis to time-course data from cell cultivation revealed correlated deviations from calculated trends. Similar deviations were observed for multiple (chemically) unrelated metabolites, amounting to approximately 1%-10% of each metabolite's concentration. The nature of these deviations suggested the cause to be inaccuracies in internal standard addition or quantification, skewing all quantified metabolite concentrations within a sample by the same relative amount. Error magnitude was estimated by calculating the median relative deviation from a smoothing fit for all compounds at a given timepoint. A metabolite time-course simulation was developed to determine the frequency and magnitude of such deviations arising from typical measurement error (without added bias from incorrect internal standard addition). Multiple smoothing functions were tested on simulated time-courses, and cubic spline regression was found to minimize the median relative deviation from measurement noise to approximately 2.5%. Based on these results, an iterative smoothing correction method was implemented to identify and correct median deviations greater than 2.5%, with both simulation and correction code released as the "metcourse" package for the R programming language.

    Finally, a t-test validation method was developed to assess the impact of measurement and model error on MFA, with a Chinese hamster ovary (CHO) cell model chosen as a case study. The standard MFA formulation was recast as a generalized least squares (GLS) problem, with calculated fluxes subject to a t-significance test. NMR data were collected from a CHO cell bioreactor run, and a second data set was simulated directly from the model and perturbed by observed measurement error. The frequency of rejected fluxes in the simulated data (which is free of model error) was attributed to measurement uncertainty alone; fluxes calculated from the observed data that were rejected as non-significant, but not rejected in the simulated data, were attributed to a lack of model fit, i.e., model error. Applying this method to the observed data revealed a considerable level of error that was not identified by traditional χ² validation. Further simulation was carried out to assess the impact of measurement error and model structure, both of which were found to have a dramatic impact on statistical significance and calculation error, an impact that has yet to be addressed in the context of MFA.
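    A minimal sketch of the GLS-with-t-test idea described above is given below, assuming an overdetermined balance S @ v ≈ r between a stoichiometric matrix S, fluxes v, and measured rates r with known covariance Sigma; these inputs are placeholders, not the CHO model from the thesis, and the degrees-of-freedom choice (measurements minus fluxes) is a standard convention rather than the thesis's exact formulation.

```python
import numpy as np
from scipy import stats

def gls_mfa(S, r, Sigma, alpha=0.05):
    """Estimate fluxes v from measured rates r (S @ v ~ r) by
    generalized least squares, then t-test each flux against zero."""
    W = np.linalg.inv(Sigma)                # precision of the measured rates
    cov_v = np.linalg.inv(S.T @ W @ S)      # covariance of the GLS estimate
    v_hat = cov_v @ S.T @ W @ r             # GLS flux estimate
    se = np.sqrt(np.diag(cov_v))            # standard error per flux
    dof = S.shape[0] - S.shape[1]           # measurements minus fluxes
    t = v_hat / se
    p = 2 * stats.t.sf(np.abs(t), dof)      # two-sided p-values
    return v_hat, se, t, p, p < alpha       # last entry: significant fluxes
```

    Running the same test on rates simulated from the model and perturbed only by measurement noise gives the baseline rejection frequency; additional rejections on observed data are then the signature of model error, as the abstract describes.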

    NASA Tech Briefs, January 1989

    Topics include: Electronic Components and Circuits, Electronic Systems, Physical Sciences, Materials, Computer Programs, Mechanics, Machinery, Fabrication Technology, Mathematics and Information Sciences, and Life Sciences.

    Proceedings of the 7th International Conference on Functional-Structural Plant Models, Saariselkä, Finland, 9–14 June 2013


    Laboratory Directed Research and Development Annual Report - Fiscal Year 2000


    Shape constrained splines as transparent black-box models for bioprocess modeling

    Empirical model identification for biological systems is a challenging task due to the combined effects of complex interactions, nonlinear effects, and lack of specific measurements. In this context, several researchers have provided tools for experimental design, model structure selection, and optimal parameter estimation, often packaged together in iterative model identification schemes. Still, one often has to rely on a limited number of candidate rate laws such as Contois, Haldane, Monod, Moser, and Tessier. In this work, we propose to use shape-constrained spline functions as a way to reduce the number of candidate rate laws to be considered in a model identification study, while retaining or even expanding the explanatory power in comparison to conventional sets of candidate rate laws. The shape-constrained rate laws exhibit the flexibility of typical black-box models, while offering a transparent interpretation akin to conventionally applied rate laws such as Monod and Haldane. In addition, the shape-constrained spline models lead to limited extrapolation errors despite the large number of parameters.
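    As a loose illustration of the idea (not the authors' implementation), a Monod-like, monotonically increasing rate law can be written as a B-spline whose coefficients are constrained to be non-negative and non-decreasing; non-decreasing coefficients are a sufficient condition for a non-decreasing spline, so the fit reduces to a non-negative least squares problem over coefficient increments. The function name, knot placement, and the synthetic data below are assumptions for the sketch.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import nnls

def fit_monotone_spline(x, y, n_interior=6, k=3):
    """Least-squares cubic B-spline fit under a non-negative,
    non-decreasing shape constraint (Monod-like saturation shape).
    Coefficients are parameterized as cumulative sums of non-negative
    increments, turning the shape constraint into an NNLS problem."""
    xmin, xmax = x.min(), x.max()
    t = np.r_[[xmin] * (k + 1),
              np.linspace(xmin, xmax, n_interior + 2)[1:-1],
              [xmax] * (k + 1)]                   # clamped knot vector
    n_basis = len(t) - k - 1
    B = np.column_stack([BSpline(t, np.eye(n_basis)[j], k)(x)
                         for j in range(n_basis)])
    L = np.tril(np.ones((n_basis, n_basis)))      # coeffs = cumsum(increments)
    theta, _ = nnls(B @ L, y)                     # increments >= 0
    return BSpline(t, L @ theta, k)               # callable fitted rate law

# Hypothetical usage on noisy Monod-shaped data:
rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0, 10, 80))               # substrate concentration
mu = 0.5 * s / (1.2 + s) + rng.normal(0, 0.02, s.size)
rate = fit_monotone_spline(s, mu)
```

    Other qualitative shapes, e.g. the increase-then-decrease behavior of Haldane-type inhibition kinetics, would require a different inequality pattern on the coefficients, but the same linear-constraint machinery applies.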