
    On the relationship between sloppiness and identifiability

    25 pages, 11 figures, 2 tables.

    Dynamic models of biochemical networks are often formulated as sets of non-linear ordinary differential equations, whose states are the concentrations or abundances of the network components. They typically have a large number of kinetic parameters, which must be determined by calibrating the model with experimental data. In recent years it has been suggested that dynamic systems biology models are universally sloppy, meaning that the values of some parameters can be perturbed by several orders of magnitude without causing significant changes in the model output. This observation has prompted calls for focusing on model predictions rather than on parameters. In this work we examine the concept of sloppiness, investigating its links with the long-established notions of structural and practical identifiability. By analysing a set of case studies we show that sloppiness is not equivalent to lack of identifiability, and that sloppy models can be identifiable. Thus, using sloppiness to draw conclusions about the possibility of estimating parameter values can be misleading. Instead, structural and practical identifiability analyses are better tools for assessing the confidence in parameter estimates. Furthermore, we show that, when designing new experiments to decrease parametric uncertainty, designs that optimize practical identifiability criteria are more informative than those that minimize sloppiness.

    This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 686282 (“CANPATHPRO”) and from the Spanish government (MINECO) and the European Regional Development Fund (ERDF) through the projects “SYNBIOFACTORY” (grant number DPI2014-55276-C5-2-R) and “IMPROWINE” (grant number AGL2015-67504-C3-2-R).
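
    The notion of sloppiness described above has a simple numerical signature. As a minimal sketch (illustrative Python, not code from the paper; the toy two-exponential model and all parameter values are assumptions), the eigenvalues of the Fisher Information Matrix of a sloppy model span several orders of magnitude while remaining strictly positive, which is exactly the situation in which a model is sloppy yet still locally identifiable:

        import numpy as np

        def model(theta, t):
            # Toy observable: sum of two exponential decays (hypothetical model).
            a1, k1, a2, k2 = theta
            return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

        def sensitivity_matrix(theta, t, h=1e-6):
            # Finite-difference sensitivities dy/dlog(theta_i) (relative steps).
            base = model(theta, t)
            S = np.empty((t.size, theta.size))
            for i in range(theta.size):
                pert = theta.copy()
                pert[i] *= 1.0 + h
                S[:, i] = (model(pert, t) - base) / h
            return S

        theta = np.array([1.0, 1.0, 0.5, 3.0])   # hypothetical "true" parameters
        t = np.linspace(0.0, 5.0, 50)
        S = sensitivity_matrix(theta, t)
        fim = S.T @ S                            # FIM under unit-variance Gaussian noise
        eig = np.sort(np.linalg.eigvalsh(fim))[::-1]
        print("FIM eigenvalues:", eig)
        print("eigenvalue spread (decades):", np.log10(eig[0] / eig[-1]))

    A wide eigenvalue spread flags sloppiness, but as long as the smallest eigenvalue is bounded away from zero the parameters remain locally identifiable at that experimental design, which is the distinction the paper draws.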

    Novel techniques for kinetic model identification and improvement

    Physics-based kinetic models are regarded as key tools for supporting the design and control of chemical processes and for understanding which degrees of freedom ultimately determine the observed behaviour of chemical systems. These models are formulated as sets of differential and algebraic equations in which many state variables and parameters may be involved. Nonetheless, translating the available experimental evidence into an appropriate set of model equations is a time- and resource-intensive task that relies heavily on the presence of experienced scientists. Automated reactor platforms are increasingly being applied in research laboratories to generate large amounts of kinetic data with minimal human intervention. In most cases, however, these platforms do not implement software for the online identification of physics-based kinetic models. While automated reactor technologies have significantly improved the efficiency of data collection, the analysis of the data for modelling purposes still represents a tedious process that is mainly carried out a posteriori by the scientist. This project focuses on systematically solving some relevant problems in kinetic modelling studies that would normally require the intervention of experienced modellers. Specifically, the following challenges are considered: i) the selection of a robust model parametrisation to reduce the chance of numerical failures in the course of the model identification process; ii) the experimental design and parameter estimation problems under structural model uncertainty; iii) the improvement of approximate models in light of the available experimental evidence. The work presented in this Thesis paves the way towards fully automated kinetic modelling platforms through the development of intelligent algorithms for experimental design and model building under system uncertainty. The project aims at the definition of comprehensive and systematic modelling frameworks that make the modelling activity more efficient and less sensitive to human error and bias.
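
    Point (i) above, robust parametrisation, has a classic instance in chemical kinetics: rewriting the Arrhenius law around a reference temperature to break the near-perfect correlation between the pre-exponential factor and the activation energy. The sketch below is a standard textbook device, not code from the thesis, and all names and values are illustrative:

        import numpy as np

        R = 8.314  # gas constant, J/(mol K)

        def k_arrhenius(A, E, T):
            # Naive parametrisation: estimates of A and E are strongly correlated.
            return A * np.exp(-E / (R * T))

        def k_reparam(k_ref, E, T, T_ref=350.0):
            # Reference-temperature form: k_ref = k(T_ref) enters directly,
            # which conditions the estimation problem much better numerically.
            return k_ref * np.exp(-(E / R) * (1.0 / T - 1.0 / T_ref))

        # The two forms are algebraically equivalent when k_ref = k(T_ref):
        A, E = 1e10, 8e4                      # hypothetical kinetic constants
        T = np.linspace(300.0, 400.0, 5)
        k_ref = k_arrhenius(A, E, 350.0)
        assert np.allclose(k_arrhenius(A, E, T), k_reparam(k_ref, E, T))

    Because k_ref is of moderate magnitude and only weakly correlated with E, optimizers are far less likely to fail numerically than when searching over A directly, which can span tens of orders of magnitude.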

    Spam elimination and bias correction: ensuring label quality in crowdsourced tasks

    Crowdsourcing is proposed as a powerful mechanism for accomplishing large-scale tasks via anonymous workers online. It has been demonstrated as an effective and important approach for collecting labeled data in application domains which require human intelligence, such as image labeling, video annotation, and natural language processing. Despite its promise, one big challenge remains in crowdsourcing systems: the difficulty of controlling the quality of crowds. Workers usually have diverse education levels, personal preferences, and motivations, leading to unknown work performance while completing a crowdsourced task. Some are reliable, and some might provide noisy feedback. It is therefore natural to apply worker filtering, which recognizes and removes noisy workers, to crowdsourcing applications in order to obtain high-quality labels. This dissertation discusses this area of research and proposes efficient probabilistic worker-filtering models to distinguish various types of poor-quality workers.

    Most existing work on worker filtering either concentrates only on binary labeling tasks, or fails to separate low-quality workers whose label errors can be corrected from other spam workers (whose label errors cannot be corrected). We therefore first propose a Spam Removing and De-biasing Framework (SRDF) to handle worker filtering in labeling tasks with numerical label scales. The framework detects spam workers and biased workers separately. Biased workers are defined as those who tend to provide higher (or lower) labels than the truth, and whose errors can be corrected. To tackle the biasing problem, an iterative bias detection approach is introduced to recognize biased workers. The spam filtering algorithm eliminates three types of spam workers: random spammers, who provide random labels; uniform spammers, who give the same label for most items; and sloppy workers, who offer low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregation algorithms, which infer truths from the labels obtained from crowds, leads to high-quality consensus results.

    Random spammers and uniform spammers share a common characteristic: they provide useless feedback without making an effort at the labeling task, so it is not necessary to distinguish them from each other. In addition, within SRDF the removal of sloppy workers strongly affects the detection of biased workers. To address these problems, this dissertation presents a different worker classification in which biased workers are treated as a subcategory of sloppy workers. An ITerative Self Correcting - Truth Discovery (ITSC-TD) framework is then proposed, which reliably recognizes biased workers in ordinal labeling tasks based on a probabilistic bias detection model. ITSC-TD estimates true labels through an optimization-based truth discovery method that minimizes overall label error by assigning different weights to workers.

    Typical tasks posted on popular crowdsourcing platforms, such as MTurk, are simple: low in complexity, independent, and quick to complete. Complex tasks, in contrast, often require crowd workers to possess specialized skills in the task domain, so they are more prone to poor-quality feedback than simple tasks. We therefore propose a multiple-views approach for obtaining high-quality consensus labels in complex labeling tasks. Each view is defined as a labeling critique or rubric that guides workers to become aware of the desirable work characteristics or goals, and combining the view labels yields the overall estimated label for each item. The approach is developed under the hypothesis that a worker's performance may differ from one view to another, so different weights are assigned to different views for each worker. The ITSC-TD framework is integrated into the multiple-views model to achieve high-quality estimated truths for each view.

    Next, we propose a Semi-supervised Worker Filtering (SWF) model to eliminate spam workers who assign random labels to each item. SWF conducts worker filtering with a limited set of gold truths available a priori: each worker is associated with a spammer score, estimated via the developed semi-supervised model, and low-quality workers are efficiently detected by comparing this score with a predefined threshold. The efficiency of all the developed frameworks and models is demonstrated on simulated and real-world data sets. Compared with state-of-the-art methodologies in the crowdsourcing domain, such as the expectation-maximization-based aggregation algorithm, GLAD, and the optimization-based truth discovery approach, up to 28.0% improvement is obtained in the accuracy of true label estimation.
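
    The weighted, optimization-based truth discovery step that ITSC-TD builds on can be sketched in a few lines. The following is a generic iteration of this family of methods on assumed synthetic data, not the dissertation's implementation: truths are re-estimated as weight-averaged labels, and workers are reweighted by their share of the total error so that spammers are driven towards zero weight:

        import numpy as np

        def truth_discovery(labels, n_iter=20, eps=1e-9):
            # labels: (n_workers, n_items) matrix of numeric labels.
            n_workers, _ = labels.shape
            weights = np.ones(n_workers) / n_workers
            for _ in range(n_iter):
                truths = weights @ labels / weights.sum()      # weighted consensus
                errors = ((labels - truths) ** 2).sum(axis=1)  # per-worker loss
                # Down-weight workers with a large share of the total error.
                weights = np.maximum(-np.log(errors / errors.sum() + eps), eps)
            return truths, weights

        rng = np.random.default_rng(0)
        truth = rng.uniform(1, 5, size=30)                   # hypothetical item truths
        good = truth + rng.normal(0, 0.2, size=(8, 30))      # reliable workers
        spam = rng.uniform(1, 5, size=(2, 30))               # random spammers
        truths, weights = truth_discovery(np.vstack([good, spam]))
        print("worker weights:", np.round(weights, 2))       # spammers score lowest

    In the full framework described above, this weighting loop would be combined with the bias detection step, so that a biased worker's correctable errors are shifted towards the truth rather than simply discarded.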

    Review: to be or not to be an identifiable model. Is this a relevant question in animal science modelling?

    What is a good (useful) mathematical model in animal science? For models constructed for prediction purposes, the question of model adequacy (usefulness) has traditionally been tackled by statistical analysis applied to observed experimental data relative to model-predicted variables. However, little attention has been paid to analytic tools that exploit the mathematical properties of the model equations. For example, in the context of model calibration, before attempting a numerical estimation of the model parameters, we might want to know whether we have any chance of success in estimating a unique best value of the model parameters from the available measurements. This question of uniqueness is referred to as structural identifiability: a mathematical property defined on the sole basis of the model structure within a hypothetical ideal experiment determined by a setting of model inputs (stimuli) and observable variables (measurements). Structural identifiability analysis applied to dynamic models described by ordinary differential equations (ODEs) is common practice in control engineering and system identification. This analysis demands mathematical technicalities that are beyond the typical academic background in animal science, which might explain the lack of pervasiveness of identifiability analysis in animal science modelling. To fill this gap, in this paper we address the analysis of structural identifiability from a practitioner's perspective by capitalizing on dedicated software tools. Our objectives are (i) to provide a comprehensive explanation of the notion of structural identifiability for the animal science modelling community, (ii) to assess the relevance of identifiability analysis in animal science modelling and (iii) to motivate the community to use identifiability analysis in modelling practice (when the identifiability question is relevant). We focus our study on ODE models. Using illustrative examples, including published mathematical models describing lactation in cattle, we show how structural identifiability analysis can contribute to advancing mathematical modelling in animal science towards the production of useful models and highly informative experiments. Rather than attempting to impose a systematic identifiability analysis on the modelling community during model development, we wish to open a window onto a powerful tool for model construction and experiment design.
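
    To make the structural identifiability question concrete, here is a minimal sketch using the Taylor-series approach on a toy one-compartment model (the model, symbols, and example are illustrative assumptions, not the lactation models or the dedicated software tools discussed in the paper):

        import sympy as sp

        k1, k2, V, x0 = sp.symbols('k1 k2 V x0', positive=True)
        K1, K2, W = sp.symbols('K1 K2 W', positive=True)  # candidate alternative values

        # Toy model: x' = -(k1 + k2) * x, output y = x / V, known dose x(0) = x0.
        # Taylor coefficients of y at t = 0: y^(n)(0) = (-(k1 + k2))**n * x0 / V.
        coeffs = [(-(k1 + k2))**n * x0 / V for n in range(3)]
        alt = [c.subs({k1: K1, k2: K2, V: W}) for c in coeffs]

        # Ask whether a different parameter set can reproduce the same output.
        sol = sp.solve([sp.Eq(a, b) for a, b in zip(coeffs, alt)],
                       [K1, K2, W], dict=True)
        print(sol)

    The solution pins down W = V and K1 + K2 = k1 + k2 but leaves one degree of freedom: only the sum of the rate constants is structurally identifiable from this single-output experiment. This is precisely the kind of symmetry that the dedicated tools mentioned in the paper detect automatically for much larger models.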

    Identifiability of large nonlinear biochemical networks

    Dynamic models formulated as a set of ordinary differential equations provide a detailed description of the time-evolution of a system. Such models of (bio)chemical reaction networks have contributed to important advances in biotechnology and biomedical applications, and their impact is foreseen to increase in the near future. Hence, the task of dynamic model building has attracted much attention from scientists working at the intersection of biochemistry, systems theory, mathematics, and computer science, among other disciplines, an area sometimes called systems biology. Before a model can be effectively used, the values of its unknown parameters have to be estimated from experimental data. A necessary condition for parameter estimation is identifiability, the property that, for a certain output, there exists a unique (or finite) set of parameter values that produces it. Identifiability can be analysed from two complementary points of view: structural (which searches for symmetries in the model equations that may prevent parameters from being uniquely determined) or practical (which focuses on the limitations introduced by the quantity and quality of the data available for parameter estimation). Both types of analyses are often difficult for nonlinear models, and their complexity increases rapidly with the problem size. Hence, assessing the identifiability of realistic dynamic models of biochemical networks remains a challenging task. Despite the fact that many methods have been developed for this purpose, it is still an open problem and an active area of research. Here we review the theory and tools available for the study of identifiability, and discuss some closely related concepts such as sensitivity to parameter perturbations, observability, distinguishability, and optimal experimental design, among others.

    This work was funded by the Galician government (Xunta de Galiza) through the I2C postdoctoral program (fellowship ED481B2014/133-0), and by the Spanish Ministry of Economy and Competitiveness (grant DPI2013-47100-C2-2-P).
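
    Practical identifiability, unlike the structural kind, depends on the data, and a common way to probe it is the profile likelihood: fix one parameter on a grid, re-optimize the rest, and inspect the resulting profile. The sketch below uses an assumed one-exponential model with synthetic data, purely for illustration:

        import numpy as np
        from scipy.optimize import minimize

        def model(theta, t):
            a, k = theta
            return a * np.exp(-k * t)

        rng = np.random.default_rng(1)
        t = np.linspace(0.0, 2.0, 15)
        data = model([2.0, 1.5], t) + rng.normal(0, 0.1, t.size)  # synthetic data

        def sse(theta):
            # Sum-of-squares misfit between model output and data.
            return np.sum((model(theta, t) - data) ** 2)

        def profile_k(k_grid):
            # Re-fit the remaining parameter 'a' for each fixed value of k.
            return np.array([minimize(lambda a: sse([a[0], k]), x0=[1.0]).fun
                             for k in k_grid])

        k_grid = np.linspace(0.5, 3.0, 21)
        print("profile SSE along k:", np.round(profile_k(k_grid), 3))

    A profile with a sharp, well-defined minimum indicates that k is practically identifiable at this noise level and sampling design; a flat profile would signal that more or better data, i.e. optimal experimental design, is needed.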