
    Model selection and adaptation for biochemical pathways

    In bioinformatics, biochemical signal pathways can be modeled by many differential equations. How to fit the large number of parameters in these equations to the available data is still an open problem. An approach that systematically obtains the most appropriate model and learns its parameters is therefore of great interest. One of the most commonly used approaches to model selection is to choose the least complex model which “fits the needs”. For noisy measurements, choosing the model with the smallest mean squared error on the observed data yields a model that fits the data too closely: it overfits. Such a model will perform well on the training data, but worse on unseen data. This paper proposes as the model selection criterion the least complex description of the observed data by the model, the minimum description length. The performance of the approach is evaluated on the small but important example of inflammation modeling. Keywords: biochemical pathways, differential equations, septic shock, parameter estimation, overfitting, minimum description length
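
The contrast between choosing by smallest mean squared error and choosing by description length can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it uses polynomial regression instead of a pathway model, synthetic data, and a BIC-style two-part approximation to the description length. The function names `fit_rss` and `description_length` are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

def description_length(x, y, degree):
    """Approximate two-part code length in nats, up to an additive constant.

    Assumption: Gaussian residuals and a BIC-style parameter cost.
    """
    n = len(x)
    k = degree + 1                                            # fitted coefficients
    data_cost = 0.5 * n * np.log(fit_rss(x, y, degree) / n)   # residuals given the model
    model_cost = 0.5 * k * np.log(n)                          # the parameters themselves
    return data_cost + model_cost

# Hypothetical data: a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

degrees = list(range(9))
rss = [fit_rss(x, y, d) for d in degrees]
lengths = [description_length(x, y, d) for d in degrees]

mse_choice = degrees[int(np.argmin(rss))]      # smallest training error
mdl_choice = degrees[int(np.argmin(lengths))]  # shortest description
print("smallest-MSE degree:", mse_choice)
print("MDL degree:", mdl_choice)
```

Because the polynomial models are nested, the training error cannot increase with the degree, so the smallest-MSE rule drifts toward the most complex candidate. The parameter cost in the description length is what counteracts that drift.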

    Parameter inference and model selection for differential equation models

    Includes bibliographical references. 2015 Summer. Firstly, we consider the problem of estimating parameters of stochastic differential equations with discrete-time observations that are either completely or partially observed. The transition density between two observations is generally unknown. We propose an importance sampling approach with an auxiliary parameter when the transition density is unknown. We embed the auxiliary importance sampler in a penalized maximum likelihood framework which produces more accurate and computationally efficient parameter estimates. Simulation studies in three different models illustrate promising improvements of the new penalized simulated maximum likelihood method. The new procedure is designed for the challenging case when some state variables are unobserved and moreover, observed states are sparse over time, which commonly arises in ecological studies. We apply this new approach to two epidemics of chronic wasting disease in mule deer. Next, we consider the problem of selecting deterministic or stochastic models for a biological, ecological, or environmental dynamical process. In most cases, one prefers either deterministic or stochastic models as candidate models based on experience or subjective judgment. Due to the complex or intractable likelihood in most dynamical models, likelihood-based approaches for model selection are not suitable. We use approximate Bayesian computation for parameter estimation and model selection to gain further understanding of the dynamics of two epidemics of chronic wasting disease in mule deer. The main novel contribution of this work is that under a hierarchical model framework we compare three types of dynamical models: ordinary differential equation, continuous time Markov chain, and stochastic differential equation models. To our knowledge model selection between these types of models has not appeared previously. 
The practice of incorporating dynamical models into data models is becoming more common, so the proposed approach may be useful in a variety of applications. Lastly, we consider estimation of parameters in nonlinear ordinary differential equation models with measurement error where closed-form solutions are not available. We propose a new numerical algorithm, the data driven adaptive mesh method, which is a combination of the Euler and 4th order Runge-Kutta methods with different step sizes based on the observation time points. Our results show that the new algorithm improves on the most widely used numerical algorithm, the 4th order Runge-Kutta method, in both parameter estimation accuracy and computational cost. Moreover, the generalized profiling procedure proposed by Ramsay et al. (2007) performs poorly on data that are sparse in time compared with the new approach. We illustrate our approach with both simulation studies and ecological data on intestinal microbiota.
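
The two building blocks named above, the Euler and 4th order Runge-Kutta steps, can be sketched as follows. This is not the data driven adaptive mesh method itself, whose step-size rule is not given in the abstract. The logistic test equation, its parameter values, and all function names are assumptions chosen for illustration.

```python
import math

def euler_step(f, t, y, h):
    """One explicit Euler step (first-order accurate)."""
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    """One classical 4th order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, y0, t0, t1, n_steps):
    """Advance y from t0 to t1 with a fixed step size."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

# Hypothetical test problem: logistic growth with rate R and capacity K.
R, K, Y0 = 1.0, 10.0, 1.0

def logistic(t, y):
    return R * y * (1 - y / K)

def logistic_exact(t):
    return K / (1 + (K / Y0 - 1) * math.exp(-R * t))

exact = logistic_exact(2.0)
err_euler = abs(integrate(euler_step, logistic, Y0, 0.0, 2.0, 10) - exact)
err_rk4 = abs(integrate(rk4_step, logistic, Y0, 0.0, 2.0, 10) - exact)
print("Euler error:", err_euler)
print("RK4 error:  ", err_rk4)
```

With the same number of steps, the Runge-Kutta step costs four evaluations of the right-hand side against one for Euler. Mixing the two is therefore a trade between accuracy and cost.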

    Estimation and model selection for dynamic biomedical images

    Compartment models are a frequently used tool for imaging data gained with medical and biological imaging techniques. The solutions of the differential equations derived from a compartment model provide nonlinear parametric functions, based on which the behavior of a concentration of interest over time can be described. Often, the number of compartments in a compartment model is unknown. Because the model complexity itself, that is, the number of compartments, is important information, it is desirable to estimate it from the observed data. Additionally, the unknown parameters have to be estimated. Therefore, methods dealing with both the parameter estimation and model selection in compartment models are needed. The methods proposed in this thesis are motivated by two applications from the field of medical and biological imaging. In the first application, the quantitative analysis of fluorescence recovery after photobleaching (FRAP) data, compartment models are used in order to gain insight into the binding behavior of molecules in living cells. As a first approach, we developed a Bayesian nonlinear mixed-effects model for the analysis of a series of FRAP images. Mixed-effect priors are defined on the parameters of the nonlinear model, which is a novel approach. With the proposed model, we get parameter estimates and additionally gain information about the variability between nuclei, which has not been studied so far. The proposed method was evaluated on half-nucleus FRAP data, also in comparison with different kinds of fixed-effects models. As a second approach, a pixelwise analysis of FRAP data is proposed, where information from the neighboring pixels is incorporated into the nonlinear model for each pixel. This is innovative as the existing models are suitable for the analysis of FRAP data for some regions of interest only. 
For the second application, the quantitative analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) of the breast, we use a compartment model which describes the exchange of blood between different, well-mixed compartments. In the analysis of such data, the number of compartments allows conclusions about the heterogeneity of cancerous tissue. Therefore, an estimation and model selection approach based on boosting, with which the number of compartments and the unknown parameters can be estimated at the voxel level, is proposed. In contrast to boosting for additive regression, where smoothing approaches are used, boosting in nonlinear parametric regression as described in this thesis is a novel approach. In an extension of this approach, the spatial structure of an image is taken into account by penalizing the differences in the parameter estimates of neighboring voxels. The method was evaluated in simulation studies, as well as in an application to data from a breast cancer study. The majority of the program code used in the three approaches was newly developed in the programming languages R and C. Based on that code, two R packages were built.

    Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model

    Modeling viral dynamics in HIV/AIDS studies has resulted in a deep understanding of pathogenesis of HIV infection from which novel antiviral treatment guidance and strategies have been derived. Viral dynamics models based on nonlinear differential equations have been proposed and well developed over the past few decades. However, it is quite challenging to use experimental or clinical data to estimate the unknown parameters (both constant and time-varying parameters) in complex nonlinear differential equation models. Therefore, investigators usually fix some parameter values, from the literature or by experience, to obtain only parameter estimates of interest from clinical or experimental data. However, when such prior information is not available, it is desirable to determine all the parameter estimates from data. In this paper we intend to combine the newly developed approaches, a multi-stage smoothing-based (MSSB) method and the spline-enhanced nonlinear least squares (SNLS) approach, to estimate all HIV viral dynamic parameters in a nonlinear differential equation model. In particular, to the best of our knowledge, this is the first attempt to propose a comparatively thorough procedure, accounting for both efficiency and accuracy, to rigorously estimate all key kinetic parameters in a nonlinear differential equation model of HIV dynamics from clinical data. These parameters include the proliferation rate and death rate of uninfected HIV-targeted cells, the average number of virions produced by an infected cell, and the infection rate which is related to the antiviral treatment effect and is time-varying. To validate the estimation methods, we verified the identifiability of the HIV viral dynamic model and performed simulation studies. Comment: Published at http://dx.doi.org/10.1214/09-AOAS290 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Stochastic Models Involving Second Order Lévy Motions

    This thesis is based on five papers (A-E) treating estimation methods for unbounded densities, random fields generated by Lévy processes, behavior of Lévy processes at level crossings, and Markov random field mixtures of multivariate Gaussian fields. In Paper A we propose an estimator of the location parameter for a density that is unbounded at the mode. The estimator maximizes a modified likelihood in which the singular term in the full likelihood is left out whenever the parameter value approaches a neighborhood of the singularity location. The consistency and super-efficiency of this maximum leave-one-out likelihood estimator are shown through a direct argument. In Paper B we prove that the generalized Laplace distribution and the normal inverse Gaussian distribution are the only subclasses of the generalized hyperbolic distribution that are closed under convolution. In Paper C we propose non-Gaussian Matérn random field models, generated through stochastic partial differential equations, with the class of generalized hyperbolic processes as noise forcings. A maximum likelihood estimation technique based on the Monte Carlo Expectation Maximization algorithm is presented, and it is shown how to perform predictions at unobserved locations. In Paper D a novel class of models is introduced, denoted latent Gaussian random field mixture models, which combines the Markov random field mixture model with latent Gaussian random field models. The latent model, which is observed under measurement noise, is defined as a mixture of several, possibly multivariate, Gaussian random fields. Selection of which of the fields is observed at each location is modeled using a discrete Markov random field. Efficient estimation methods for the parameters of the models are developed using a stochastic gradient algorithm. Paper E studies the behaviour of level crossings of non-Gaussian time series through a Slepian model. 
The approach develops a Slepian model for the underlying random noise that drives the process crossing the level. It is demonstrated how a moving average time series driven by Laplace noise can be analyzed through the Slepian noise approach. Methods for sampling the biased sampling distribution of the noise are based on a Gibbs sampler.

    Data-driven modelling of biological multi-scale processes

    Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models which capture the relevant properties on all these scales. In this manuscript we review mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based upon this overview of state-of-the-art modelling approaches, we formulate key challenges in mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, novel approaches and recent trends are discussed, including computation time reduction using reduced order and surrogate models, which contribute to the solution of inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods. Comment: This manuscript will appear in the Journal of Coupled Systems and Multiscale Dynamics (American Scientific Publishers)

    Network estimation in State Space Model with L1-regularization constraint

    Biological networks have arisen as an attractive paradigm of genomic science ever since the introduction of large scale genomic technologies which carried the promise of elucidating the relationship in functional genomics. Microarray technologies coupled with appropriate mathematical or statistical models have made it possible to identify dynamic regulatory networks or to measure the time course of the expression level of many genes simultaneously. However, one of the limitations lies in the high-dimensional nature of such data, coupled with the fact that these gene expression data are known to include some hidden process. In that regard, we are concerned with deriving a method for inferring a sparse dynamic network in a high-dimensional data setting. We assume that the observations are noisy measurements of gene expression in the form of mRNAs, whose dynamics can be described by some unknown or hidden process. We build an input-dependent linear state space model from these hidden states and demonstrate how an incorporated $L_1$ regularization constraint in an Expectation-Maximization (EM) algorithm can be used to reverse engineer transcriptional networks from gene expression profiling data. This corresponds to estimating the model interaction parameters. The proposed method is illustrated on time-course microarray data obtained from a well-established T-cell data set. At the optimum tuning parameters we found genes TRAF5, JUND, CDK4, CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed connections. We recommend these genes as objects for further investigation. Caspase 4 is also found to activate the expression of JunD which in turn represses the cell cycle regulator CDC2. Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
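
How an $L_1$ penalty produces sparse interaction estimates can be sketched on a plain linear regression. The example below is not the EM procedure for the state space model described above. It swaps in proximal gradient descent (ISTA) on a lasso objective, with synthetic data and hypothetical names (`soft_threshold`, `lasso_ista`), purely to show the thresholding mechanism.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink towards zero and clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)             # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Hypothetical data: 20 candidate regulators, only 3 with a real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = X @ w_true + rng.normal(scale=0.1, size=100)

w_hat = lasso_ista(X, y, lam=5.0)
print("indices of nonzero coefficients:", np.flatnonzero(w_hat))
```

The tuning parameter `lam` plays the role of the regularization strength. Larger values set more coefficients exactly to zero, which is what makes the inferred network sparse.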

    Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

    Approximate Bayesian computation methods can be used to evaluate posterior distributions without having to calculate likelihoods. In this paper we discuss and apply an approximate Bayesian computation (ABC) method based on sequential Monte Carlo (SMC) to estimate parameters of dynamical models. We show that ABC SMC gives information about the inferability of parameters and model sensitivity to changes in parameters, and tends to perform better than other ABC approaches. The algorithm is applied to several well known biological systems, for which parameters and their credible intervals are inferred. Moreover, we develop ABC SMC as a tool for model selection; given a range of different mathematical descriptions, ABC SMC is able to choose the best model using the standard Bayesian model selection apparatus. Comment: 26 pages, 9 figures
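
The likelihood-free idea can be sketched with the simplest ABC variant, plain rejection sampling. This is not the ABC SMC algorithm of the paper: there is no sequence of tolerances and no particle weighting. The exponential decay model, the uniform prior, the tolerance, and all names are assumptions made for illustration.

```python
import numpy as np

def simulate(rate, times, rng, noise_sd=0.05):
    """Hypothetical dynamical model: exponential decay observed with Gaussian noise."""
    return np.exp(-rate * times) + rng.normal(scale=noise_sd, size=times.size)

def abc_rejection(observed, times, n_draws, epsilon, rng):
    """Keep prior draws whose simulated data lie within epsilon of the observations."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 5.0)                 # draw from the (assumed) prior
        simulated = simulate(rate, times, rng)       # forward-simulate, no likelihood
        distance = np.sqrt(np.mean((simulated - observed) ** 2))
        if distance < epsilon:
            accepted.append(rate)
    return np.array(accepted)

rng = np.random.default_rng(1)
times = np.linspace(0.0, 3.0, 15)
observed = simulate(1.5, times, rng)                 # synthetic data, true rate 1.5

posterior = abc_rejection(observed, times, n_draws=5000, epsilon=0.1, rng=rng)
print("accepted draws:", posterior.size)
print("posterior mean:", posterior.mean())
print("95% credible interval:", np.percentile(posterior, [2.5, 97.5]))
```

Rejection sampling wastes most draws when the prior is much wider than the posterior. Sequential schemes such as ABC SMC address this by moving a population of particles through a sequence of decreasing tolerances.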

    Improving dynamic predictions with ensembles of observable models

    Funded for open access publication: Universidade de Vigo/CISUG. Motivation: Dynamic mechanistic modelling in systems biology has been hampered by the complexity and variability associated with the underlying interactions, and by uncertain and sparse experimental measurements. Ensemble modelling, a concept initially developed in statistical mechanics, has been introduced in biological applications with the aim of mitigating those issues. Ensemble modelling uses a collection of different models compatible with the observed data to describe the phenomena of interest. However, since systems biology models often suffer from lack of identifiability and observability, ensembles of models are particularly unreliable when predicting non-observable states. Results: We present a strategy to assess and improve the reliability of a class of model ensembles. In particular, we consider kinetic models described using ordinary differential equations (ODEs) with a fixed structure. Our approach builds an ensemble with a selection of the parameter vectors found when performing parameter estimation with a global optimization metaheuristic. This technique enforces diversity during the sampling of parameter space and it can quantify the uncertainty in the predictions of state trajectories. We couple this strategy with structural identifiability and observability analysis, and when these tests detect possible prediction issues we obtain model reparameterizations that surmount them. The end result is an ensemble of models with the ability to predict the internal dynamics of a biological process. We demonstrate our approach with models of glucose regulation, cell division, circadian oscillations, and the JAK-STAT signalling pathway. Availability: The code that implements the methodology and reproduces the results is available at https://doi.org/10.5281/zenodo.6782638. 
Supplementary information: Supplementary data are available at Bioinformatics online. MCIN/AEI/ 10.13039/501100011033 | Ref. PID2020-117271RBC22; MCIN/AEI/ 10.13039/501100011033 | Ref. PID2020-113992RA-I00; MCIN/AEI/ 10.13039/501100011033 | Ref. RYC-2019-027537-I; Xunta de Galicia | Ref. ED431F 2021/00