    Least-squares methods for identifying biochemical regulatory networks from noisy measurements

    <b>Background</b>: We consider the problem of identifying the dynamic interactions in biochemical networks from noisy experimental data. Approaches to this problem typically make use of an estimation algorithm such as the well-known linear Least-Squares (LS) estimation technique. We demonstrate that when time-series measurements are corrupted by white noise and/or drift noise, more accurate and reliable identification of network interactions can be achieved by employing an estimation algorithm known as Constrained Total Least Squares (CTLS). The Total Least Squares (TLS) technique is a generalised least-squares method for solving an overdetermined set of equations whose coefficients are noisy. CTLS is a natural extension of TLS to the case where the noise components of the coefficients are correlated, as is usually the case with time-series measurements of concentrations and expression profiles in gene networks. <b>Results</b>: The superior performance of the CTLS method in identifying network interactions is demonstrated on three examples: a genetic network containing four genes, a network describing p53 activity and <i>mdm2</i> messenger RNA interactions, and a recently proposed kinetic model for interleukin (IL)-6 and IL-12b messenger RNA expression as a function of ATF3 and NF-κB promoter binding. For the first example, CTLS significantly reduces the errors in the estimation of the Jacobian for the gene network. For the second, CTLS reduces the errors arising from measurements corrupted by white noise and from the effect of neglected kinetics. For the third, it allows the correct identification, from noisy data, of the negative regulation of IL-6 and IL-12b by ATF3.
<b>Conclusion</b>: The significant improvements in performance demonstrated by the CTLS method under the wide range of conditions tested here, including different levels and types of measurement noise and different numbers of data points, suggest that its application will enable more accurate and reliable identification and modelling of biochemical networks.
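The abstract does not spell out the CTLS equations, so the following is only a minimal sketch of the underlying idea in the simplest possible setting: fitting a single coefficient a in y ≈ a·x when both x and y are noisy. It contrasts ordinary LS with plain (unconstrained) TLS, using the closed-form TLS solution for one parameter. The correlated-noise structure that CTLS is designed for is not modelled, and the function names and synthetic data are hypothetical.

```python
import math
import random


def ls_slope(x, y):
    """Ordinary least-squares slope for y ~ a*x (line through the origin).

    Only y is treated as noisy; noise in x biases the estimate towards zero.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def tls_slope(x, y):
    """Total least-squares slope for y ~ a*x with equal, independent noise in x and y.

    Minimises sum((y_i - a*x_i)**2) / (1 + a**2), the sum of squared orthogonal
    distances to the line. Setting the derivative to zero gives
    Sxy*a**2 - (Syy - Sxx)*a - Sxy = 0; the minimiser is the root with the sign of Sxy.
    """
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Synthetic data in which both the observed x and y carry noise (sd 0.5).
rng = random.Random(0)
a_true = 2.0
x_true = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x_obs = [xi + rng.gauss(0.0, 0.5) for xi in x_true]
y_obs = [a_true * xi + rng.gauss(0.0, 0.5) for xi in x_true]

ls_est = ls_slope(x_obs, y_obs)    # attenuated towards 2 / 1.25 = 1.6 for large samples
tls_est = tls_slope(x_obs, y_obs)  # consistent under the equal-variance assumption
print(f"LS: {ls_est:.3f}  TLS: {tls_est:.3f}")
```

With noise in the regressor, the LS slope tends towards a·var(x)/(var(x) + σ²), about 1.6 here, whereas TLS, which assumes equal noise variances in x and y, recovers a value near 2.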

    Parameter estimation in kinetic reaction models using nonlinear observers facilitated by model extensions

    An essential part of mathematical modelling is the accurate and reliable estimation of model parameters. In biology, the required parameters are particularly difficult to measure due to either shortcomings of the measurement technology or a lack of direct measurements. In both cases, parameters must be estimated from indirect measurements, usually in the form of time-series data. Here, we present a novel approach for parameter estimation that is particularly tailored to biological models consisting of nonlinear ordinary differential equations. By assuming specific types of nonlinearities common in biology, resulting from generalised mass action, Hill kinetics and products thereof, we can take a three-step approach: (1) transform the identification into an observer problem using a suitable model extension that decouples the estimation of non-measured states from the parameters; (2) reconstruct all extended states using suitable nonlinear observers; (3) estimate the parameters using the reconstructed states. The actual estimation of the parameters is based on the intrinsic dependencies of the extended states arising from the definitions of the extended variables. An important advantage of the proposed method is that it allows one to identify suitable measurements and/or model structures for which the parameters can be estimated. Furthermore, the proposed identification approach is generally applicable to models of metabolic networks, signal transduction and gene regulation.
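As a hypothetical illustration of step (3) only, suppose steps (1) and (2) have already produced reconstructed values for a state x and for an extended state defined as z = k·x^n, a power-law term of the generalised-mass-action type. The definition itself then links the reconstructed quantities, and taking logarithms turns it into a linear regression for k and n. The function name and sample values are assumptions for illustration and do not reproduce the authors' procedure.

```python
import math


def fit_power_law(x, z):
    """Estimate k and n in z = k * x**n from paired, strictly positive samples.

    Hypothetical step-(3) illustration: x is a reconstructed state and z a
    reconstructed extended state defined as z = k * x**n. Taking logarithms
    gives log z = log k + n * log x, solved here by ordinary least squares.
    """
    u = [math.log(v) for v in x]
    w = [math.log(v) for v in z]
    m = len(u)
    u_mean = sum(u) / m
    w_mean = sum(w) / m
    suu = sum((ui - u_mean) ** 2 for ui in u)
    suw = sum((ui - u_mean) * (wi - w_mean) for ui, wi in zip(u, w))
    n = suw / suu
    k = math.exp(w_mean - n * u_mean)
    return k, n


# Illustrative values standing in for observer outputs (not real data).
x_hat = [0.5, 1.0, 1.5, 2.0, 3.0]
z_hat = [3.0 * v ** 2 for v in x_hat]
k_est, n_est = fit_power_law(x_hat, z_hat)
print(k_est, n_est)  # approximately 3.0 and 2.0
```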

    Computational models for inferring biochemical networks

    Biochemical networks are of great practical importance. Numerous bioinformatics projects have generated a vast amount of biological information, which has advanced the understanding of how biological compounds interact in cells. The construction of biochemical systems (systems of chemical reactions), which include both the topology and the kinetic constants of the chemical reactions, is NP-hard and is a well-studied systems biology problem. In this paper, we propose a hybrid architecture, which combines genetic programming and simulated annealing in order to generate and optimize both the topology (the network) and the reaction rates of a biochemical system. Simulations and analysis of an artificial model and three real models (two models and the noisy version of one of them) show promising results for the proposed method.
The Romanian National Authority for Scientific Research, CNDI–UEFISCDI, Project No. PN-II-PT-PCCA-2011-3.2-0917
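Only the rate-optimisation half of such a hybrid is sketched below, under stated assumptions: the topology is fixed to a single hypothetical first-order decay A → ∅, and simulated annealing searches for the rate constant k that minimises the squared error between A(t) = exp(-k·t) and synthetic data. The genetic-programming search over topologies, and the authors' actual move set and cooling schedule, are not shown; all names and tuning values are hypothetical.

```python
import math
import random


def sse(k, times, data):
    """Sum of squared errors between the model A(t) = exp(-k*t) and the data."""
    return sum((math.exp(-k * t) - a) ** 2 for t, a in zip(times, data))


def anneal_rate(times, data, k0=0.1, steps=3000, t0=1.0, cooling=0.998,
                step_size=0.1, seed=1):
    """Simulated annealing over a single nonnegative rate constant."""
    rng = random.Random(seed)
    k, cost = k0, sse(k0, times, data)
    best_k, best_cost = k, cost
    temp = t0
    for _ in range(steps):
        cand = abs(k + rng.gauss(0.0, step_size))  # reflect at 0 to keep k >= 0
        cand_cost = sse(cand, times, data)
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(-dE/T)
        if cand_cost < cost or rng.random() < math.exp(-(cand_cost - cost) / temp):
            k, cost = cand, cand_cost
            if cost < best_cost:
                best_k, best_cost = k, cost
        temp *= cooling  # geometric cooling schedule
    return best_k


times = [0.5 * i for i in range(11)]
data = [math.exp(-0.7 * t) for t in times]  # noise-free synthetic data, true k = 0.7
k_best = anneal_rate(times, data)
print(f"best k = {k_best:.3f}")
```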

    Determining Interconnections in Chemical Reaction Networks

    We present a methodology for robust determination of chemical reaction network interconnections. Given time-series data collected from experiments, and taking into account the measurement error, we minimize the 1-norm of the decision variables (reaction rates) while keeping the data in close Euler-fit with a general model structure, based on mass-action kinetics, that models the species' dynamics. We illustrate our methodology on a hypothetical chemical reaction network under various experimental scenarios.
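A minimal sketch of the idea, under assumptions: forward-Euler differences (x_{i+1} - x_i)/Δt are regressed on candidate mass-action terms, and sparsity is encouraged by an L1 penalty on nonnegative rate constants. This penalised (lasso-type) form, solved by coordinate descent, is swapped in for the constrained 1-norm minimisation described in the abstract; the one-species network and its three candidate reactions are hypothetical.

```python
import math


def nonneg_lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for min 0.5*||y - sum_j k_j*col_j||^2 + lam*sum_j k_j, k_j >= 0."""
    k = [0.0] * len(columns)
    resid = list(y)  # residual y - Phi k (k = 0 initially)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm2 = sum(c * c for c in col)
            # correlation of column j with the partial residual (term j added back)
            rho = sum(c * r for c, r in zip(col, resid)) + norm2 * k[j]
            new = max(0.0, (rho - lam) / norm2)  # soft-threshold, then project onto k >= 0
            if new != k[j]:
                delta = new - k[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                k[j] = new
    return k


dt = 0.01
a = [math.exp(-0.5 * i * dt) for i in range(501)]  # synthetic A(t), true decay rate 0.5
y = [(a[i + 1] - a[i]) / dt for i in range(500)]    # forward-Euler estimate of dA/dt
state = a[:-1]
# Candidate terms (stoichiometry * propensity):
#   A -> 0: -k1*A,   2A -> 0: -k2*A**2 (factor 2 absorbed into k2),   0 -> A: +k3
columns = [[-s for s in state], [-s * s for s in state], [1.0] * len(state)]
k = nonneg_lasso(columns, y, lam=0.01)
print([round(v, 4) for v in k])
```

Because the synthetic dynamics contain only A → ∅, the penalty drives the two spurious rates to exactly zero, while k1 stays close to 0.5, slightly shrunk by the penalty and by the Euler discretisation.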

    Parameter Estimation and Model Selection in Computational Biology

    A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method proceeds as follows. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess if it is not accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in <i>E. coli</i>, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection.
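The augmented-state idea behind Kalman-filter parameter estimation can be sketched for a hypothetical one-parameter model dx/dt = -k·x, discretised with a forward-Euler step: the unknown k is appended to the state as a (nearly) constant component, and a standard extended Kalman filter estimates x and k jointly from noisy measurements of x. This is a plain EKF, not the biologically tailored variant, identifiability test or refinement step described in the abstract; all names and tuning values are assumptions.

```python
import random


def ekf_decay_rate(ys, dt, x0, k0, p0=(0.01, 0.25), q=(1e-8, 1e-8), r=1e-4):
    """Jointly estimate x and k for x[t+1] = x[t] * (1 - k*dt) from y[t] = x[t] + noise.

    State s = (x, k); k is treated as a constant with tiny random-walk noise q[1].
    Returns the final estimates (x, k).
    """
    x, k = x0, k0
    P = [[p0[0], 0.0], [0.0, p0[1]]]  # 2x2 state covariance
    for y in ys:
        # Predict: Jacobian of f(x, k) = (x*(1 - k*dt), k), evaluated at the prior.
        F = [[1.0 - k * dt, -x * dt], [0.0, 1.0]]
        x = x * (1.0 - k * dt)
        FP = [[sum(F[i][m] * P[m][j] for m in range(2)) for j in range(2)] for i in range(2)]
        P = [[sum(FP[i][m] * F[j][m] for m in range(2)) for j in range(2)] for i in range(2)]
        P[0][0] += q[0]
        P[1][1] += q[1]
        # Update with H = [1, 0]: only x is measured.
        s = P[0][0] + r
        g0, g1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        innov = y - x
        x += g0 * innov
        k += g1 * innov
        # P <- (I - G H) P
        P = [[(1.0 - g0) * P[0][0], (1.0 - g0) * P[0][1]],
             [P[1][0] - g1 * P[0][0], P[1][1] - g1 * P[0][1]]]
    return x, k


# Synthetic measurements from the same discretised model with k = 0.3.
rng = random.Random(2)
dt, k_true = 0.1, 0.3
x_true, ys = 1.0, []
for _ in range(100):
    x_true *= 1.0 - k_true * dt
    ys.append(x_true + rng.gauss(0.0, 0.01))

x_ekf, k_ekf = ekf_decay_rate(ys, dt, x0=1.0, k0=0.1)
print(f"estimated k = {k_ekf:.3f}")
```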

    Plato's Cave Algorithm: Inferring Functional Signaling Networks from Early Gene Expression Shadows

    Improving the ability to reverse engineer biochemical networks is a major goal of systems biology. Lesions in signaling networks lead to alterations in gene expression, which in principle should allow network reconstruction. However, the information about the activity levels of signaling proteins conveyed in overall gene expression is limited by the complexity of gene expression dynamics and of regulatory network topology. Two observations provide the basis for overcoming this limitation: (a) genes induced without de novo protein synthesis (early genes) show a linear accumulation of product in the first hour after the change in the cell's state; (b) the signaling components in the network largely function in the linear range of their stimulus-response curves. Therefore, unlike most genes or most time points, expression profiles of early genes at an early time point provide direct biochemical assays that represent the activity levels of upstream signaling components. Such expression data provide the basis for an efficient algorithm (Plato's Cave algorithm; PLACA) to reverse engineer functional signaling networks. Unlike conventional reverse engineering algorithms that use steady-state values, PLACA uses stimulated early gene expression measurements associated with systematic perturbations of signaling components, without measuring the signaling components themselves. Besides the reverse-engineered network, PLACA also identifies the genes detecting each functional interaction, thereby facilitating validation of the predicted functional network. Using simulated datasets, the algorithm is shown to be robust to experimental noise. Using experimental data obtained from gonadotropes, PLACA reverse engineered the interaction network of six perturbed signaling components. The network recapitulated many known interactions and identified novel functional interactions that were validated by further experiment.
PLACA uses the results of experiments that are feasible for any signaling network to predict the functional topology of the network and to identify novel relationships.
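Observations (a) and (b) suggest a simple readout, sketched here under those linearity assumptions: fit the early accumulation rate of an early gene by least squares through the origin (after subtracting the unstimulated baseline), and compare rates between a perturbed and a control condition, reading the ratio as a proxy for the change in upstream signaling activity. This is a hypothetical illustration of the premise only, not the PLACA algorithm itself; the function names, time points and values are invented.

```python
def early_slope(times, expression, baseline):
    """Least-squares slope of (expression - baseline) against time, through the origin.

    Assumes accumulation is linear in the early window (observation a).
    """
    num = sum(t * (e - baseline) for t, e in zip(times, expression))
    den = sum(t * t for t in times)
    return num / den


def relative_activity(times, control, perturbed, baseline):
    """Ratio of early accumulation rates, perturbed vs. control.

    Under assumptions (a) and (b), read as a proxy for the fold change in the
    activity of the upstream signaling component(s) feeding this gene.
    """
    return early_slope(times, perturbed, baseline) / early_slope(times, control, baseline)


# Invented example: samples at 15-60 min, baseline expression 1.0 (arbitrary units).
times = [15.0, 30.0, 45.0, 60.0]
control = [1.0 + 0.05 * t for t in times]    # slope 0.05 per min
perturbed = [1.0 + 0.02 * t for t in times]  # slope 0.02 per min after perturbation
ratio = relative_activity(times, control, perturbed, baseline=1.0)
print(ratio)
```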

    Reactive SINDy: Discovering governing reactions from concentration data

    The inner workings of a biological cell or a chemical reaction can be rationalized by the network of reactions, whose structure reveals the most important functional mechanisms. For complex systems, these reaction networks are not known a priori and cannot be efficiently computed with ab initio methods; therefore, an important goal is to estimate effective reaction networks from observations, such as time series of the main species. Reaction networks estimated with standard machine learning techniques such as least-squares regression may fit the observations, but will typically contain spurious reactions. Here we extend the sparse identification of nonlinear dynamics (SINDy) method to vector-valued ansatz functions, each describing a particular reaction process. The resulting sparse tensor regression method, "reactive SINDy", is able to estimate a parsimonious reaction network. We illustrate that a gene regulation network can be correctly estimated from observed time series.
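The core regression can be sketched under simplifying assumptions: each candidate reaction contributes its stoichiometric vector multiplied by a mass-action propensity, the species derivatives are stacked into one long vector, and sequentially thresholded least squares (the sparsity heuristic of the original SINDy method) prunes small coefficients. Reactive SINDy itself formulates this as a sparse tensor regression with its own solver; the two-species network, candidate library and threshold below are hypothetical, and the derivatives are taken from the true model plus noise rather than estimated from time series.

```python
import random


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def lstsq(cols, y):
    """Least squares via the normal equations (adequate for small, well-conditioned problems)."""
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(G, rhs)


def stlsq(cols, y, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: refit, drop small coefficients, repeat."""
    active = list(range(len(cols)))
    coef = [0.0] * len(cols)
    for _ in range(iters):
        coef = [0.0] * len(cols)
        for j, c in zip(active, lstsq([cols[j] for j in active], y)):
            coef[j] = c
        keep = [j for j in active if abs(coef[j]) >= threshold]
        if keep == active:
            break
        active = keep
    return coef


rng = random.Random(0)
states = [(a, b) for a in (0.5, 1.0, 1.5, 2.0) for b in (0.2, 0.7, 1.2)]
# Hypothetical true network: A -> B (rate 0.8*A) and B -> 0 (rate 0.3*B), plus noise.
dA = [-0.8 * a + rng.gauss(0.0, 0.01) for a, b in states]
dB = [0.8 * a - 0.3 * b + rng.gauss(0.0, 0.01) for a, b in states]
y = dA + dB  # stacked: all dA/dt rows, then all dB/dt rows
A = [a for a, _ in states]
B = [b for _, b in states]
zeros = [0.0] * len(states)
cols = [
    [-a for a in A] + A,      # A -> B: stoichiometry (-1, +1) * A
    B + [-b for b in B],      # B -> A: stoichiometry (+1, -1) * B
    [-a for a in A] + zeros,  # A -> 0
    zeros + [-b for b in B],  # B -> 0
]
ls_coef = lstsq(cols, y)
sparse_coef = stlsq(cols, y)
print([round(c, 3) for c in ls_coef], [round(c, 3) for c in sparse_coef])
```

Plain least squares (ls_coef) assigns small nonzero rates to the two spurious reactions; thresholding removes them and keeps A → B and B → ∅.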

    Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes

    Multiple biological processes are driven by oscillatory gene expression at different time scales. Pulsatile dynamics are thought to be widespread, and single-cell live imaging of gene expression has led to a surge of dynamic, possibly oscillatory, data for different gene networks. However, the regulation of gene expression at the level of an individual cell involves reactions between finite numbers of molecules, and this can result in inherent randomness in expression dynamics, which blurs the boundaries between aperiodic fluctuations and noisy oscillators. Thus, there is an acute need for an objective statistical method for classifying whether an experimentally derived noisy time series is periodic. Here we present a new data analysis method that combines mechanistic stochastic modelling with the powerful methods of non-parametric regression with Gaussian processes. Our method can distinguish oscillatory gene expression from random fluctuations of non-oscillatory expression in single-cell time series, despite peak-to-peak variability in period and amplitude of single-cell oscillations. We show that our method outperforms the Lomb-Scargle periodogram in successfully classifying cells as oscillatory or non-oscillatory in data simulated from a simple genetic oscillator model and in experimental data. Analysis of bioluminescent live cell imaging shows a significantly greater number of oscillatory cells when luciferase is driven by a <i>Hes1</i> promoter (10/19), which has previously been reported to oscillate, than by the constitutive MoMuLV 5' LTR (MMLV) promoter (0/25). The method can be applied to data from any gene network both to quantify the proportion of oscillating cells within a population and to measure the period and quality of oscillations. It is publicly available as a MATLAB package.
Comment: 36 pages, 17 figures
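A minimal sketch of the model comparison, assuming (as an illustration, not as a description of the authors' MATLAB package) a non-oscillatory Ornstein-Uhlenbeck (OU) covariance σ²·exp(-α|τ|) and an oscillatory variant σ²·exp(-α|τ|)·cos(βτ), each with an added white-noise term. The Gaussian-process log marginal likelihood of a time series is computed under both; the hyperparameters are fixed or picked from a small grid rather than optimised, and the data are a synthetic noisy sinusoid.

```python
import math
import random


def ou_kernel(tau, alpha, var):
    """Non-oscillatory Ornstein-Uhlenbeck covariance."""
    return var * math.exp(-alpha * abs(tau))


def ouosc_kernel(tau, alpha, beta, var):
    """Oscillatory OU covariance: damped cosine with angular frequency beta."""
    return var * math.exp(-alpha * abs(tau)) * math.cos(beta * tau)


def log_marginal_likelihood(times, y, kernel, noise_var):
    """log N(y | 0, K + noise_var * I) for a zero-mean GP, via a Cholesky factorisation."""
    n = len(times)
    K = [[kernel(ti - tj) + (noise_var if i == j else 0.0)
          for j, tj in enumerate(times)] for i, ti in enumerate(times)]
    L = [[0.0] * n for _ in range(n)]  # lower-triangular factor, K = L L^T
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # solve L z = y, so that y^T K^{-1} y = z^T z
    for i in range(n):
        z[i] = (y[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


# Synthetic, clearly periodic series (period 10) with measurement noise sd 0.05.
rng = random.Random(3)
times = [float(t) for t in range(40)]
y = [math.sin(2.0 * math.pi * t / 10.0) + rng.gauss(0.0, 0.05) for t in times]
noise_var = 0.05 ** 2

ll_osc = log_marginal_likelihood(
    times, y, lambda tau: ouosc_kernel(tau, 0.05, 2.0 * math.pi / 10.0, 0.5), noise_var)
ll_ou = max(
    log_marginal_likelihood(times, y, lambda tau, a=a: ou_kernel(tau, a, 0.5), noise_var)
    for a in (0.1, 0.2, 0.5, 1.0))
print(f"oscillatory: {ll_osc:.1f}  non-oscillatory (best of grid): {ll_ou:.1f}")
```

For this clearly periodic series the oscillatory covariance yields the higher likelihood. A practical classifier would also optimise the hyperparameters and calibrate its decision threshold, for example against simulated non-oscillatory data.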

    Combining Network Modeling and Experimental Approaches to Predict Drug Combination Responses

    Cancer is a lethal disease that is complex at multiple levels of cell biology. Despite many advances in treatments, many patients do not respond to therapy. This is owing to the complexity of cancer: genetic variability due to mutations, the multivariate biochemical networks within which drug targets reside, and the existence and plasticity of multiple cell states. It is generally understood that a combination of drugs is a way to address the multi-faceted drivers of cancer and drug resistance. However, the sheer number of testable combinations and the challenge of matching patients to appropriate combination treatments are major issues. Here, we first present a general method of network inference that can be applied to infer biological networks. We apply this method to infer different kinds of networks at the biological levels where cancer complexity resides: a biochemical network, gene expression and cell state transitions. Next, we focus our attention on glioblastoma and, with pharmacological and biological considerations, obtain a ranked list of important drug targets in glioblastoma. We perform drug dose-response experiments for 22 blood-brain barrier penetrant drugs against 3 glioblastoma cell lines. These methods and experimental results inform the construction of a temporal cell state model to predict, and experimentally validate, combination treatments for certain drugs. We improve an experimental method for performing high-throughput western blots and apply it to discover biochemical interactions among some important proteins involved in temporal cell state transitions. Lastly, we illustrate a method to investigate potential resistance mechanisms in genome-scale proteomic data. We hope that the methods and results presented here can be adapted and improved upon to help discover biochemical interactions, capture cell state transitions and, ultimately, predict effective combination therapies for cancer.