16,705 research outputs found
Exact reconstruction of gene regulatory networks using compressive sensing.
BackgroundWe consider the problem of reconstructing a gene regulatory network structure from limited time series gene expression data, without any a priori knowledge of connectivity. We assume that the network is sparse, meaning the connectivity among genes is much less than full connectivity. We develop a method for network reconstruction based on compressive sensing, which takes advantage of the network's sparseness.ResultsFor the case in which all genes are accessible for measurement, and there is no measurement noise, we show that our method can be used to exactly reconstruct the network. For the more general problem, in which hidden genes exist and all measurements are contaminated by noise, we show that our method leads to reliable reconstruction. In both cases, coherence of the model is used to assess the ability to reconstruct the network and to design new experiments. We demonstrate that it is possible to use the coherence distribution to guide biological experiment design effectively. By collecting a more informative dataset, the proposed method helps reduce the cost of experiments. For each problem, a set of numerical examples is presented.ConclusionsThe method provides a guarantee on how well the inferred graph structure represents the underlying system, reveals deficiencies in the data and model, and suggests experimental directions to remedy the deficiencies
Parameter estimation for Boolean models of biological networks
Boolean networks have long been used as models of molecular networks and play
an increasingly important role in systems biology. This paper describes a
software package, Polynome, offered as a web service, that helps users
construct Boolean network models based on experimental data and biological
input. The key feature is a discrete analog of parameter estimation for
continuous models. With only experimental data as input, the software can be
used as a tool for reverse-engineering of Boolean network models from
experimental time course data.Comment: Web interface of the software is available at
http://polymath.vbi.vt.edu/polynome
Stochastic dynamic modeling of short gene expression time-series data
Copyright [2008] IEEE. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Brunel University's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.In this paper, the expectation maximization (EM) algorithm is applied for modeling the gene regulatory network from gene time-series data. The gene regulatory network is viewed as a stochastic dynamic model, which consists of the noisy gene measurement from microarray and the gene regulation first-order autoregressive (AR) stochastic dynamic process. By using the EM algorithm, both the model parameters and the actual values of the gene expression levels can be identified simultaneously. Moreover, the algorithm can deal with the sparse parameter identification and the noisy data in an efficient way. It is also shown that the EM algorithm can handle the microarray gene expression data with large number of variables but a small number of observations. The gene expression stochastic dynamic models for four real-world gene expression data sets are constructed to demonstrate the advantages of the introduced algorithm. Several indices are proposed to evaluate the models of inferred gene regulatory networks, and the relevant biological properties are discussed
Data based identification and prediction of nonlinear and complex dynamical systems
We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.Peer reviewedPostprin
Recommended from our members
Bayesian Inference for Genomic Data Analysis
High-throughput genomic data contain gazillion of information that are influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data.
Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of copy-neutrality assumption. With the presence of copy-neutrality, it is assumed that the genome contains mutational variability and the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), state-space modeling framework is employed in describing the two processes and the sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters.
Moreover, the problem of estimating gene regulatory network (GRN) from measurement with missing values is presented. Specifically, gene expression time series data may contain missing values for entire expression values of a single point or some set of consecutive time points. However, complete data is often needed to make inference on the underlying GRN. Using the missing measurement, a dynamic stochastic model is used to describe the evolution of gene expression and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type specific expressions and the cell type proportions in the heterogeneous samples
Model-based clustering with data correction for removing artifacts in gene expression data
The NIH Library of Integrated Network-based Cellular Signatures (LINCS)
contains gene expression data from over a million experiments, using Luminex
Bead technology. Only 500 colors are used to measure the expression levels of
the 1,000 landmark genes measured, and the data for the resulting pairs of
genes are deconvolved. The raw data are sometimes inadequate for reliable
deconvolution leading to artifacts in the final processed data. These include
the expression levels of paired genes being flipped or given the same value,
and clusters of values that are not at the true expression level. We propose a
new method called model-based clustering with data correction (MCDC) that is
able to identify and correct these three kinds of artifacts simultaneously. We
show that MCDC improves the resulting gene expression data in terms of
agreement with external baselines, as well as improving results from subsequent
analysis.Comment: 28 page
- …