Search CORE

264 research outputs found

Statistical inference of the time-varying structure of gene-regulation networks

Author: Becq Jennifer
Devaux Frédéric
Lelandais Gaëlle
Lèbre Sophie
Stumpf Michael PH
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. Methods To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). Results We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and exploits efficiently time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of <it>Drosophila melanogaster </it>and the response of <it>Saccharomyces cerevisiae </it>to benomyl poisoning. Conclusions ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provide a natural starting point to learn and investigate their dynamics in greater detail.</p

Crossref

Directory of Open Access Journals

University of Melbourne Institutional Repository

Hub-Centered Gene Network Reconstruction Using Automatic Relevance Determination

Author: A Bernard
A Hartemink
A Margolin
CM Bishop
D Marbach
D Spiegelhalter
D Szklarczyk
E Van Someren
ER DeLong
F Li
G Csardi
G Lima-Mendez
G Sales
G Stolovitzky
H Jeong
Hiroshi Tanaka
I Harvey
I Shmulevich
J Mazur
JA Hartigan
K Fellenberg
L Kaderali
L Kaderali
L Kaderali
Lars Kaderali
M Arnone
M Newman
Matthias Böck
N Friedman
N Friedman
P Mayer
P Mendes
P Spellman
P Woolf
R Bonneau
R Cho
R Guthke
R Khanin
R Neal
RJ Prill
S Duane
S Kauffman
S Liang
Soichi Ogishima
Stefan Kramer
Szabolcs Semsey
T Chen
T Chen
T Pramila
VYF Tan
X Robin
Publication venue: Public Library of Science
Publication date: 01/01/2012
Field of study

Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamic. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximization of or sampling from the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Gutenberg Open

Inferring Gene Regulatory Networks from Time Series Microarray Data

Author: Li Peng
Publication venue: The Aquila Digital Community
Publication date: 01/08/2009
Field of study

The innovations and improvements in high-throughput genomic technologies, such as DNA microarray, make it possible for biologists to simultaneously measure dependencies and regulations among genes on a genome-wide scale and provide us genetic information. An important objective of the functional genomics is to understand the controlling mechanism of the expression of these genes and encode the knowledge into gene regulatory network (GRN). To achieve this, computational and statistical algorithms are especially needed. Inference of GRN is a very challenging task for computational biologists because the degree of freedom of the parameters is redundant. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean network, differential equations and Bayesian network. There is no so called golden method which can generally give us the best performance for any data set. The research goal is to improve inference accuracy and reduce computational complexity. One of the problems in reconstructing GRN is how to deal with the high dimensionality and short time course gene expression data. In this work, some existing inference algorithms are compared and the limitations lie in that they either suffer from low inference accuracy or computational complexity. To overcome such difficulties, a new approach based on state space model and Expectation-Maximization (EM) algorithms is proposed to model the dynamic system of gene regulation and infer gene regulatory networks. In our model, GRN is represented by a state space model that incorporates noises and has the ability to capture more various biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters based on the given state space functions and the gene interaction matrix is derived by decomposing the observation matrix using singular value decomposition, and then it is used to infer GRN. The new model is validated using synthetic data sets before applying it to real biological data sets. The results reveal that the developed model can infer the gene regulatory networks from large scale gene expression data and significantly reduce the computational time complexity without losing much inference accuracy compared to dynamic Bayesian network

Aquila Digital Community

A stochastic model dissects cell states in biological transition processes

Author: Armond Jonathan W.
Jaenisch Rudolf
Mukherjee Sach
Nicodemi Mario
Oates Chris J.
Rana Anas A.
Saha Krishanu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2013
Field of study

Many biological processes, including differentiation, reprogramming, and disease transformations, involve transitions of cells through distinct states. Direct, unbiased investigation of cell states and their transitions is challenging due to several factors, including limitations of single-cell assays. Here we present a stochastic model of cellular transitions that allows underlying single-cell information, including cell-state-specific parameters and rates governing transitions between states, to be estimated from genome-wide, population-averaged time-course data. The key novelty of our approach lies in specifying latent stochastic models at the single-cell level, and then aggregating these models to give a likelihood that links parameters at the single-cell level to observables at the population level. We apply our approach in the context of reprogramming to pluripotency. This yields new insights, including profiles of two intermediate cell states, that are supported by independent single-cell studies. Our model provides a general conceptual framework for the study of cell transitions, including epigenetic transformations

Archivio della ricerca - Università degli studi di Napoli Federico II

DSpace@MIT

Crossref

OPUS - University of Technology Sydney

PubMed Central

Warwick Research Archives Portal Repository

Exploiting biological pathways to infer temporal gene interaction models

Author: Kemper Corey Ann, 1979-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 156-166).An important goal in genomic research is the reconstruction of the complete picture of temporal interactions among all genes, but this inference problem is not tractable because of the large number of genes, the small number of experimental observations for each gene, and the complexity of biological networks. We focus instead on the B cell receptor (BCR) signaling pathway, which narrows the inference problem and provides a clinical application, as B cell chronic lymphocytic leukemia (B-CLL) is believed to be related to BCR response. In this work, we infer population-dependent gene networks of temporal interaction within the BCR signaling pathway. We develop simple statistical models that capture the temporal behavior of differentially expressed genes and then estimate the parameters in an Expectation-Maximization framework, resulting in clusters with a biological interpretation for each subject population. Using the cluster labels to define a small number of modes of interaction and imposing sparsity constraints to effectively limit the number of genes influencing each target gene makes the ill-posed problem of network inference tractable.(cont.) For both the clustering and the inference of the predictive models, we have statistical results that show that we successfully capture the temporal structure of and the interactions between the genes relevant to the BCR. signaling pathway. We have confirmatory results from a biological standpoint, in which genes that we have identified as playing key roles in the networks have already been shown in previous work to be relevant to BCR. stimulation, but we also have results that guide future experiments in the study of other related genes, in order to further the long term goal of a full understanding of how and why B-CLL cells behave abnormally.by Corey Ann Kemper.Ph.D

DSpace@MIT

Recommended from our members

Bayesian Inference for Genomic Data Analysis

Author: Ogundijo Oyetunji Enoch
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

High-throughput genomic data contain gazillion of information that are influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data. Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of copy-neutrality assumption. With the presence of copy-neutrality, it is assumed that the genome contains mutational variability and the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), state-space modeling framework is employed in describing the two processes and the sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters. Moreover, the problem of estimating gene regulatory network (GRN) from measurement with missing values is presented. Specifically, gene expression time series data may contain missing values for entire expression values of a single point or some set of consecutive time points. However, complete data is often needed to make inference on the underlying GRN. Using the missing measurement, a dynamic stochastic model is used to describe the evolution of gene expression and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type specific expressions and the cell type proportions in the heterogeneous samples

Columbia University Academic Commons

Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data

Author: Kevin Y. Yip
Koon-Kiu Yan
Magnus Rattray
Mark Gerstein
Roger P. Alexander
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

We performed computational reconstruction of the in silico gene regulatory networks in the DREAM3 Challenges. Our task was to learn the networks from two types of data, namely gene expression profiles in deletion strains (the ‘deletion data’) and time series trajectories of gene expression after some initial perturbation (the ‘perturbation data’). In the course of developing the prediction method, we observed that the two types of data contained different and complementary information about the underlying network. In particular, deletion data allow for the detection of direct regulatory activities with strong responses upon the deletion of the regulator while perturbation data provide richer information for the identification of weaker and more complex types of regulation. We applied different techniques to learn the regulation from the two types of data. For deletion data, we learned a noise model to distinguish real signals from random fluctuations using an iterative method. For perturbation data, we used differential equations to model the change of expression levels of a gene along the trajectories due to the regulation of other genes. We tried different models, and combined their predictions. The final predictions were obtained by merging the results from the two types of data. A comparison with the actual regulatory networks suggests that our approach is effective for networks with a range of different sizes. The success of the approach demonstrates the importance of integrating heterogeneous data in network reconstruction

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

Gene expression trees in lymphoid development

Author: Costa Ivan G
Roepcke Stefan
Schliep Alexander
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract Background The regulatory processes that govern cell proliferation and differentiation are central to developmental biology. Particularly well studied in this respect is the lymphoid system due to its importance for basic biology and for clinical applications. Gene expression measured in lymphoid cells in several distinguishable developmental stages helps in the elucidation of underlying molecular processes, which change gradually over time and lock cells in either the B cell, T cell or Natural Killer cell lineages. Large-scale analysis of these <it>gene expression trees </it>requires computational support for tasks ranging from visualization, querying, and finding clusters of similar genes, to answering detailed questions about the functional roles of individual genes. Results We present the first statistical framework designed to analyze gene expression data as it is collected in the course of lymphoid development through clusters of co-expressed genes and additional heterogeneous data. We introduce dependence trees for continuous variates, which model the inherent dependencies during the differentiation process naturally as gene expression trees. Several trees are combined in a mixture model to allow inference of potentially overlapping clusters of co-expressed genes. Additionally, we predict microRNA targets. Conclusion Computational results for several data sets from the lymphoid system demonstrate the relevance of our framework. We recover well-known biological facts and identify promising novel regulatory elements of genes and their functional assignments. The implementation of our method (licensed under the GPL) is available at <url>http://algorithmics.molgen.mpg.de/Supplements/ExpLym/</url>.</p

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Microarray Data Mining and Gene Regulatory Network Analysis

Author: Li Ying
Publication venue: The Aquila Digital Community
Publication date: 01/05/2011
Field of study

The novel molecular biological technology, microarray, makes it feasible to obtain quantitative measurements of expression of thousands of genes present in a biological sample simultaneously. Genome-wide expression data generated from this technology are promising to uncover the implicit, previously unknown biological knowledge. In this study, several problems about microarray data mining techniques were investigated, including feature(gene) selection, classifier genes identification, generation of reference genetic interaction network for non-model organisms and gene regulatory network reconstruction using time-series gene expression data. The limitations of most of the existing computational models employed to infer gene regulatory network lie in that they either suffer from low accuracy or computational complexity. To overcome such limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, which enables the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier genes identification to solve the challenges of the curse of dimensionality problem as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc. By curating relevant genes and blasting genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and model validation for GRN reconstructions. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed for elucidating the dynamics caused by perturbations. Our proposed pipelines were applied to investigate molecular mechanisms for chemical-induced reversible neurotoxicity

Aquila Digital Community

Optimization algorithms for inference and classification of genetic profiles from undersampled measurements

Author: Bayar Belhassen
Publication venue: Rowan Digital Works
Publication date: 02/09/2014
Field of study

In this thesis, we tackle three different problems, all related to optimization techniques for inference and classification of genetic profiles. First, we extend the deterministic Non-negative Matrix Factorization (NMF) framework to the probabilistic case (PNMF). We apply the PNMF algorithm to cluster and classify DNA microarrays data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy. Second, we propose SMURC: Small-sample MUltivariate Regression with Covariance estimation. Specifically, we consider a high dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. We show that, in this case, the maximum likelihood approach is senseless because the likelihood diverges. We propose a normalization of the likelihood function that guarantees convergence. Simulation results show that SMURC outperforms the regularized likelihood estimator with known covariance matrix and the state-of-the-art sparse Conditional Graphical Gaussian Model (sCGGM). In the third Chapter, we derive a new greedy algorithm that provides an exact sparse solution of the combinatorial l sub zero-optimization problem in an exponentially less computation time. Unlike other greedy approaches, which are only approximations of the exact sparse solution, the proposed greedy approach, called Kernel reconstruction, leads to the exact optimal solution

Rowan University