
    Inferring Gene Regulatory Networks from Time Series Microarray Data

    Innovations and improvements in high-throughput genomic technologies, such as DNA microarrays, make it possible for biologists to simultaneously measure dependencies and regulation among genes on a genome-wide scale and provide us with genetic information. An important objective of functional genomics is to understand the mechanisms controlling the expression of these genes and to encode this knowledge in a gene regulatory network (GRN). To achieve this, computational and statistical algorithms are needed. Inference of GRNs is a very challenging task for computational biologists because the parameters have redundant degrees of freedom. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean networks, differential equations and Bayesian networks. There is no so-called golden method that gives the best performance for every data set. The research goal is to improve inference accuracy and reduce computational complexity. One of the problems in reconstructing GRNs is how to deal with high-dimensional, short time-course gene expression data. In this work, some existing inference algorithms are compared; their limitation is that they suffer from either low inference accuracy or high computational complexity. To overcome these difficulties, a new approach based on a state space model and the Expectation-Maximization (EM) algorithm is proposed to model the dynamic system of gene regulation and infer gene regulatory networks. In our model, the GRN is represented by a state space model that incorporates noise and can capture a wider range of biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters based on the given state space functions; the gene interaction matrix is then derived by decomposing the observation matrix using singular value decomposition and is used to infer the GRN. 
The new model is validated using synthetic data sets before being applied to real biological data sets. The results reveal that the developed model can infer gene regulatory networks from large-scale gene expression data and significantly reduce computational time complexity without losing much inference accuracy compared to dynamic Bayesian networks.
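
The abstract above does not give its model equations. As a rough illustration only, the sketch below simulates a generic linear-Gaussian state space model, x[t+1] = A x[t] + w[t] and y[t] = C x[t] + v[t], and takes the singular value decomposition of a known observation matrix C. The matrices, noise levels, and function name are hypothetical; the sketch performs no EM estimation and does not reproduce the authors' derivation of the gene interaction matrix.

```python
import numpy as np


def simulate_state_space(A, C, n_steps, process_std=0.1, obs_std=0.1, seed=0):
    """Simulate x[t+1] = A x[t] + w[t], y[t] = C x[t] + v[t] with Gaussian noise."""
    rng = np.random.default_rng(seed)
    n_hidden = A.shape[0]   # number of hidden state variables
    n_genes = C.shape[0]    # number of observed genes
    x = rng.normal(size=n_hidden)
    states = np.empty((n_steps, n_hidden))
    observations = np.empty((n_steps, n_genes))
    for t in range(n_steps):
        states[t] = x
        # Noisy observation of the current hidden state.
        observations[t] = C @ x + rng.normal(scale=obs_std, size=n_genes)
        # Noisy transition to the next hidden state.
        x = A @ x + rng.normal(scale=process_std, size=n_hidden)
    return states, observations


# Hypothetical toy system: 2 hidden variables driving 4 observed genes.
A = np.array([[0.9, 0.1], [-0.2, 0.8]])
C = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, -1.0]])
states, observations = simulate_state_space(A, C, n_steps=20)

# Thin SVD of the observation matrix: C = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
```

In an actual inference setting A and C would be unknown and estimated from the observations, for example by EM; here they are fixed so that the forward model and the decomposition are easy to inspect.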

    Inference of gene regulatory networks from time series by Tsallis entropy

    Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. 
The best free parameter of the Tsallis entropy was on average in the range 2.5 <= q <= 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
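
For readers unfamiliar with the generalized entropy mentioned above, the standard Tsallis definition is S_q = (1 - sum_i p_i^q) / (q - 1), which tends to the Shannon entropy as q approaches 1. The sketch below is a minimal illustration of that formula only; the function name is hypothetical, and it is not the conditional-entropy criterion function implemented in DimReduction.

```python
import numpy as np


def tsallis_entropy(p, q):
    """Tsallis entropy S_q of a discrete distribution p (natural-log units at q = 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes contribute nothing
    if np.isclose(q, 1.0):
        # Limit q -> 1 recovers the Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


# Entropy of a fair coin at q = 1 (Shannon) and at q = 3,
# a value inside the range reported in the abstract.
shannon_value = tsallis_entropy([0.5, 0.5], 1.0)
tsallis_value = tsallis_entropy([0.5, 0.5], 3.0)
```

For q > 1 the entropy of a fixed distribution is smaller than its Shannon value, which is one way to see how the choice of q changes the weight given to uncertain predictors.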

    Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference

    Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the inference algorithms they adopt, which makes meaningful comparison difficult. In this study, we used two methods, namely MIDER (Mutual Information Distance and Entropy Reduction) and PLSNET (partial least squares based feature selection), to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, in the IBD dataset the UGT1A family genes were identified as key regulators, while in the PDAC dataset the SULF1 and THBS2 genes were highlighted. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of samples required for potential future validation studies. Here, we present our proposed analysis framework, which provides not only candidate regulator gene predictions for potential validation experiments but also an estimate of the number of samples required for these experiments.
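
The abstract above does not say how the MIDER and PLSNET outputs are combined. One common way to build an ensemble from two edge rankings is rank averaging, sketched below under that assumption; the function name, gene labels, and scores are all hypothetical and are not taken from the study.

```python
def rank_average(scores_a, scores_b):
    """Combine two edge-score dicts by averaging ranks (rank 1 = strongest edge)."""
    def ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {edge: i + 1 for i, edge in enumerate(ordered)}

    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    shared = set(ranks_a) & set(ranks_b)  # keep edges scored by both methods
    combined = {edge: (ranks_a[edge] + ranks_b[edge]) / 2 for edge in shared}
    # Sort by mean rank, breaking ties by edge name for a deterministic order.
    return sorted(combined, key=lambda edge: (combined[edge], edge))


# Hypothetical edge scores from two inference methods.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4, ("g2", "g3"): 0.1}
method_b = {("g1", "g2"): 0.7, ("g1", "g3"): 0.8, ("g2", "g3"): 0.2}
consensus = rank_average(method_a, method_b)
```

Working on ranks rather than raw scores avoids having to put the two methods' scores on a common scale, which is why it is a frequent default for this kind of combination.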

    Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana

    Regulation of gene expression is crucial for organism growth, and one of the challenges in Systems Biology is to reconstruct the underlying regulatory networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small-world and hierarchical properties, and the nodes with a high out-degree may warrant further investigation.

    Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    Background: Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram based mutual information estimators lack the precision in quality compared to kernel based methods. The recently introduced B-spline function based mutual information estimation method is competitive to the kernel based methods in terms of quality but at a lower computational complexity. Results: We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions: CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
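
As a point of reference for the estimators compared above, the sketch below implements the simple histogram (plug-in) mutual information estimator, the baseline the abstract describes as less precise. It is not the B-spline estimator and not the CUDA-MI implementation; the function name, bin count, and synthetic data are assumptions chosen for illustration.

```python
import numpy as np


def histogram_mutual_information(x, y, bins=10):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint probabilities per bin
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = pxy > 0                    # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


# Synthetic expression-like profiles: one dependent pair, one independent pair.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dependent = x + 0.1 * rng.normal(size=2000)
y_independent = rng.normal(size=2000)

mi_dependent = histogram_mutual_information(x, y_dependent)
mi_independent = histogram_mutual_information(x, y_independent)
```

In a network inference setting this estimate would be computed for every gene pair, which is the quadratic workload that motivates parallel implementations.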

    Data based identification and prediction of nonlinear and complex dynamical systems

    We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme. Peer reviewed. Postprint.

    Reverse Engineering of Gene Regulatory Networks for Discovery of Novel Interactions in Pathways Using Gene Expression Data

    A variety of chemicals in the environment have the potential to adversely affect biological systems. We examined the responses of rats (Rattus norvegicus) to RDX exposure and of female fathead minnows (FHM, Pimephales promelas) to a model aromatase inhibitor, fadrozole, using a transcriptional network inference approach. Rats were exposed to RDX and fish were exposed to 0 or 30 mg/L fadrozole for 8 days. We analyzed gene expression changes using 8,000-probe microarrays for the rat experiment and 15,000-probe microarrays for the fish. We used these changes to infer a transcriptional network. The central nervous system is remarkably plastic in its ability to recover from trauma. We examined recovery from chemicals in rats and fish through changes in transcriptional networks. Transcriptional networks from time series experiments provide a good basis for organizing and studying the dynamic behavior of biological processes. The goal of this work was to identify networks affected by chemical exposure and track changes in these networks as animals recover. The top 1254 significantly changed genes, based upon 1.5-fold change and P < 0.05 across all the time points, from the fish data and 937 significantly changed genes from the rat data were chosen for network modeling using a Mutual Information Network (MIN), a Graphical Gaussian Model (GGM), or a Dynamic Bayesian Network (DBN) approach. The top interacting genes were queried to find sub-networks, possible biological networks, biochemical pathways, and network topologies impacted after exposure to fadrozole. The methods were able to reconstruct transcriptional networks with few hub structures, some of which were found to be involved in major biological processes and molecular functions. The resulting network from the rat experiment exhibited a clear hub (central in terms of connections and direction) connectivity structure. Genes such as Ania-7, Hnrpdl, Alad, Gapdh, etc. 
(all CNS related), GAT-2, Gabra6, Gabbrl, Gabbr2 (GABA, neurotransmitter transporters and receptors), SLC2A1 (glucose transporter), NCX3 (Na-Ca exchanger), Gnal (olfactory related), and skn-la appeared in our network as 'hub' genes, while some known transcription factors (Msx3, Cacngl, Brs3, NGF1, etc.) also matched our network model. Aromatase in the fish experiment was a highly connected gene in a sub-network along with other genes involved in steroidogenesis. Aromatase was a highly connected gene in a sub-network along with the genes LDLR, StAR, KRT18, HER1, CEBPB, ESR2A, and ACVRL1. Many of the sub-networks were involved in fatty acid metabolism, gamma-hexachlorocyclohexane degradation, and phospholipase activating pathways. A credible transcriptional network was recovered from both the time series data and the static data. The network included transcription factors and genes with roles in brain function, neurotransmission and sex hormone synthesis. Examination of the dynamic changes in expression within this network over time provided insight into recovery from traumas and chemical exposures.