12 research outputs found

    A temporal precedence based clustering method for gene expression microarray data

    Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits
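The pairwise test and graph step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed F-statistic cutoff (`threshold`), and the use of plain connected components in place of the paper's "highly connected components" are all assumptions made for the example.

```python
import numpy as np

def lagged_design(series, lag):
    """Rows are [s[t-1], ..., s[t-lag]] for t = lag .. n-1."""
    n = len(series)
    return np.column_stack([series[lag - k: n - k] for k in range(1, lag + 1)])

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(x, y, lag=1):
    """F statistic for 'past x improves the forecast of y beyond past y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lag:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged_design(y, lag)])   # y's own past only
    full = np.hstack([restricted, lagged_design(x, lag)])   # plus x's past
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

def association_matrix(data, lag=1, threshold=10.0):
    """adj[i, j] is True when gene i appears to precede gene j.

    `threshold` is a hypothetical cutoff on the F statistic; a real
    analysis would use a p-value with multiple-testing correction.
    """
    n = len(data)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i, j] = granger_f(data[i], data[j], lag) > threshold
    return adj

def connected_components(adj):
    """Group genes linked by any edge (direction ignored); a simplification."""
    undirected = adj | adj.T
    seen, components = set(), []
    for start in range(len(undirected)):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nb in np.flatnonzero(undirected[node]):
                nb = int(nb)
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(members))
    return components
```

With an expression matrix `data` of shape (genes, time points), `connected_components(association_matrix(data))` returns candidate gene groups as lists of row indices.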

    Multilevel modelling for inference of genetic regulatory networks

    Time-course experiments with microarrays are often used to study dynamic biological systems and genetic regulatory networks (GRNs) that model how genes influence each other in cell-level development of organisms. The inference for GRNs provides important insights into the fundamental biological processes such as growth and is useful in disease diagnosis and genomic drug design. Due to the experimental design, multilevel data hierarchies are often present in time-course gene expression data. Most existing methods, however, ignore the dependency of the expression measurements over time and the correlation among gene expression profiles. Such independence assumptions violate regulatory interactions and can result in overlooking certain important subject effects and lead to spurious inference for regulatory networks or mechanisms. In this paper, a multilevel mixed-effects model is adopted to incorporate data hierarchies in the analysis of time-course data, where temporal and subject effects are both assumed to be random. The method starts with the clustering of genes by fitting the mixture model within the multilevel random-effects model framework using the expectation-maximization (EM) algorithm. The network of regulatory interactions is then determined by searching for regulatory control elements (activators and inhibitors) shared by the clusters of co-expressed genes, based on a time-lagged correlation coefficients measurement. The method is applied to two real time-course datasets from the budding yeast (Saccharomyces cerevisiae) genome. It is shown that the proposed method provides clusters of cell-cycle regulated genes that are supported by existing gene function annotations, and hence enables inference on regulatory interactions for the genetic network
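The time-lagged correlation measure mentioned in this abstract can be illustrated with a short sketch. The function names and the choice to scan lags from 0 to `max_lag` are assumptions for the example, not details from the paper; a strongly negative score would suggest an inhibitor-like relationship and a strongly positive one an activator-like relationship.

```python
import numpy as np

def lagged_correlation(regulator, target, lag):
    """Pearson correlation between regulator[t] and target[t + lag]."""
    r = np.asarray(regulator, dtype=float)
    g = np.asarray(target, dtype=float)
    if len(r) != len(g):
        raise ValueError("profiles must have the same number of time points")
    if lag < 0 or lag >= len(r) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag:
        r, g = r[:-lag], g[lag:]   # align regulator's past with target's present
    return float(np.corrcoef(r, g)[0, 1])

def best_lag(regulator, target, max_lag):
    """Return (lag, correlation) with the largest absolute correlation."""
    scores = {lag: lagged_correlation(regulator, target, lag)
              for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]
```

For two expression profiles sampled at the same time points, `best_lag(profile_a, profile_b, max_lag=5)` reports the delay at which `profile_a` best tracks `profile_b`, along with the sign of the relationship.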

    Hub-Centered Gene Network Reconstruction Using Automatic Relevance Determination

    Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamics. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximizing, or sampling from, the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the-art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.

    Uncovering a Macrophage Transcriptional Program by Integrating Evidence from Motif Scanning and Expression Dynamics

    Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation

    Data Mining and Analysis on Multiple Time Series Object Data

    A huge amount of data is available in our society, and the need for turning such data into useful information and knowledge is urgent. Data mining is an important field addressing that need, and significant progress has been achieved in the last decade. In several important application areas, data arises in the format of Multiple Time Series Object (MTSO) data, where each data object is an array of time series over a large set of features and each has an associated class or state. Very little research has been conducted on this kind of data. Examples include computational toxicology, where each data object consists of a set of time series over thousands of genes, and operational stress management, where each data object consists of a set of time series over different measuring points on the human body. The purpose of this dissertation is to conduct a systematic data mining study over microarray time series data, with applications in computational toxicology. More specifically, we aim to consider several issues: feature selection algorithms for different classification cases, gene markers or feature set selection for toxic chemical exposure detection, toxic chemical exposure time prediction, wildness concept development and applications, and organizing a diversified and parsimonious committee. We will formalize and analyze these research problems, design algorithms to address these problems, and perform experimental evaluations of the proposed algorithms. All these studies are based on a microarray time series data set provided by Dr. McDougal.

    Extracting transcriptional regulatory information from DNA microarray expression data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2003. Includes bibliographical references. (cont.) As a model system, we have chosen the unicellular, photoautotrophic cyanobacterium Synechocystis sp. PCC6803 for study, as it 1) is fully sequenced, 2) has an easily manipulated input signal (light for photosynthesis), and 3) fixes carbon dioxide into the commercially interesting, biodegradable polymer polyhydroxyalkanoate (PHA). We have created DNA microarrays with approximately 97% of the Synechocystis genome represented in duplicate to monitor the cellular transcriptional profile. These arrays are used in time-series experiments of differing light levels to measure dynamic transcriptional response to changing environmental conditions. We have developed networks of potential genetic regulatory interactions through time-series analysis based on the data from our studies. An algorithm for combining gene position information, clustering, and time-lagged correlations has been created to generate networks of hypothetical biological links. Analysis of these networks indicates that good correlation exists between the input signal and certain groups of photosynthesis- and metabolism-related genes. Furthermore, this analysis technique placed these in a temporal context, showing the sequence of potential effects from changes in the experimental conditions. These data and hypothetical interaction networks have been used to construct AutoRegressive with eXogenous input (ARX) models. These provide dynamic, state-space models for prediction of transcriptional profiles given a dynamically changing set of environmental perturbations... Recent technological developments allow all the genes of a species to be monitored simultaneously at the transcriptional level. This necessitates a more global approach to biology that includes consideration of complex interactions between many genes and other intracellular species.
The metaphor of a cell as a miniature chemical plant with inputs, outputs, and controls gives chemical engineers a foothold in this type of analysis. Networks of interacting genes are fertile ground for the application of the methods developed by engineers for the analysis and monitoring of industrial chemical processes. The DNA microarray has been established as a tool for efficient collection of mRNA expression data for a large number of genes simultaneously. Although great strides have been made in the methodology and instrumentation of this technique, the development of computational tools needed to interpret the results has received relatively inadequate attention. Existing analyses, such as clustering techniques applied to static data from cells at many different states, provide insight into co-expression of genes and are an important basis for exploration of the cell's genetic programming. We propose that an even greater level of regulatory detail may be gained by dynamically changing experimental conditions (the input signal) and measuring the time-delayed response of the genes (the output signal). The addition of temporal information to DNA microarray experiments should suggest potential cause/effect relationships among genes with significant regulatory responses to the conditions of interest. This thesis aims to develop computational techniques to maximize the information gained from such dynamic experiments. by William A. Schmitt, Jr. Ph.D.
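As a rough illustration of the ARX idea named in the thesis abstract above, the sketch below fits a single-output model y[t] = Σ a_i·y[t-i] + Σ b_j·u[t-j] by ordinary least squares, where `y` stands for one gene's expression profile and `u` for the input signal (for example, light level). The function names, the model orders `na` and `nb`, and the single-gene simplification are assumptions for this example, not details taken from the thesis.

```python
import numpy as np

def fit_arx(y, u, na=1, nb=1):
    """Least-squares estimate of ARX coefficients (a, b).

    a[i] multiplies y[t-1-i] and b[j] multiplies u[t-1-j].
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(na, nb)
    # Each row holds the most recent outputs, then the most recent inputs.
    rows = [np.concatenate([y[t - na:t][::-1], u[t - nb:t][::-1]])
            for t in range(start, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[start:], rcond=None)
    return theta[:na], theta[na:]

def simulate_arx(a, b, u, y_init):
    """Roll the fitted model forward from the initial outputs y_init.

    y_init must contain exactly max(len(a), len(b)) starting values.
    """
    u = np.asarray(u, dtype=float)
    na, nb = len(a), len(b)
    start = max(na, nb)
    y = [float(v) for v in y_init]
    if len(y) != start:
        raise ValueError("y_init must have max(len(a), len(b)) values")
    for t in range(start, len(u)):
        past_y = y[t - na:t][::-1]
        past_u = u[t - nb:t][::-1]
        y.append(float(np.dot(a, past_y) + np.dot(b, past_u)))
    return np.array(y)
```

Given a measured profile and input, `a, b = fit_arx(y, u)` estimates the coefficients, and `simulate_arx(a, b, new_u, y[:1])` predicts the response to a different input sequence under the same model.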