460 research outputs found

    A temporal precedence based clustering method for gene expression microarray data

    Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits
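The pipeline described in this abstract (pairwise Granger tests, an association matrix, then graph analysis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single lag, compares the F statistic against a caller-supplied critical value `f_crit` (a hypothetical parameter; in practice the statistic would be converted to a p-value), and substitutes plain connected components for the paper's unspecified graph-theoretic technique. All function names are invented for the example.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y):
    """Lag-1 F statistic for the hypothesis 'x helps forecast y'."""
    targets = y[1:]
    # Restricted model: y_t from its own past only.
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    # Unrestricted model: y_t from its own past plus the past of x.
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(full, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


def modules(series, f_crit):
    """Link genes whose pairwise test exceeds f_crit, then group them.

    Connected components are a simple stand-in for the graph analysis.
    """
    names = sorted(series)
    linked = {n: set() for n in names}
    for a in names:
        for b in names:
            if a != b and granger_f(series[a], series[b]) > f_crit:
                linked[a].add(b)
                linked[b].add(a)
    seen, groups = set(), []
    for start in names:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(linked[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Synthetic demo: y follows x with a one-step delay, z is unrelated noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.9 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
z = [rng.gauss(0, 1) for _ in range(200)]
found = modules({"x": x, "y": y, "z": z}, f_crit=20.0)
print(found)
```

Unlike a distance-based similarity, the test is directional: `granger_f(x, y)` and `granger_f(y, x)` generally differ, which is what lets the association matrix encode temporal precedence.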

    Discovering Biological Progression Underlying Microarray Samples

    In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (i.e., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information of the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells with their cell origin preB. When applied to mouse embryonic stem cell differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages. 
    SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.

    Learning from High-Dimensional Multivariate Signals.

    Modern measurement systems monitor a growing number of variables at low cost. In the problem of characterizing the observed measurements, budget limitations usually constrain the number n of samples that one can acquire, leading to situations where the number p of variables is much larger than n. In this situation, classical statistical methods, founded on the assumption that n is large and p is fixed, fail both in theory and in practice. A successful approach to overcome this problem is to assume a parsimonious generative model characterized by a number k of parameters, where k is much smaller than p. In this dissertation we develop algorithms to fit low-dimensional generative models and extract relevant information from high-dimensional, multivariate signals. First, we define extensions of the well-known Scalar Shrinkage-Thresholding Operator, that we name Multidimensional and Generalized Shrinkage-Thresholding Operators, and show that these extensions arise in numerous algorithms for structured-sparse linear and non-linear regression. Using convex optimization techniques, we show that these operators, defined as the solutions to a class of convex, non-differentiable, optimization problems have an equivalent convex, low-dimensional reformulation. Our equivalence results shed light on the behavior of a general class of penalties that includes classical sparsity-inducing penalties such as the LASSO and the Group LASSO. In addition, our reformulation leads in some cases to new efficient algorithms for a variety of high-dimensional penalized estimation problems. Second, we introduce two new classes of low-dimensional factor models that account for temporal shifts commonly occurring in multivariate signals. Our first contribution, called Order Preserving Factor Analysis, can be seen as an extension of the non-negative, sparse matrix factorization model to allow for order-preserving temporal translations in the data. 
    We develop an efficient descent algorithm to fit this model using techniques from convex and non-convex optimization. Our second contribution extends Principal Component Analysis to the analysis of observations suffering from circular shifts, and we call it Misaligned Principal Component Analysis. We quantify the effect of the misalignments in the spectrum of the sample covariance matrix in the high-dimensional regime and develop simple algorithms to jointly estimate the principal components and the misalignment parameters.
    Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91544/1/atibaup_1.pd
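For readers unfamiliar with the starting point of this dissertation, the scalar shrinkage-thresholding operator is the standard soft-thresholding map, the proximal operator of the penalty λ|x| used in the LASSO. The sketch below shows it alongside the block form commonly used for the Group LASSO. The function names are invented for illustration, and the block form is the textbook operator, not necessarily the Multidimensional or Generalized operators the dissertation defines.

```python
import math


def soft_threshold(x, lam):
    """Scalar shrinkage-thresholding: move x toward zero by lam, clipping at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def block_soft_threshold(v, lam):
    """Textbook group version: shrink the Euclidean norm of v by lam."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= lam:
        # The whole group is set to zero, which is how group sparsity arises.
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * c for c in v]


print(soft_threshold(3.0, 1.0))
print(block_soft_threshold([3.0, 4.0], 1.0))
```

The scalar operator zeroes individual coefficients, while the block operator keeps or discards a whole group at once and preserves the direction of the vector it shrinks.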

    Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Background: The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA), a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM). Results: While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K)-specific demethylase 5B) and HDACs (histone deacetylases), which could epigenetically coordinate gene expression after irradiation. Conclusions: In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.

    From gene-expressions to pathways

    Rapid advancements in experimental techniques have benefited molecular biology in many ways. The experiments once considered impossible due to the lack of resources can now be performed with relative ease in an acceptable time-span; monitoring simultaneous expressions of thousands of genes at a given time point is one of them. Microarray technology is the most popular method in biological sciences to observe the simultaneous expression levels of a large number of genes. The large amount of data produced by a microarray experiment requires considerable computational analysis before some biologically meaningful hypothesis can be drawn. In contrast to a single time-point microarray experiment, the temporal microarray experiments enable us to understand the dynamics of the underlying system. Such information, if properly utilized, can provide vital clues about the structure and functioning of the system under study. This dissertation introduces some new computational techniques to process temporal microarray data. We focus on three broad stages of microarray data analysis - normalization, clustering and inference of gene-regulatory networks. We explain our methods using various synthesized datasets and a real biological dataset, produced in-house, to monitor the leaf senescence process in Arabidopsis thaliana

    Mining Time-delayed Gene Regulation Patterns from Gene Expression Data

    Discovered gene regulation networks are very helpful for predicting unknown gene functions. The activating and deactivating relations between genes are mined from microarray gene expression data. There is evidence that delays of multiple time units exist in the gene regulation process. Association rule mining is well suited to finding regulation relations among genes. However, current association rule mining techniques cannot handle temporally ordered transactions. We propose a modified association rule mining technique for efficiently discovering time-delayed regulation relationships among genes. By analyzing gene expression data, we can discover gene relations, so we use modified association rules to mine gene regulation patterns. Our proposed method, BC3, is designed to mine time-delayed gene regulation patterns of length 3 from time series gene expression data, where the first two items are regulators and the last item is the target they affect. First, we use Apriori to find the frequent 2-itemsets in order to work backward to BL1. Because Apriori mines frequent 2-itemsets within the same time point, we split L2 into itemsets of length one that are related within the same time point. We then combine BL1 with L1 into a new ordered set, BC2, with time-delayed relations. Pruning BC2 with the threshold yields BL2. Finally, BL2 is joined with itself to form BC3, and BL3 is sifted from BC3. We evaluate our method on yeast gene expression data and analyze the results to show that it is efficient.
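The core idea of a time-delayed association, an item at one time point followed by another item a fixed number of time points later, can be illustrated with a simple support count. This is a simplified sketch and not the BC3 procedure itself: it handles only ordered pairs rather than length-3 patterns, and the item labels, the `delay` and `min_support` parameters, and the function name are all assumptions made for the example.

```python
from itertools import permutations


def delayed_pairs(transactions, delay, min_support):
    """Find ordered pairs (a, b) where a at time t is followed by b at t + delay.

    transactions: list of sets of items, one set per time point, in time order.
    Returns a dict mapping each frequent pair to its support.
    """
    items = sorted(set().union(*transactions))
    windows = len(transactions) - delay  # number of (t, t + delay) positions
    frequent = {}
    for a, b in permutations(items, 2):
        hits = sum(1 for t in range(windows)
                   if a in transactions[t] and b in transactions[t + delay])
        support = hits / windows
        if support >= min_support:
            frequent[(a, b)] = support
    return frequent


# Hypothetical discretized expression states: "+" up-regulated, "-" down-regulated.
timeline = [{"g1+"}, {"g2+"}, {"g1+"}, {"g2+"}, {"g1+", "g3-"}, {"g2+"}]
print(delayed_pairs(timeline, delay=1, min_support=0.5))
```

Because the pair is ordered by time, the support of (a, b) and (b, a) can differ, which is what distinguishes this from counting co-occurrence within a single time point.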

    Conserved temporal ordering of promoter activation implicates common mechanisms governing the immediate early response across cell types and stimuli

    Conserved temporal precedence between IEGs (light blue nodes) and other protein-coding genes (green nodes) is shown by directed edges. Genes annotated with the GO term 'response to endoplasmic reticulum stress' (GO:003497) have a red rectangle around the gene name; red squares indicate genes with CAGE clusters enriched for XBP1 transcription factor binding sites

    AP-1 imprints a reversible transcriptional program of senescent cells

    Senescent cells affect many physiological and pathophysiological processes. While select genetic and epigenetic elements for senescence induction have been identified, the dynamics, epigenetic mechanisms and regulatory networks defining senescence competence, induction and maintenance remain poorly understood, precluding the deliberate therapeutic targeting of senescence for health benefits. Here, we examined the possibility that the epigenetic state of enhancers determines senescent cell fate. We explored this by generating time-resolved transcriptomes and epigenome profiles during oncogenic RAS-induced senescence and validating central findings in different cell biology and disease models of senescence. Through integrative analysis and functional validation, we reveal links between enhancer chromatin, transcription factor recruitment and senescence competence. We demonstrate that activator protein 1 (AP-1) ‘pioneers’ the senescence enhancer landscape and defines the organizational principles of the transcription factor network that drives the transcriptional programme of senescent cells. Together, our findings enabled us to manipulate the senescence phenotype with potential therapeutic implications