53,433 research outputs found

    A temporal precedence based clustering method for gene expression microarray data

    Background: Time-course microarray experiments can produce useful data that help in understanding the underlying dynamics of a system. Clustering is an important stage in microarray data analysis, in which the data are grouped according to certain characteristics. Most clustering techniques are based on distance or visual similarity measures, which may not be suitable for clustering temporal microarray data, where the sequential nature of time is important. We present a Granger causality-based technique to cluster temporal microarray gene expression data; it measures the interdependence between two time series by statistically testing whether one time series can be used to forecast the other. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is then analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and on real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results against the existing biological literature. We also report interesting structural properties of the association network that are commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple to implement and is statistically traceable at each step. It can produce sets of functionally related genes, which can be further used for reverse-engineering gene circuits.
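
The pipeline in this abstract (pairwise Granger tests, then an association matrix, then graph clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the fixed F cut-off `f_crit` stands in for a proper p-value threshold, and plain connected components are swapped in for the paper's highly-connected-component detection.

```python
import numpy as np


def lagged(series, p):
    """Columns are `series` lagged by 1..p, aligned with the targets series[p:]."""
    T = len(series)
    return np.column_stack([series[p - k:T - k] for k in range(1, p + 1)])


def granger_f(x, y, p=1):
    """F statistic for H0: the past p values of x add no forecasting power for y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    target = y[p:]
    n = len(target)
    restricted = np.hstack([np.ones((n, 1)), lagged(y, p)])  # intercept + y's own past
    full = np.hstack([restricted, lagged(x, p)])              # ... plus x's past

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = n - full.shape[1]  # residual degrees of freedom of the full model
    return ((rss_r - rss_f) / p) / max(rss_f / dof, 1e-12)


def association_matrix(expr, p=1, f_crit=4.0):
    """expr: genes x time points. A[i, j] = 1 if gene i Granger-causes gene j.
    f_crit is a hypothetical fixed cut-off standing in for a p-value threshold."""
    g = expr.shape[0]
    A = np.zeros((g, g), dtype=int)
    for i in range(g):
        for j in range(g):
            if i != j and granger_f(expr[i], expr[j], p) > f_crit:
                A[i, j] = 1
    return A


def connected_clusters(A):
    """Connected components of the undirected association graph (a simplified
    stand-in for highly-connected-component detection)."""
    und = (A + A.T) > 0
    seen, clusters = set(), []
    for start in range(len(A)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(und[u]):
                v = int(v)
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters
```

The direction matters: `granger_f(x, y)` tests whether x helps forecast y, so the matrix is asymmetric, and it is only symmetrized when grouping genes into clusters.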

    A Platform for Processing Expression of Short Time Series (PESTS)

    Background: Time-course microarray profiles examine the expression of genes over a time domain. They are necessary to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interactions between these genes. Because of cost and resource constraints, most time series datasets contain fewer than 9 points, and few tools are geared towards the analysis of this type of data. Results: To this end, we introduce a platform for Processing Expression of Short Time Series (PESTS). It was designed with a focus on usability and on the interpretability of analyses for the researcher. As such, it implements several standard techniques for comparability, as well as visualization functions. However, it is designed specifically for the unique methods we have developed for significance analysis, multiple test correction and clustering of short time series data. The central tenet of these methods is the use of biologically relevant features for analysis. Features summarize short gene expression profiles, inherently incorporate dependence across time, and allow both a full description of the examined curve and the handling of missing data points. Conclusions: PESTS is fully generalizable to other types of time series analyses. It implements novel methods as well as several standard techniques for comparability and visualization functions. These features and functionality make PESTS a valuable addition to a researcher's toolkit. PESTS is available to download free of charge for academic and non-profit users at http://www.mailman.columbia.edu/academic-departments/biostatistics/research-service/software-development
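
To make the idea of feature-based summaries concrete, here is a minimal sketch of reducing one short profile to a few time-aware features while tolerating missing points. The abstract does not list the PESTS features, so the three used here (linear slope, trapezoidal area under the curve, time of the peak) and the function name `profile_features` are hypothetical examples only; missing points are assumed to be encoded as NaN and are simply dropped.

```python
import numpy as np


def profile_features(times, values):
    """Summarise one short expression profile; NaN marks a missing time point.

    The features here (slope, area under the curve, peak time) are
    hypothetical examples, not the feature set defined by PESTS.
    """
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    observed = ~np.isnan(v)          # drop missing points instead of imputing
    t, v = t[observed], v[observed]
    slope = np.polyfit(t, v, 1)[0]   # overall linear trend across time
    auc = float(np.sum((v[1:] + v[:-1]) / 2 * np.diff(t)))  # trapezoidal area
    return {
        "slope": float(slope),
        "auc": auc,
        "peak_time": float(t[np.argmax(v)]),
        "n_observed": int(observed.sum()),
    }
```

Because each feature is computed from the ordered (time, value) pairs that were actually observed, a gap in the series changes the spacing used by the slope and area calculations rather than forcing an imputed value.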

    Animated interval scatter-plot views for the exploratory analysis of large scale microarray time-course data.

    Microarray technologies are a relatively new development that allow biologists to monitor the activity of thousands of genes (normally around 8,000) in parallel across multiple stages of a biological process. While this new perspective on biological functioning is recognised as having the potential to significantly impact the diagnosis, treatment, and prevention of diseases, it is only through effective analysis of the data produced that biologists can begin to unlock this potential. A significant obstacle to effective analysis of microarray time-course data is the combined scale and complexity of the data. This inevitably makes it difficult to reveal certain significant patterns in the data. In particular, it is less dominant patterns and, specifically, patterns that occur over smaller intervals of an experiment's overall time-frame that are more difficult to find. While existing techniques are capable of finding either unexpected patterns of activity over the majority of an experiment's time-frame or expected patterns of activity over smaller intervals of the time-frame, there are no techniques, or combination of techniques, that are suitable for finding unsuspected patterns of activity over smaller intervals. To overcome this limitation, we have developed the Time-series Explorer, which specifically supports biologists in their attempts to reveal these types of pattern by allowing them to control an animated interval scatter-plot view of their data. This paper discusses aspects of the technique that make such an animated overview viable and describes the results of a user evaluation assessing the practical utility of the technique within the wider context of microarray time-series analysis as a whole.
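
One way to picture how an animated interval scatter-plot could be driven is to precompute one frame per time interval and step through the frames. The sketch below assumes, for illustration only, that each frame plots every gene's expression at the start of an interval against its expression at the end, so genes that change within that interval move away from the diagonal; the function name, the sliding-window layout, and `threshold` are hypothetical and not taken from the Time-series Explorer.

```python
import numpy as np


def interval_frames(expr, width, threshold=1.0):
    """Yield one scatter-plot frame per sliding interval of `width` time steps.

    expr: genes x time points. In each frame, a gene's x coordinate is its
    expression at the interval start and its y coordinate is its expression at
    the interval end, so genes far from the diagonal changed within that interval.
    This layout and `threshold` are assumptions for illustration only.
    """
    expr = np.asarray(expr, dtype=float)
    n_time = expr.shape[1]
    for start in range(n_time - width):
        end = start + width
        points = np.column_stack([expr[:, start], expr[:, end]])
        # genes whose change over this interval exceeds the threshold
        changed = np.flatnonzero(np.abs(points[:, 1] - points[:, 0]) > threshold)
        yield start, end, points, changed
```

Playing the frames in order gives the animation; a gene that only moves off the diagonal in one or two frames is exactly the kind of short-interval pattern that a whole-time-course summary would hide.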

    Dynamic Analysis of High Dimensional Microarray Time Series Data Using Various Dimensional Reduction Methods

    This dissertation focuses on the dynamic analysis of reduced-dimension models of two microarray time series datasets. The underlying research has two main objectives: (1) applying various dimension reduction techniques to time series microarray data, and (2) estimating autoregressive coefficients using several penalized regression methods, such as ridge, SCAD, and lasso. The research methodology comprises two tasks. First, several dimension reduction methods are applied to two microarray datasets, and the resulting models are compared on accuracy and computational cost. Second, the sparse vector autoregressive (SVAR) model is applied to estimate a gene regulatory network from the gene expression profiles of the two time series microarray datasets; the autoregressive coefficients are estimated using several penalized regression methods, which are then compared for each dimension reduction model. The results show that dimension reduction methods producing orthogonal independent variables perform better, because orthogonality leads to reasonable coefficient estimates with low standard errors. Regarding the dynamic analysis, factor analysis (FA) outperformed the other dimension reduction methods in goodness of fit after applying several penalized regression methods to each model. This is attributed to the varimax rotation used in FA, which sets most of the coordinates closer to zero and in turn makes the data sparser, thereby inducing additional sparsity while maintaining a certain goodness of fit.
    Industrial Engineering & Managemen
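
The two-stage workflow described above (dimension reduction, then a penalized VAR fit) can be sketched as follows. This is a minimal illustration, not the dissertation's code: it uses PCA as the only reduction method and lasso (via coordinate descent) as the only penalty, fits a first-order model without intercept, and assumes the reduced series are roughly centred; ridge, SCAD and factor analysis are not shown. All function names are hypothetical.

```python
import numpy as np


def pca_scores(expr, k):
    """Reduce genes x time expression to k x time principal-component scores."""
    X = np.asarray(expr, dtype=float).T        # rows = time points, cols = genes
    X = X - X.mean(axis=0)                      # centre each gene over time
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]).T


def soft_threshold(z, lam):
    """Shrink z towards zero by lam, setting it to exactly zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(d):
            partial = y - X @ b + X[:, j] * b[j]   # residual excluding feature j
            rho = X[:, j] @ partial / n
            b[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
    return b


def sparse_var1(series, lam):
    """Estimate A in z_t = A z_{t-1} + e_t with one lasso fit per row of A.

    series: k x T (e.g. PCA scores); no intercept, so series are assumed centred.
    """
    Z = np.asarray(series, dtype=float)
    past, present = Z[:, :-1].T, Z[:, 1:].T
    return np.vstack([lasso_cd(past, present[:, i], lam) for i in range(Z.shape[0])])
```

In use, one would call `sparse_var1(pca_scores(expr, k), lam)` and read nonzero entries of the returned matrix as candidate directed links between components; a larger `lam` drives more coefficients to exactly zero, which is the sparsity the abstract refers to.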

    RNA-seq vs dual- and single-channel microarray data: sensitivity analysis for differential expression and clustering

    With the fast development of high-throughput sequencing technologies, a new generation of genome-wide gene expression measurements is under way. This is based on mRNA sequencing (RNA-seq), which complements the already mature technology of microarrays and is expected to overcome some of the latter's disadvantages. These RNA-seq data pose new challenges, however, as their strengths and weaknesses have yet to be fully identified. Ideally, Next (or Second) Generation Sequencing measures can be integrated for more comprehensive gene expression investigation to facilitate analysis of whole regulatory networks. At present, however, the nature of these data is not very well understood. In this paper we study three alternative gene expression time series datasets for Drosophila melanogaster embryo development, in order to compare three measurement techniques: RNA-seq, single-channel and dual-channel microarrays. The aim is to study the state of the art for the three technologies, with a view to assessing overlapping features, data compatibility and integration potential, in the context of time series measurements. This involves using established tools for each of the three technologies, and technical and biological replicates (for RNA-seq and microarrays, respectively), due to the limited availability of biological RNA-seq replicates for time series data. The approach consists of a sensitivity analysis for differential expression and clustering. In general, the RNA-seq dataset displayed the highest sensitivity to differential expression. The single-channel data performed similarly for the differentially expressed genes common to the gene sets considered. Cluster analysis was used to identify different features of the gene space for the three datasets, with higher similarities found between the RNA-seq and single-channel microarray datasets.

    Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

    The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression, as well as the lack of a comprehensive model for integrating information across genes and experiments, has impaired the accurate identification of periodically expressed genes. To address the problem, we introduce a Bayesian model to integrate multiple independent microarray datasets from three recent genome-wide cell cycle studies on fission yeast. A hierarchical model was used for data integration. To facilitate efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis-Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10-15% reported in the current literature. This calls for a reconsideration of the periodically expressed gene detection problem. Comment: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
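
As a much-simplified illustration of the sampling idea, the sketch below runs a plain random-walk Metropolis-Hastings sampler for a single gene whose expression is modelled as a sinusoid with a known period. It is not the paper's hierarchical model or its group move: the period, the noise level `sigma`, the Gaussian prior width `prior_sd` and all function names are assumptions chosen for illustration.

```python
import numpy as np


def log_posterior(theta, t, y, period, sigma=0.3, prior_sd=10.0):
    """Gaussian likelihood for y_t = a*cos(2*pi*t/period) + b*sin(2*pi*t/period) + noise,
    with independent N(0, prior_sd^2) priors on a and b (up to an additive constant)."""
    a, b = theta
    phase = 2 * np.pi * t / period
    mu = a * np.cos(phase) + b * np.sin(phase)
    return -0.5 * np.sum((y - mu) ** 2) / sigma**2 - 0.5 * (a**2 + b**2) / prior_sd**2


def metropolis(t, y, period, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings over theta = (a, b)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    lp = log_posterior(theta, t, y, period)
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = theta + step * rng.normal(size=2)   # symmetric Gaussian proposal
        lp_new = log_posterior(proposal, t, y, period)
        if np.log(rng.uniform()) < lp_new - lp:        # accept with prob min(1, ratio)
            theta, lp = proposal, lp_new
        samples[i] = theta
    return samples
```

After discarding an initial burn-in, the posterior draws of the amplitude `np.hypot(a, b)` indicate how strongly periodic the gene is; because the proposal is symmetric, the Hastings correction cancels and only the posterior ratio enters the acceptance step.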

    Time-series Explorer: An Animated Information Visualisation for Microarray Time-course Data

    Microarray technologies are a relatively new development that allow biologists to monitor the activity of thousands of genes (normally around 8,000) in parallel across multiple stages of a biological process. While this new perspective on biological functioning is recognised as having the potential to significantly impact the diagnosis, treatment, and prevention of diseases, it is only through effective analysis of the data produced that biologists can begin to unlock this potential. A significant obstacle to effective analysis of microarray time-course data is the combined scale and complexity of the data. This inevitably makes it difficult to reveal certain significant patterns in the data. In particular, it is less dominant patterns and, specifically, patterns that occur over smaller intervals of an experiment's overall time-frame that are more difficult to find. While existing techniques are capable of finding either unexpected patterns of activity over the majority of an experiment's time-frame or expected patterns of activity over smaller intervals of the time-frame, there are no techniques, or combination of techniques, that are suitable for finding unsuspected patterns of activity over smaller intervals. To overcome this limitation, we have developed the Time-series Explorer, which specifically supports biologists in their attempts to reveal these types of pattern by allowing them to visualise their data by controlling an animated interval scatter-plot linked to two complementary graph views. An evaluation involving biologists working with real data tested the extent of the tool's desired functionality and assessed the technique's practical utility within the wider context of microarray time-course analysis. This showed that the technique is not only capable of revealing previously unsuspected temporal patterns but also, in certain cases, more appropriate for finding previously suspected patterns and patterns that occurred over the majority of the time-frame.

    Towards knowledge-based gene expression data mining

    The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming to complement microarray analysis with data and knowledge from diverse available sources. In this review, we report on the plethora of gene expression data mining techniques and focus on their evolution toward knowledge-based data analysis approaches. In particular, we discuss recent developments in gene expression-based analysis methods used in association and classification studies, phenotyping and reverse engineering of gene networks.