903 research outputs found

    Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

    The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression as well as the lack of a comprehensive model for integrating information across genes and experiments has impaired the effort for the accurate identification of periodically expressed genes. To address the problem, we introduce a Bayesian model to integrate multiple independent microarray data sets from three recent genome-wide cell cycle studies on fission yeast. A hierarchical model was used for data integration. In order to facilitate an efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis-Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10-15% of genes reported in the current literature. It calls for a reconsideration of the periodically expressed gene detection problem. Comment: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Spectral estimation in unevenly sampled space of periodically expressed microarray time series data

    BACKGROUND: Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time series data is to identify genes that are periodically expressed. This challenge arises because the observed time series usually exhibit non-idealities, such as noise, short length, and unevenly sampled time points. Most methods used in the literature operate on evenly sampled time series and are not suitable for unevenly sampled time series. RESULTS: For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, since the Lomb-Scargle method assumes that there is a single stationary sinusoid wave with infinite support, it introduces spurious periodic components in the periodogram for data with a finite length. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm is superior to the Lomb-Scargle algorithm and the classical Fourier periodogram based method in detecting periodically expressed genes. We have applied our algorithm to the Plasmodium falciparum and yeast gene expression data and the results show that the algorithm is able to detect biologically meaningful periodically expressed genes. CONCLUSION: We have proposed an effective method for identifying periodic genes in unevenly sampled space of microarray time series gene expression data. The method can also be used as an effective tool for gene expression time series interpolation or resampling.
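
For orientation, the Lomb-Scargle baseline mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the classical (unnormalized) Lomb-Scargle periodogram only, not the B-spline method the paper proposes; the function name `lomb_scargle`, the sampling times, and the synthetic 4-unit period are all assumptions made for the example.

```python
import math

def lomb_scargle(times, values, omegas):
    """Classical Lomb-Scargle power at each angular frequency in `omegas`.

    Works directly on unevenly spaced `times`; the series mean is removed first.
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    power = []
    for w in omegas:
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(yi * ci for yi, ci in zip(y, c))
        ys = sum(yi * si for yi, si in zip(y, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        # Guard against a degenerate frequency where one basis term vanishes.
        term_c = yc * yc / cc if cc > 0 else 0.0
        term_s = ys * ys / ss if ss > 0 else 0.0
        power.append(0.5 * (term_c + term_s))
    return power

# Hypothetical unevenly sampled time points (arbitrary units).
times = [0.0, 0.7, 1.1, 2.3, 2.9, 3.6, 4.8, 5.1, 6.4, 7.0,
         7.9, 8.3, 9.6, 10.2, 11.5, 12.1, 13.4, 14.0, 15.2, 15.9]
# Synthetic expression profile with an assumed period of 4 time units.
values = [math.sin(2 * math.pi * t / 4.0) for t in times]

periods = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
power = lomb_scargle(times, values, [2 * math.pi / p for p in periods])
best_period = periods[power.index(max(power))]
print(best_period)
```

Scanning a small grid of candidate periods keeps the example short; in practice a much denser frequency grid would be used.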

    Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data

    BACKGROUND: In practice many biological time series measurements, including gene microarrays, are conducted at time points that seem to be interesting in the biologist's opinion and not necessarily at fixed time intervals. In many circumstances we are interested in finding targets that are expressed periodically. To tackle the problems of uneven sampling and unknown type of noise in periodicity detection, we propose to use robust regression. METHODS: The aim of this paper is to develop a general framework for robust periodicity detection and review and rank different approaches by means of simulations. We also show the results for some real measurement data. RESULTS: The simulation results clearly show that when the sampling of time series gets more and more uneven, the methods that assume even sampling become unusable. We find that M-estimation provides a good compromise between robustness and computational efficiency. CONCLUSION: Since uneven sampling occurs often in biological measurements, the robust methods developed in this paper are expected to have many uses. The regression based formulation of the periodicity detection problem easily adapts to non-uniform sampling. Using robust regression helps to reject inconsistently behaving data points. AVAILABILITY: The implementations are currently available for Matlab and will be made available for the users of R as well. More information can be found in the web-supplement [1].
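
The regression formulation described in this abstract can be illustrated with a rough sketch: fit y ≈ a·cos(ωt) + b·sin(ωt) + c at a candidate frequency, once by ordinary least squares and once by Huber M-estimation via iteratively reweighted least squares. This is not the authors' implementation; the helper names, the tuning constant 1.345, the scale estimate based on the median absolute residual, and the synthetic data with one injected outlier are all assumptions.

```python
import math

def solve_linear(A, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_sinusoid(times, values, omega, weights=None):
    """Weighted least squares for y = a*cos(wt) + b*sin(wt) + c; returns [a, b, c]."""
    if weights is None:
        weights = [1.0] * len(times)
    rows = [(math.cos(omega * t), math.sin(omega * t), 1.0) for t in times]
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(3)]
         for i in range(3)]
    b = [sum(w * r[i] * y for w, r, y in zip(weights, rows, values))
         for i in range(3)]
    return solve_linear(A, b)

def huber_fit(times, values, omega, k=1.345, iters=20):
    """Huber M-estimate of [a, b, c] by iteratively reweighted least squares."""
    coef = fit_sinusoid(times, values, omega)
    for _ in range(iters):
        res = [y - (coef[0] * math.cos(omega * t)
                    + coef[1] * math.sin(omega * t) + coef[2])
               for t, y in zip(times, values)]
        # Robust scale from the median absolute residual (an assumed choice).
        abs_res = sorted(abs(r) for r in res)
        scale = max(abs_res[len(abs_res) // 2] / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
        coef = fit_sinusoid(times, values, omega, weights)
    return coef

# Hypothetical uneven sampling and a synthetic signal of amplitude 2, period 4.
times = [0.0, 0.7, 1.1, 2.3, 2.9, 3.6, 4.8, 5.1, 6.4, 7.0,
         7.9, 8.3, 9.6, 10.2, 11.5, 12.1, 13.4, 14.0, 15.2, 15.9]
omega = 2 * math.pi / 4.0
values = [2.0 * math.sin(omega * t) + 0.1 * math.sin(7.3 * i)
          for i, t in enumerate(times)]
values[2] += 15.0  # one gross outlier

ols = fit_sinusoid(times, values, omega)
robust = huber_fit(times, values, omega)
ols_amp = math.hypot(ols[0], ols[1])
robust_amp = math.hypot(robust[0], robust[1])
print(ols_amp, robust_amp)
```

Because the design matrix is built from the actual sampling times, nothing in the fit requires even spacing, which is the property the abstract highlights.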

    Transcriptome Phase Distribution Analysis Reveals Diurnal Regulated Biological Processes and Key Pathways in Rice Flag Leaves and Seedling Leaves

    Plant diurnal oscillation is a variation with a 24-hour period. The correlation between diurnal genes and biological pathways has been widely revealed by microarray analysis in different species. Rice (Oryza sativa) is the major food staple for about half of the world's population. The rice flag leaf is essential in providing photosynthates for grain filling. However, there is still no comprehensive view of the diurnal transcriptome for rice leaves. In this study, we applied rice microarrays to monitor the rhythmically expressed genes in rice seedling and flag leaves. We developed a new computational analysis approach and identified 6,266 (10.96%) diurnal probe sets in seedling leaves and 13,773 (24.08%) diurnal probe sets in flag leaves. About 65% of overall transcription factors were identified as flag leaf preferred. In seedling leaves, the peak of the phase distribution was from 2:00am to 4:00am, whereas in flag leaves, the peak was from 8:00pm to 2:00am. The diurnal phase distribution analysis of gene ontology (GO) and cis-element enrichment indicated that some important processes, such as photosynthesis and abiotic stimulus, were awakened by light, while some genes related to nuclear and ribosome-involved processes were active mostly during the switch from light to dark. The starch and sucrose metabolism pathway genes also showed diurnal phase. We conducted a comparison analysis between the Arabidopsis and rice leaf transcriptomes throughout the diurnal cycle. In summary, our analysis approach is feasible for relatively unbiased identification of diurnal transcripts, efficiently detecting some special non-sinusoidal periodic patterns. Compared to the rice flag leaves, the gene transcription levels of seedling leaves were relatively limited to the diurnal rhythm. Our comprehensive microarray analysis of seedling and flag leaves of rice provided an overview of the rice diurnal transcriptome and indicated some diurnal regulated biological processes and key functional pathways in rice.

    Are we overestimating the number of cell-cycling genes? The impact of background models for time series data.

    Periodic processes play fundamental roles in organisms. Prominent examples are the cell cycle and the circadian clock. Microarray technology has enabled us to screen complete sets of transcripts for possible association with such fundamental periodic processes on a system-wide level. Frequently, quite a large number of genes have been detected as periodically expressed. However, the small overlap of identified genes between different studies has cast considerable doubt on the reliability of the detected periodic expression. In this study, we show that a major reason for the lack of agreement is the use of an inadequate background model for the determination of significance. We demonstrate that the choice of background model has considerable impact on the statistical significance of periodic expression. For illustration, we reanalyzed two microarray studies of the yeast cell cycle. Our evaluation strongly indicates that the results of previous analyses might have been overoptimistic and that the use of a more suitable background model promises to give more realistic results.

    Robust detection of periodic time series measured from biological systems

    BACKGROUND: Periodic phenomena are widespread in biology. The problem of finding periodicity in biological time series can be viewed as a multiple hypothesis testing of the spectral content of a given time series. The exact noise characteristics are unknown in many bioinformatics applications. Furthermore, the observed time series can exhibit other non-idealities, such as outliers, short length and distortion from the original wave form. Hence, the computational methods should preferably be robust against such anomalies in the data. RESULTS: We propose a general-purpose robust testing procedure for finding periodic sequences in multiple time series data. The proposed method is based on a robust spectral estimator which is incorporated into the hypothesis testing framework using a so-called g-statistic together with correction for multiple testing. This results in a robust testing procedure which is insensitive to heavy contamination by outliers, missing values, short time series, and nonlinear distortions, and is completely insensitive to any monotone nonlinear distortions. The performance of the methods is evaluated by performing extensive simulations. In addition, we compare the proposed method with another recent statistical signal detection estimator that uses Fisher's test, based on the Gaussian noise assumption. The results demonstrate that the proposed robust method provides remarkably better robustness properties. Moreover, the performance of the proposed method is preferable also in the standard Gaussian case. We validate the performance of the proposed method on real data on which the method performs very favorably. CONCLUSION: As the time series measured from biological systems are usually short and prone to contain different kinds of non-idealities, we are very optimistic about the multitude of possible applications for our proposed robust statistical periodicity detection method. AVAILABILITY: The presented methods have been implemented in Matlab and in R. Codes are available on request. Supplementary material is available at:
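
The g-statistic and Fisher's test referred to in this abstract can be sketched as follows. This shows only the classical Fisher's g-test on the ordinary periodogram of an evenly sampled series, under the Gaussian noise assumption, and not the robust spectral estimator the paper proposes. The function names and the two test series are assumptions for illustration, and the input is assumed to be non-constant.

```python
import math

def periodogram(values):
    """Ordinary periodogram at Fourier frequencies k = 1..floor((n-1)/2)."""
    n = len(values)
    mean = sum(values) / n
    y = [v - mean for v in values]
    spec = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        re = sum(yi * math.cos(w * t) for t, yi in enumerate(y))
        im = sum(yi * math.sin(w * t) for t, yi in enumerate(y))
        spec.append((re * re + im * im) / n)
    return spec

def fisher_g_test(values):
    """Return (g, p_value) where g = max periodogram ordinate / sum of ordinates.

    The p-value uses Fisher's exact formula for Gaussian white noise:
    P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) * C(m, j) * (1 - j*g)^(m-1).
    """
    spec = periodogram(values)
    m = len(spec)
    g = max(spec) / sum(spec)
    p = 0.0
    for j in range(1, min(m, int(1.0 / g)) + 1):
        p += (-1) ** (j - 1) * math.comb(m, j) * (1.0 - j * g) ** (m - 1)
    return g, min(max(p, 0.0), 1.0)

# Synthetic periodic profile: period 6 sampled at 24 evenly spaced points.
periodic = [math.sin(2 * math.pi * t / 6) for t in range(24)]
# Hypothetical irregular profile with no dominant frequency by construction.
irregular = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2, 0.7, -1.1, 0.4, -0.3,
             1.0, -0.6, 0.5, -0.2, -1.3, 0.9, 0.0, 0.6, -0.8, 1.1, -0.4, 0.2]

g_periodic, p_periodic = fisher_g_test(periodic)
g_irregular, p_irregular = fisher_g_test(irregular)
print(g_periodic, p_periodic)
print(g_irregular, p_irregular)
```

When many genes are screened, the per-gene p-values would still need a multiple-testing correction, as the abstract notes.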

    Robust discovery of periodically expressed genes using the laplace periodogram

    BACKGROUND: Time-course gene expression analysis has become important in recent developments due to the increasingly available experimental data. The detection of genes that are periodically expressed is an important step which allows us to study the regulatory mechanisms associated with the cell cycle. RESULTS: In this work, we present the Laplace periodogram which employs the least absolute deviation criterion to provide a more robust detection of periodic gene expression in the presence of outliers. The Laplace periodogram is shown to perform comparably to existing methods for the Saccharomyces cerevisiae and Arabidopsis time-course datasets, and to outperform existing methods when outliers are present. CONCLUSION: Time-course gene expression data are often noisy due to the limitations of current technology, and may include outliers. These artifacts corrupt the available data and make the detection of periodicity difficult in many cases. The Laplace periodogram is shown to perform well for data both with and without outliers, and also for data that are non-uniformly sampled.