    Spectral estimation in unevenly sampled space of periodically expressed microarray time series data

    BACKGROUND: Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time series data is to identify genes that are periodically expressed. This challenge arises because the observed time series usually exhibit non-idealities such as noise, short length, and unevenly sampled time points. Most methods in the literature operate on evenly sampled time series and are not suitable for unevenly sampled ones. RESULTS: For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, because the Lomb-Scargle method assumes a single stationary sinusoid with infinite support, it introduces spurious periodic components into the periodogram of finite-length data. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm outperforms both the Lomb-Scargle algorithm and the classical Fourier periodogram-based method in detecting periodically expressed genes. We have applied our algorithm to Plasmodium falciparum and yeast gene expression data, and the results show that the algorithm detects biologically meaningful periodically expressed genes. CONCLUSION: We have proposed an effective method for identifying periodic genes in unevenly sampled microarray time series gene expression data. The method can also serve as a tool for interpolating or resampling gene expression time series.
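As a point of reference for the baseline this abstract compares against, the classical normalized Lomb-Scargle periodogram for unevenly sampled data can be sketched as below. This is the textbook formula, not the authors' B-spline estimator; the function name `lomb_scargle`, the frequency grid, and the test signal are hypothetical choices made only for illustration.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Normalized Lomb-Scargle periodogram of unevenly sampled data (t, y).

    `freqs` are ordinary frequencies (cycles per unit of t). This is the
    classical baseline, not the B-spline method described in the abstract.
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    yc = [v - mean for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the cosine and sine terms orthogonal
        # over the irregular sample times: tan(2*w*tau) = sum sin / sum cos.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        cos_t = [math.cos(w * (ti - tau)) for ti in t]
        sin_t = [math.sin(w * (ti - tau)) for ti in t]
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        yc_cos = sum(a * b for a, b in zip(yc, cos_t))
        yc_sin = sum(a * b for a, b in zip(yc, sin_t))
        p = (yc_cos ** 2 / cc if cc > 0 else 0.0) + (yc_sin ** 2 / ss if ss > 0 else 0.0)
        power.append(p / (2 * var))
    return power


# Hypothetical example: a 24-hour rhythm sampled at 30 irregular times over 72 h.
rng = random.Random(0)
times = sorted(rng.uniform(0, 72) for _ in range(30))
values = [math.sin(2 * math.pi * ti / 24) for ti in times]
grid = [k / 240 for k in range(1, 61)]  # periods from 240 h down to 4 h
spectrum = lomb_scargle(times, values, grid)
best = grid[spectrum.index(max(spectrum))]
print("estimated period:", 1 / best)
```

Because the method implicitly fits one infinitely long sinusoid per frequency, short or non-stationary profiles can leak power into neighbouring frequencies, which is the limitation the abstract's reconstruction-based approach targets.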

    Periodicity, Change Detection and Prediction in Microarrays

    Three topics in the analysis of microarray genomic data are discussed, and improved statistical methods are developed for each. First, a statistical test with higher power is developed for detecting periodicity in microarray time series data. Periodicity in short series, with non-Fourier frequencies, is detected through a Pearson curve calibrated to the null distribution obtained by computer simulation. Unlike traditional methods, this approach remains applicable in the presence of missing values or unequal time intervals. The usefulness of the new method is demonstrated on simulated series as well as on actual microarray time series. The second topic develops a new method for detecting changes in DNA or gene copy number. Regions of DNA copy number aberration in chromosomal material are detected using the maximal overlap discrete wavelet transform (MODWT), and it is shown how repeated application of the MODWT to a series can be used to confirm the presence of change points. Application to simulated data as well as array CGH (comparative genomic hybridization) data confirms the excellent performance of this method. In the third topic, an improved class predictor for tissue samples in microarray experiments is developed by incorporating nearest neighbour covariates (NNC). This method is shown to reduce misclassification errors in both simulated and actual microarray data.
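The first topic calibrates a periodicity statistic against a simulated null distribution, which is what makes unequal spacing and missing values harmless: the null is generated at exactly the observed time points. A minimal sketch of that idea follows, with two substitutions stated plainly: the null is built by randomly permuting the observed values over the fixed time points, and the p-value is read directly from the simulated distribution rather than from a fitted Pearson curve. The statistic (largest periodogram ordinate over a hypothetical grid of trial frequencies) and all function names are illustrative assumptions.

```python
import math
import random


def max_periodogram(t, y, freqs):
    """Largest classical periodogram ordinate over the trial frequencies."""
    n = len(y)
    mean = sum(y) / n
    best = 0.0
    for f in freqs:
        w = 2.0 * math.pi * f
        c = sum((v - mean) * math.cos(w * ti) for ti, v in zip(t, y))
        s = sum((v - mean) * math.sin(w * ti) for ti, v in zip(t, y))
        best = max(best, (c * c + s * s) / n)
    return best


def periodicity_pvalue(t, y, freqs, n_sim=999, seed=0):
    """Monte Carlo p-value for H0: values are unrelated to their time order.

    Assumption: permutation null with an empirical p-value, standing in for
    the simulated null plus Pearson-curve calibration of the original work.
    """
    # Drop missing values (None); the remaining times may be unequally spaced.
    pairs = [(ti, v) for ti, v in zip(t, y) if v is not None]
    t_obs = [p[0] for p in pairs]
    y_obs = [p[1] for p in pairs]
    observed = max_periodogram(t_obs, y_obs, freqs)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        shuffled = y_obs[:]
        rng.shuffle(shuffled)  # same times, values in random order
        if max_periodogram(t_obs, shuffled, freqs) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical example: 20 irregular samples over 48 h with two missing values.
rng = random.Random(1)
times = sorted(rng.uniform(0, 48) for _ in range(20))
values = [math.sin(2 * math.pi * ti / 12) + rng.gauss(0, 0.2) for ti in times]
values[3] = None
values[10] = None
trial = [k / 48 for k in range(1, 13)]  # non-Fourier grids work equally well
p = periodicity_pvalue(times, values, trial, n_sim=499)
print("p-value:", p)
```

Since the reference distribution is recomputed for each series at its own time points, any reasonable statistic yields a valid test; the choice of statistic affects only power.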

    A temporal precedence based clustering method for gene expression microarray data

    Background: Time-course microarray experiments can produce data that help in understanding the underlying dynamics of a system. Clustering, in which data are grouped according to shared characteristics, is an important stage in microarray data analysis. Most clustering techniques rely on distance or visual similarity measures, which may not suit temporal microarray data, where the sequential order of time points matters. We present a Granger causality-based technique for clustering temporal microarray gene expression data; it measures the interdependence between two time series by statistically testing whether one series can be used to forecast the other. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes with the Granger causality test. The association matrix is then analyzed with a graph-theoretic technique to detect highly connected components representing biologically interesting modules. We test our approach on synthetic datasets and on real biological datasets obtained for Arabidopsis thaliana, and we assess its effectiveness by comparing the results with the existing biological literature. We also report interesting structural properties of the association network that are commonly desired in biological systems. Conclusions: Our experiments on synthetic and real microarray datasets show that our approach produces encouraging results. The method is simple to implement and statistically traceable at each step. It can produce sets of functionally related genes that can be used for reverse-engineering gene circuits.
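The pairwise test behind the gene-association matrix can be sketched as a Granger causality F-test: regress one series on its own past (restricted model), add the other series' past (unrestricted model), and compare residual sums of squares. This is a minimal illustration under assumptions the abstract does not state: a fixed lag order, ordinary least squares via hand-rolled normal equations, and hypothetical function names. Converting the F statistic to a p-value needs the F distribution with `lag` and `df_den` degrees of freedom (for example `scipy.stats.f.sf`), omitted here to keep the sketch dependency-free.

```python
import random


def _ols_rss(rows, target):
    """Residual sum of squares of an OLS fit, solved via normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, v in zip(rows, target))


def granger_f(x, y, lag=1):
    """F statistic for 'past x improves the forecast of y' at a given lag."""
    target = y[lag:]
    restricted, unrestricted = [], []
    for t in range(lag, len(y)):
        own = [y[t - i] for i in range(1, lag + 1)]
        other = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + own)            # intercept + own lags
        unrestricted.append([1.0] + own + other)  # ... + lags of x
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    df_num = lag
    df_den = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Hypothetical example: y follows x with a one-step delay.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 100)]
f_xy = granger_f(x, y)  # does x help forecast y?
f_yx = granger_f(y, x)  # does y help forecast x?
print(f_xy, f_yx)
```

Looping `granger_f` over all ordered gene pairs and thresholding the resulting p-values would give an association matrix of the kind the abstract describes; the threshold and any multiple-testing correction are left open here.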

    Whole genome sequencing for mutation discovery in a single case of lysosomal storage disease (MPS type 1) in the dog.

    Mucopolysaccharidosis (MPS) is a metabolic storage disorder caused by the deficiency of any lysosomal enzyme required for the breakdown of glycosaminoglycans. A 15-month-old Boston Terrier presented with clinical signs consistent with lysosomal storage disease, including corneal opacities, multifocal central nervous system disease, and a progressively worsening clinical course. Diagnosis was confirmed at necropsy by histopathologic evaluation of multiple organs demonstrating accumulation of mucopolysaccharides. Whole genome sequencing uncovered a frame-shift insertion affecting the alpha-L-iduronidase (IDUA) gene (c.19_20insCGGCCCCC), a mutation confirmed in another Boston Terrier that presented 2 years later with a similar clinical picture. Both dogs were homozygous for the IDUA mutation and shared coat colors not recognized as normal for the breed by the American Kennel Club. In contrast, the mutation was not detected in 120 unrelated Boston Terriers or in 202 dogs from other breeds. Recent inbreeding to select for recessive and unusual coat colors may have concentrated this relatively rare allele in the breed. Identification of the variant enables ante-mortem diagnosis of similar cases and selective breeding to prevent the spread of this disease in the breed. Boston Terriers carrying this variant represent a promising model for human MPS I with neurological abnormalities.

    Robust detection of periodic time series measured from biological systems

    BACKGROUND: Periodic phenomena are widespread in biology. The problem of finding periodicity in biological time series can be viewed as multiple hypothesis testing of the spectral content of a given time series. In many bioinformatics applications the exact noise characteristics are unknown. Furthermore, the observed time series can exhibit other non-idealities, such as outliers, short length, and distortion from the original waveform. Computational methods should therefore be robust against such anomalies in the data. RESULTS: We propose a general-purpose robust testing procedure for finding periodic sequences in multiple time series data. The proposed method is based on a robust spectral estimator that is incorporated into the hypothesis testing framework using the so-called g-statistic together with correction for multiple testing. The resulting procedure is insensitive to heavy outlier contamination, missing values, short time series, and nonlinear distortions, and is completely insensitive to any monotone nonlinear distortion. The performance of the methods is evaluated through extensive simulations. In addition, we compare the proposed method with another recent statistical signal detection estimator that uses Fisher's test, which is based on the Gaussian noise assumption. The results demonstrate that the proposed method has markedly better robustness properties, and its performance is preferable even in the standard Gaussian case. We validate the proposed method on real data, on which it performs very favorably. CONCLUSION: Because time series measured from biological systems are usually short and prone to containing various non-idealities, we are very optimistic about the many possible applications of our robust statistical periodicity detection method.
AVAILABILITY: The presented methods have been implemented in Matlab and in R. Code is available on request. Supplementary material is available at:
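Fisher's g-statistic, which the abstract builds on, is the largest periodogram ordinate divided by the sum of all ordinates, and it has an exact null distribution for Gaussian white noise. The sketch below computes g and Fisher's exact p-value. As a stand-in for the authors' robust spectral estimator (whose details are not given here), it optionally replaces the data by their ranks, which makes g invariant to any strictly increasing distortion of the data; this substitution, the function names, and the test signal are assumptions for illustration only. The exact formula strictly applies to the Gaussian case, so with ranks the p-value should be read as approximate. `math.comb` requires Python 3.8 or later.

```python
import math
import random


def periodogram(y):
    """Classical periodogram at Fourier frequencies k/n, k = 1..m."""
    n = len(y)
    mean = sum(y) / n
    m = (n - 1) // 2  # excludes frequency 0 and, for even n, the Nyquist frequency
    out = []
    for k in range(1, m + 1):
        w = 2.0 * math.pi * k / n
        c = sum((v - mean) * math.cos(w * t) for t, v in enumerate(y))
        s = sum((v - mean) * math.sin(w * t) for t, v in enumerate(y))
        out.append((c * c + s * s) / n)
    return out


def fisher_g_test(y, use_ranks=True):
    """Return (g, p) for Fisher's exact test of one periodic component."""
    if use_ranks:
        # Assumed robustification: rank transform (ties are not averaged).
        order = sorted(range(len(y)), key=lambda i: y[i])
        ranks = [0.0] * len(y)
        for r, i in enumerate(order):
            ranks[i] = float(r + 1)
        y = ranks
    p_k = periodogram(y)
    m = len(p_k)
    g = max(p_k) / sum(p_k)
    # P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) C(m, j) (1 - j g)^(m-1)
    p = sum((-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
            for j in range(1, int(1 / g) + 1))
    return g, min(max(p, 0.0), 1.0)


# Hypothetical example: period-8 signal, then a monotone nonlinear distortion.
rng = random.Random(2)
series = [math.sin(2 * math.pi * t / 8) + 0.3 * rng.gauss(0, 1) for t in range(32)]
distorted = [math.exp(3 * v) for v in series]
g1, p1 = fisher_g_test(series)
g2, p2 = fisher_g_test(distorted)
print(g1, p1, g2, p2)
```

With ranks, the distorted series produces exactly the same g as the original, which illustrates why a rank-based spectral estimate is unaffected by monotone distortions; applying the test to many genes would still require the multiple-testing correction mentioned in the abstract.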

    Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data

    Background: In practice, many biological time series measurements, including gene microarrays, are taken at time points that the biologist considers interesting rather than at fixed time intervals. In many circumstances we are interested in finding targets that are expressed periodically. To tackle the problems of uneven sampling and unknown noise type in periodicity detection, we propose to use robust regression. Methods: The aim of this paper is to develop a general framework for robust periodicity detection and to review and rank different approaches by means of simulations. We also show results for some real measurement data. Results: The simulation results clearly show that as the sampling of a time series becomes increasingly uneven, methods that assume even sampling become unusable. We find that M-estimation provides a good compromise between robustness and computational efficiency. Conclusion: Since uneven sampling occurs often in biological measurements, the robust methods developed in this paper are expected to have many uses. The regression-based formulation of the periodicity detection problem adapts easily to non-uniform sampling, and robust regression helps reject inconsistently behaving data points. Availability: The implementations are currently available for Matlab and will also be made available for R users. More information can be found in the web supplement [1].
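The regression formulation adapts to uneven sampling because each observation simply contributes one row, cos(ωt_i), sin(ωt_i), 1, to the design matrix regardless of spacing. Since the abstract singles out M-estimation, the sketch below fits a sinusoid of a given period by Huber M-estimation using iteratively reweighted least squares (IRLS). The Huber tuning constant 1.345, the median-based scale estimate, the fixed iteration count, and the function names are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * 3
    for i in range(2, -1, -1):  # back substitution
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def huber_sinusoid_fit(t, y, period, c=1.345, iters=50):
    """Fit y ~ a*cos(wt) + b*sin(wt) + d by Huber M-estimation (IRLS)."""
    w = 2.0 * math.pi / period
    rows = [(math.cos(w * ti), math.sin(w * ti), 1.0) for ti in t]
    weights = [1.0] * len(y)  # first pass is ordinary least squares
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        # Weighted least-squares step: solve (X' W X) beta = X' W y.
        xtx = [[sum(wt * r[i] * r[j] for wt, r in zip(weights, rows))
                for j in range(3)] for i in range(3)]
        xty = [sum(wt * r[i] * v for wt, r, v in zip(weights, rows, y))
               for i in range(3)]
        beta = _solve3(xtx, xty)
        resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]
        # Robust scale from the median absolute residual (residuals are roughly
        # centred because the model has an intercept), scaled for Gaussian noise.
        abs_r = sorted(abs(e) for e in resid)
        mid = len(abs_r) // 2
        med = abs_r[mid] if len(abs_r) % 2 else 0.5 * (abs_r[mid - 1] + abs_r[mid])
        scale = med / 0.6745 or 1e-12
        # Huber weights: 1 inside the threshold, c*scale/|e| outside it.
        weights = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in resid]
    return beta, math.hypot(beta[0], beta[1])


# Hypothetical example: irregular samples of a 24 h rhythm with one gross outlier.
rng = random.Random(3)
times = sorted(rng.uniform(0, 72) for _ in range(25))
values = [2.0 * math.cos(2 * math.pi * ti / 24) + 0.5 + rng.gauss(0, 0.1) for ti in times]
values[5] += 50.0
coef, amp = huber_sinusoid_fit(times, values, period=24)
print("amplitude:", amp)
```

A full detector along the lines of the abstract would repeat this fit over candidate periods and test the fitted amplitude; Huber weights bound the influence of the outlier, whereas a plain least-squares fit would be pulled substantially toward it.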

    Comparison of Pattern Detection Methods in Microarray Time Series of the Segmentation Clock

    While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data remains limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) that carry a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied four distinct mathematical methods to the same microarray time series dataset to identify significant patterns in gene expression profiles. These methods, called Phase consistency, Address reduction, Cyclohedron test, and Stable persistence, are based on different conceptual frameworks that are either hypothesis- or data-driven. Unlike Fourier transforms, some of these methods do not depend on the assumption that the pattern of interest is periodic. Remarkably, these methods blindly identified the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appear to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and with experimental validation in mouse embryos.
Our results demonstrate the utility of these novel pattern detection strategies, notably for detecting periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.