72 research outputs found
M-quantile regression analysis of temporal gene expression data
In this paper, we explore the use of M-regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application to temporal gene expression data. Here, the aim is to detect genes whose temporal expression is significantly different across a number of biological conditions. We present a new method to approach this problem. Firstly, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing for small-sample gene expression data, as it is very robust against outliers and it does not make any assumption on the error distribution. Secondly, we further increase the robustness of the method by summarising the M-quantile regression models for a large range of quantile values into an M-quantile coefficient. Finally, we employ a Hotelling T2-test to detect significant differences of the temporal M-quantile profiles across conditions. Simulated data show the increased robustness of M-quantile regression methods over standard regression methods. We conclude by using the method to detect differentially expressed genes from time-course microarray data on muscular dystrophy
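As an illustration of the first step, the following is a minimal sketch of a linear M-quantile fit by iteratively reweighted least squares. It assumes a Huber influence function with tuning constant c = 1.345 and a MAD-based scale estimate; the function name `mquantile_fit` and all defaults are hypothetical choices for this sketch and are not taken from the paper.

```python
import numpy as np

def mquantile_fit(X, y, q=0.5, c=1.345, n_iter=100, tol=1e-8):
    """Linear M-quantile regression via IRLS (illustrative sketch only).

    X : (n, p) design matrix (include a column of ones for an intercept)
    y : (n,) response
    q : quantile-like index in (0, 1); q = 0.5 gives Huber M-regression
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least-squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = r / s
        # Huber weights psi(u)/u: 1 inside [-c, c], c/|u| outside.
        w = np.where(np.abs(u) <= c, 1.0, c / np.maximum(np.abs(u), 1e-12))
        # Asymmetric tilt: positive residuals get weight q, the rest 1 - q.
        w = w * np.where(r > 0, q, 1.0 - q)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Fitting the same profile over a grid of q values gives a family of curves; how those curves are summarised into a single M-quantile coefficient is specific to the paper and is not reproduced here.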
Temporal Bayesian classifiers for modelling muscular dystrophy expression data
The analysis of microarray data from time-series experiments requires specialised algorithms, which take the temporal ordering of the data into account. In this paper we explore a new architecture of Bayesian classifier that can be used to understand how biological mechanisms differ with respect to time. We show that this classifier improves the classification of microarray data and at the same time ensures that the models can easily be analysed by biologists by incorporating time transparently. In this paper we focus on data that has been generated to explore different types of muscular dystrophy
A Spatio-Temporal Bayesian Network Classifier for Understanding Visual Field Deterioration
Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, which is a leading cause of irreversible blindness in the world. Recently, there has been an explosion in the amount of data being stored on patients who suffer from visual deterioration, including field test data, retinal image data and patient demographic data. However, there has been relatively little work in modelling the spatial and temporal relationships common to such data. In this paper we introduce a novel method for classifying Visual Field (VF) data that explicitly models these spatial and temporal relationships. We carry out an analysis of this method and compare it to a number of classifiers from the machine learning and statistical communities. Results are very encouraging, showing that our classifiers are comparable to existing statistical models whilst also facilitating the understanding of underlying spatial and temporal relationships within VF data. The results reveal the potential of using such models for knowledge discovery within ophthalmic databases, such as networks reflecting the ‘nasal step’, an early indicator of the onset of glaucoma. The results outlined in this paper pave the way for a substantial program of study involving many other spatial and temporal datasets, including retinal image and clinical data
The robust selection of predictive genes via a simple classifier
Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works. This in turn may lead to better diagnoses and potentially to a cure for that disease. This task becomes extremely challenging when the data are characterised by only a small number of samples and a high number of dimensions, as is often the case with gene expression data. Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for the robust identification of the most predictive features in such data. Within this framework, we propose a simple selective naïve Bayes classifier discovered using a global search technique, and combine it with data perturbation to increase its robustness to small sample sizes.
An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all confounded by small sample sizes and high dimensionality. The method has been shown capable of identifying genes previously confirmed or associated with prostate cancer and viral infections
An extended Kalman filtering approach to modeling nonlinear dynamic gene regulatory networks via short gene expression time series
Copyright [2009] IEEE. In this paper, the extended Kalman filter (EKF) algorithm is applied to model the gene regulatory network from gene time series data. The gene regulatory network is considered as a nonlinear dynamic stochastic model that consists of the gene measurement equation and the gene regulation equation. After specifying the model structure, we apply the EKF algorithm for identifying both the model parameters and the actual value of gene expression levels. It is shown that the EKF algorithm is an online estimation algorithm that can identify a large number of parameters (including parameters of nonlinear functions) through an iterative procedure by using a small number of observations. Four real-world gene expression data sets are employed to demonstrate the effectiveness of the EKF algorithm, and the obtained models are evaluated from the viewpoint of bioinformatics
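To make the idea of estimating expression levels and parameters together more concrete, here is a minimal sketch of one generic EKF predict/update step, followed by a toy single-gene model in which the state is augmented with an unknown regulation parameter. The tanh regulation function, the augmented state [x, a], and all function names are assumptions made for illustration; the paper's actual model structure is not given in the abstract.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One extended Kalman filter step (generic sketch).

    x, P   : current state estimate and its covariance
    z      : new measurement
    f, h   : state-transition and measurement functions
    F_jac  : Jacobian of f, evaluated at the current estimate
    H_jac  : Jacobian of h, evaluated at the predicted state
    Q, R   : process and measurement noise covariances
    """
    # Predict: propagate the estimate and linearise around it.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new measurement.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical toy model: state s = [x, a], where x is the expression level
# and a is an unknown regulation parameter treated as a constant state.
def gene_f(s):
    x, a = s
    return np.array([a * np.tanh(x), a])   # regulation equation (assumed form)

def gene_F(s):
    x, a = s
    return np.array([[a * (1.0 - np.tanh(x) ** 2), np.tanh(x)],
                     [0.0, 1.0]])

def measure_h(s):
    return np.array([s[0]])                # only the expression level is observed

def measure_H(s):
    return np.array([[1.0, 0.0]])
```

Calling `ekf_step` once per time point with `gene_f`, `gene_F`, `measure_h` and `measure_H` updates the expression estimate and the parameter estimate in the same pass, which is the sense in which the filter works online.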
Exploiting the full power of temporal gene expression profiling through a new statistical test: Application to the analysis of muscular dystrophy data
Background: The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other.
Results: We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan- and gamma-sarcoglycan-deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections.
Conclusion: The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523
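The general idea of comparing polynomial coefficients with a Hotelling T2-statistic can be sketched as follows. This is an assumption-laden illustration, not the authors' R code: it fits a polynomial to each replicate profile separately and then applies the classical two-sample T2-statistic to the coefficient vectors of two conditions, whereas the statistic derived in the paper may be constructed differently. The function names are hypothetical.

```python
import numpy as np

def poly_coefficients(times, profiles, degree=2):
    """Fit one polynomial per replicate profile.

    profiles : (n_replicates, n_timepoints) expression values for one gene
    returns  : (n_replicates, degree + 1) coefficient vectors
    """
    profiles = np.asarray(profiles, dtype=float)
    # np.polyfit fits each column of a 2-D y independently.
    return np.polyfit(times, profiles.T, degree).T

def hotelling_t2(a, b):
    """Classical two-sample Hotelling T2 on rows of a and b.

    Returns (T2, F statistic, (df1, df2)). Requires n1 + n2 - 2 >= p so
    that the pooled covariance matrix is invertible.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2, p = a.shape[0], b.shape[0], a.shape[1]
    d = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(a, rowvar=False)
              + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(pooled, d)
    # Under normality and equal covariances, this rescaled T2 follows F(df1, df2).
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)
```

For one gene, `hotelling_t2(poly_coefficients(t, cond_a), poly_coefficients(t, cond_b))` gives a statistic that grows when the fitted temporal shapes differ between the two conditions; a p-value can be read from the F distribution with the returned degrees of freedom.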
Consensus clustering and functional interpretation of gene-expression data
Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas
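One simple way to picture a consensus across clustering methods is a pairwise agreement matrix: for every pair of genes, the fraction of methods that place them in the same cluster. The sketch below builds that matrix and groups genes that are linked by pairs meeting a chosen agreement threshold. Both the thresholding rule and the function names are assumptions for illustration; the abstract does not describe how the paper derives its consensus clusters.

```python
import numpy as np

def agreement_matrix(labelings):
    """Fraction of clusterings that put each pair of genes together.

    labelings : (n_methods, n_genes) array of cluster labels, one row
                per clustering method (label values need not match
                across methods).
    """
    L = np.asarray(labelings)
    same = L[:, :, None] == L[:, None, :]   # (methods, genes, genes)
    return same.mean(axis=0)

def consensus_groups(agreement, threshold=1.0):
    """Group genes connected by pairs with agreement >= threshold.

    With threshold=1.0, two genes share a group only if they are linked
    by a chain of pairs on which every method agrees.
    """
    n = agreement.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        # Depth-first search over the thresholded agreement graph.
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(agreement[j] >= threshold):
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels
```

Lowering the threshold merges groups on which only most methods agree, which trades confidence for coverage.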
Penalised inference for autoregressive moving average models with time-dependent predictors
Linear models that contain a time-dependent response and explanatory variables have attracted much interest in recent years. The most general form of the existing approaches is of a linear regression model with autoregressive moving average residuals. The addition of the moving average component results in a complex model with a very challenging implementation. In this paper, we propose to account for the time dependency in the data by explicitly adding autoregressive terms of the response variable in the linear model. In addition, we consider an autoregressive process for the errors in order to capture complex dynamic relationships parsimoniously. To broaden the application of the model, we present a penalized likelihood approach for the estimation of the parameters and show how the adaptive lasso penalties lead to an estimator which enjoys the oracle property. Furthermore, we prove the consistency of the estimators with respect to the mean squared prediction error in high-dimensional settings, an aspect that has not been considered by the existing time-dependent regression models. A simulation study and real data analysis show the successful applications of the model on financial data on stock indexes
Model selection for factorial Gaussian graphical models with an application to dynamic regulatory networks
Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order (some entries of the precision matrix are a priori zeros) or equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal). The precision matrix is then estimated by l1-penalized maximum likelihood, imposing a further constraint on the absolute value of its entries, which results in sparse networks. Selecting the optimal sparsity level is a major challenge for this type of approach. In this paper, we evaluate the performance of a number of model selection criteria for fGGMs by means of two simulated regulatory networks from realistic biological processes. The analysis reveals a good performance of fGGMs in comparison with other methods for inferring dynamic networks and of the KLCV criterion in particular for model selection. Finally, we present an application on high-resolution time-course microarray data from the Neisseria meningitidis bacterium, a causative agent of life-threatening infections such as meningitis. The methodology described in this paper is implemented in the R package sglasso, freely available at CRAN, http://CRAN.R-project.org/package=sglasso