    CSI: a nonparametric Bayesian approach to network inference from multiple perturbed time series gene expression data

    How an organism responds to the environmental challenges it faces is heavily influenced by its gene regulatory networks (GRNs). Whilst most methods for inferring GRNs from time series mRNA expression data are only able to cope with single time series (or single perturbations with biological replicates), it is becoming increasingly common for several time series to be generated under different experimental conditions. The CSI algorithm (Klemm, 2008) represents one approach to inferring GRNs from multiple time series data, one that has previously been shown to perform well on a variety of datasets (Penfold and Wild, 2011). Another challenge in network inference is the identification of condition-specific GRNs, i.e. identifying how a GRN is rewired under different conditions or in different individuals. The Hierarchical Causal Structure Identification (HCSI) algorithm (Penfold et al., 2012) is one approach that allows inference of condition-specific networks (Hickman et al., 2013), and has been shown to be more accurate at reconstructing known networks than inference on the individual datasets alone. Here we describe a MATLAB implementation of CSI/HCSI that includes fast approximate solutions to CSI as well as Markov Chain Monte Carlo implementations of both CSI and HCSI, together with a user-friendly GUI, with the intention of making the analysis of networks from multiple perturbed time series datasets more accessible to the wider community. The GUI itself guides the user through each stage of the analysis, from loading in the data, to parameter selection and visualisation of networks, and can be launched by typing >> csi into the MATLAB command line. For each step of the analysis, links to documentation and tutorials are available within the GUI, which includes documentation on visualisation and interacting with output files.
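At its core, CSI scores candidate sets of regulators (parents) for each gene by how well a Gaussian process regression of the gene's expression on its putative parents' earlier expression explains the data. The sketch below illustrates that kind of computation only; it is not the authors' implementation, and the RBF kernel, hyperparameter values, and helper names (`gp_log_marginal`, `score_parent_sets`) are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def gp_log_marginal(X, y, length_scale=1.0, signal_var=1.0, noise_var=0.1):
    """Log marginal likelihood of y under a GP with an RBF kernel on X.

    X: (n, d) candidate-parent expression at time t; y: (n,) target at t+1.
    Hyperparameter values here are illustrative, not those used by CSI.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = signal_var * np.exp(-0.5 * sq / length_scale**2) + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|K| = 2 * sum(log diag(L)), hence the single sum below
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def score_parent_sets(expr, target, max_parents=2):
    """Score every candidate parent set (up to max_parents) for one gene.

    expr: (T, G) expression matrix; target: column index of the regulated gene.
    Returns a list of (parent_set, log_marginal) pairs.
    """
    T, G = expr.shape
    y = expr[1:, target]                     # target expression at t+1
    candidates = [g for g in range(G) if g != target]
    scores = []
    for k in range(1, max_parents + 1):
        for pa in combinations(candidates, k):
            X = expr[:-1, list(pa)]          # parents' expression at t
            scores.append((pa, gp_log_marginal(X, y)))
    return scores
```

In CSI proper these scores feed a posterior distribution over parental sets (and, in HCSI, are tied hierarchically across conditions); the sketch stops at the per-set marginal likelihoods.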

    Elastic Similarity Measures for Multivariate Time Series Classification

    Elastic similarity measures are a class of similarity measures specifically designed to work with time series data. When scoring the similarity between two time series, they allow points that do not correspond in timestamps to be aligned. This can compensate for misalignments in the time axis of time series data, and for similar processes that proceed at variable and differing paces. Elastic similarity measures are widely used in machine learning tasks such as classification, clustering and outlier detection when using time series data. There is a multitude of research on various univariate elastic similarity measures. However, except for multivariate versions of the well-known Dynamic Time Warping (DTW), there is a lack of work generalising other similarity measures to multivariate cases. This paper adapts two existing strategies used in multivariate DTW, namely Independent and Dependent DTW, to several commonly used elastic similarity measures. Using 23 datasets from the University of East Anglia (UEA) multivariate archive, for nearest neighbour classification, we demonstrate that each measure outperforms all others on at least one dataset and that there are datasets for which either the dependent versions of all measures are more accurate than their independent counterparts or vice versa. This latter finding suggests that these differences arise from a fundamental property of the data. We also show that an ensemble of such nearest neighbour classifiers is highly competitive with other state-of-the-art multivariate time series classifiers.
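The two strategies the abstract adapts can be stated concretely for DTW itself: the independent strategy warps each dimension separately and sums the per-dimension distances, while the dependent strategy finds a single warping path over whole multivariate points. A minimal sketch, assuming squared-Euclidean point costs and full-window DTW (function names are illustrative):

```python
import numpy as np

def dtw(a, b):
    """Classic DTW between two series given as (T, d) arrays.

    The cost between aligned points is squared Euclidean distance, so
    the same routine serves both multivariate strategies below.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((a[i - 1] - b[j - 1]) ** 2)
            # extend the cheapest of the three admissible predecessors
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_independent(x, y):
    """Independent strategy: warp each dimension on its own, then sum."""
    return sum(dtw(x[:, d:d + 1], y[:, d:d + 1]) for d in range(x.shape[1]))

def dtw_dependent(x, y):
    """Dependent strategy: one warping path over whole multivariate points."""
    return dtw(x, y)
```

The paper's contribution is to apply this independent/dependent split to other elastic measures (not shown here); for univariate series the two strategies coincide.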

    Proximity Forest: an effective and scalable distance-based classifier for time series

    Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds only 10 thousand time series, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work into developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than the current state-of-the-art models Elastic Ensemble and COTE.
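The branching rule described above, a node picks a similarity measure and one exemplar per class at random, then routes each series to its nearest exemplar, can be sketched for a single split. This is an illustration only, not the authors' code: Euclidean distance stands in for the pool of elastic measures Proximity Forest actually draws from, and the names (`proximity_split`, `euclidean`) are assumptions.

```python
import random
import numpy as np

def euclidean(a, b):
    """Stand-in similarity measure; Proximity Forest samples from a pool
    of elastic measures (DTW and relatives) rather than using this."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2) ** 0.5)

def proximity_split(series, labels, measures, rng):
    """One node of a Proximity Tree.

    Stochastically pick a measure and one exemplar per class, then route
    every series to the branch of its nearest exemplar. A full tree would
    recurse on each branch until the branches are pure.
    """
    measure = rng.choice(measures)
    classes = sorted(set(labels))
    exemplars = {c: rng.choice([s for s, l in zip(series, labels) if l == c])
                 for c in classes}
    branches = {c: [] for c in classes}
    for s, l in zip(series, labels):
        nearest = min(classes, key=lambda c: measure(s, exemplars[c]))
        branches[nearest].append((s, l))
    return measure, exemplars, branches
```

Because a node only ever compares each series against one exemplar per class, splitting is cheap, which is the property that lets the forest scale to millions of series.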