8 research outputs found

    A test for the absence of aliasing or white noise in locally stationary wavelet time series

    Aliasing is often overlooked in time series analysis but can seriously distort the spectrum, the autocovariance and their estimates. We show that dyadic subsampling of a locally stationary wavelet process, which can cause aliasing, results in a process that is the sum of asymptotic white noise and another locally stationary wavelet process with a modified spectrum. We develop a test for the absence of aliasing in a locally stationary wavelet series at a fixed location, and illustrate it on simulated data and a wind energy time series. A useful by-product is a new test for local white noise. The tests are robust to model misspecification in the sense that the analysis and synthesis wavelets need not be identical. Hence, in principle, the tests work irrespective of which wavelet is used to analyze the time series, though in practice there is a trade-off between increasing statistical power and the time localization of the test.
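    As a generic illustration of why dyadic subsampling can cause aliasing (this is not the test developed in the paper), the Python sketch below samples a cosine at 0.4 cycles per sample, keeps every second observation, and checks that the subsampled series coincides with a cosine at the folded frequency 0.2. The helper names `sample_cosine` and `folded_frequency` are hypothetical.

```python
import math

def sample_cosine(freq, n):
    """Return n samples of cos(2*pi*freq*t) for t = 0, 1, ..., n-1."""
    return [math.cos(2 * math.pi * freq * t) for t in range(n)]

def folded_frequency(freq):
    """Apparent frequency in [0, 0.5] cycles per sample after regular sampling."""
    return abs(freq - round(freq))

f = 0.4                  # cycles per sample, below the original Nyquist limit of 0.5
x = sample_cosine(f, 64)
y = x[::2]               # dyadic subsampling: keep every second observation

# On the coarser grid the same oscillation completes 2*f = 0.8 cycles per sample,
# which exceeds 0.5 and therefore folds back to 0.2 cycles per sample.
alias = folded_frequency(2 * f)
z = sample_cosine(alias, len(y))
max_gap = max(abs(a - b) for a, b in zip(y, z))
print(alias, max_gap)
```

    Any power above 0.25 cycles per sample in the original series is folded in the same way after one level of dyadic subsampling, which is the kind of distortion that motivates a test for the absence of aliasing.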

    Dynamic stochastic block models: Parameter estimation and detection of changes in community structure

    The stochastic block model (SBM) is widely used for modelling network data by assigning individuals (nodes) to communities (blocks), with the probability of an edge existing between two individuals depending on their community membership. In this paper we introduce an autoregressive extension of the SBM, based on continuous-time Markovian edge dynamics. The model is appropriate for networks evolving over time and allows edges to turn on and off. Moreover, we allow individuals to move between communities. An effective reversible jump Markov chain Monte Carlo algorithm is introduced for sampling jointly from the posterior distribution of the community parameters and the number and location of changes in community membership. The algorithm is successfully applied to a network of mice.
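    The abstract does not give the model's equations. As a minimal sketch, assuming each edge follows a two-state continuous-time Markov chain whose switching rates depend on the communities of its two endpoints, the probability that an edge is on after time t has the closed form below. The function name `edge_on_probability` and the rate values are hypothetical.

```python
import math

def edge_on_probability(on_at_start, rate_on, rate_off, t):
    """P(edge is on at time t) for a two-state continuous-time Markov chain.

    rate_on:  rate of off -> on transitions
    rate_off: rate of on -> off transitions
    """
    total = rate_on + rate_off
    stationary = rate_on / total       # long-run fraction of time the edge is on
    decay = math.exp(-total * t)       # memory of the starting state fades at rate `total`
    if on_at_start:
        return stationary + (1 - stationary) * decay
    return stationary * (1 - decay)

# Hypothetical rates: pairs in the same community switch on faster than pairs in different ones.
within = edge_on_probability(False, rate_on=2.0, rate_off=1.0, t=0.5)
between = edge_on_probability(False, rate_on=0.2, rate_off=1.0, t=0.5)
print(within, between)
```

    Under this assumption, a change in community membership would simply switch the rates that apply to the affected edges from that time onward.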

    Minimum spectral connectivity projection pursuit: Divisive clustering using optimal projections for spectral clustering

    We study the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections that minimise the second-smallest eigenvalue of the graph Laplacian of the projected data, which gives rise to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data, as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigenproblem is quadratic in the number of data points. To mitigate this issue, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering. Applying the proposed approach for a decreasing sequence of scaling parameters allows us to obtain large margin clustering solutions, which are found to be competitive with those from dedicated maximum margin clustering algorithms.
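    A sketch of the objective for one candidate projection, assuming a Gaussian similarity with scale `sigma` and the unnormalised Laplacian L = D - W (the paper's exact similarity and normalisation may differ). It does not perform the optimisation over projections or the microcluster approximation. The Jacobi eigenvalue routine is included only to keep the example dependency-free; `numpy.linalg.eigvalsh` would normally be used instead. All function names are hypothetical.

```python
import math

def laplacian_of_projection(data, v, sigma=1.0):
    """Unnormalised graph Laplacian L = D - W of the data projected onto direction v."""
    norm = math.sqrt(sum(c * c for c in v))
    proj = [sum(a * b for a, b in zip(x, v)) / norm for x in data]
    n = len(proj)
    w = [[math.exp(-(proj[i] - proj[j]) ** 2 / (2 * sigma ** 2)) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

def symmetric_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Sorted eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def spectral_connectivity(data, v, sigma=1.0):
    """Second-smallest Laplacian eigenvalue of the projected data (smaller = better split)."""
    return symmetric_eigenvalues(laplacian_of_projection(data, v, sigma))[1]

# Two groups separated along the first coordinate but spread along the second.
data = [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0), (0.1, 3.0),
        (5.0, 0.5), (5.1, 1.5), (5.2, 2.5), (5.1, 3.5)]
lam_x = spectral_connectivity(data, (1.0, 0.0))
lam_y = spectral_connectivity(data, (0.0, 1.0))
print(lam_x, lam_y)
```

    The projection onto the first coordinate separates the two groups, so its second eigenvalue is close to zero. Building the full similarity matrix already takes n² kernel evaluations, which hints at why an approximation is useful for large data sets.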

    A linear time method for the detection of point and collective anomalies

    The challenge of efficiently identifying anomalies in data sequences is an important statistical problem that now arises in many applications. Whilst there has been substantial work aimed at making statistical analyses robust to outliers, or point anomalies, there has been much less work on detecting anomalous segments, or collective anomalies. By bringing together ideas from changepoint detection and robust statistics, we introduce Collective And Point Anomalies (CAPA), a computationally efficient approach that is suitable when collective anomalies are characterised by a change in mean, in variance, or in both, and that distinguishes them from point anomalies. Theoretical results establish the consistency of CAPA at detecting collective anomalies, and empirical results show that CAPA has close to linear computational cost and is more accurate at detecting and locating collective anomalies than other approaches. We demonstrate the utility of CAPA through its ability to detect exoplanets from light curve data from the Kepler telescope.
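    The following is not CAPA itself but a simplified, quadratic-time penalised-savings search in the same spirit: it assumes the typical data have known mean 0 and variance 1, considers only changes in mean, and uses hypothetical penalties `beta` and `beta_point`. CAPA's robust estimation of the typical parameters, its handling of variance changes, and the pruning that gives it close to linear cost are all omitted. The function name is hypothetical.

```python
def detect_anomalies(x, beta=10.0, beta_point=10.0, min_len=2):
    """Penalised-savings search for mean-shift segments and point anomalies.

    Assumes typical data have mean 0 and variance 1 (known), so giving a segment
    its own mean saves (segment sum)**2 / length in squared error.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best = [0.0] * (n + 1)      # best[t]: largest penalised saving achievable for x[:t]
    choice = [None] * (n + 1)   # how best[t] was reached, used for backtracking
    for t in range(1, n + 1):
        best[t], choice[t] = best[t - 1], ("typical", t - 1)
        point = best[t - 1] + x[t - 1] ** 2 - beta_point
        if point > best[t]:
            best[t], choice[t] = point, ("point", t - 1)
        for s in range(t - min_len + 1):        # candidate collective anomaly x[s:t]
            seg_sum = prefix[t] - prefix[s]
            value = best[s] + seg_sum ** 2 / (t - s) - beta
            if value > best[t]:
                best[t], choice[t] = value, ("collective", s)
    collective, points = [], []
    t = n
    while t > 0:
        kind, s = choice[t]
        if kind == "collective":
            collective.append((s, t))           # half-open index range [s, t)
        elif kind == "point":
            points.append(t - 1)
        t = s
    return sorted(collective), sorted(points)

series = [0.0] * 20 + [3.0] * 10 + [0.0] * 10 + [10.0] + [0.0] * 9
print(detect_anomalies(series))  # ([(20, 30)], [40])
```

    The single large value is cheaper to explain as a point anomaly than as part of a segment, while the sustained shift is cheaper to explain as one collective anomaly than as ten separate points.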

    Multivariate Locally Stationary Wavelet Analysis with the mvLSW R Package

    This paper describes the R package mvLSW. The package contains a suite of tools for the analysis of multivariate locally stationary wavelet (LSW) time series. Key elements include: (i) the synthesis of multivariate LSW time series for a given multivariate evolutionary wavelet spectrum (EWS); (ii) estimation of the time-dependent multivariate EWS for a given time series; (iii) estimation of the time-dependent coherence and partial coherence between time series channels; and (iv) estimation of confidence intervals for the estimated multivariate EWS. A demonstration of the package is presented via both a simulated example and a case study using the EuStockMarkets data set distributed with R.

    Most recent changepoint detection in Panel data

    Detecting recent changepoints in time series can be important for short-term prediction, as predictions can then be based only on the data since the changepoint. In many applications we have panel data, consisting of many related univariate time series. We present a novel approach to detect sets of most recent changepoints in such panel data, which aims to pool information across time series so that we preferentially infer a most recent change at the same time point in multiple series. Our approach is computationally efficient as it involves analysing each time series independently to obtain a profile-likelihood-like quantity that summarises the evidence for the series having either no change or a specific value for its most recent changepoint. We then post-process this output from each time series to obtain a potentially small set of times for the most recent changepoints and, for each time, the set of series that have their most recent changepoint at that time. We demonstrate the usefulness of this method on two data sets: forecasting events in a telecommunications network and inference about changes in the net asset ratio for a panel of US firms.
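    A sketch of the per-series step only, under strong simplifying assumptions: Gaussian data with known unit variance and at most one change per series, so the most recent change is the only change. The paper's quantity allows earlier changes, and the pooling across series is not shown. The function names are hypothetical.

```python
def segment_cost(prefix, prefix_sq, s, t):
    """Residual sum of squares of x[s:t] around its own mean."""
    total = prefix[t] - prefix[s]
    return (prefix_sq[t] - prefix_sq[s]) - total * total / (t - s)

def most_recent_change_profile(x):
    """Map each candidate changepoint tau (None = no change) to a profile log-likelihood."""
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    n = len(x)
    # With unit variance, the Gaussian log-likelihood is -RSS / 2 up to a constant.
    profile = {None: -0.5 * segment_cost(prefix, prefix_sq, 0, n)}
    for tau in range(1, n):
        rss = segment_cost(prefix, prefix_sq, 0, tau) + segment_cost(prefix, prefix_sq, tau, n)
        profile[tau] = -0.5 * rss
    return profile

profile = most_recent_change_profile([0.0] * 10 + [5.0] * 5)
print(max(profile, key=profile.get))
```

    One such curve per series is the kind of summary that a subsequent post-processing step could compare across the panel.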

    Subspace Clustering of Very Sparse High-Dimensional Data

    No full text
    In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications, such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive with state-of-the-art clustering algorithms.
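    To make the representation concrete (this is not the proposed clustering algorithm), the sketch below encodes short texts as binary presence vectors, stored as lists of vocabulary indices. The product names are invented placeholders and the helper name `binary_vectorise` is hypothetical; with a realistic corpus the vocabulary is much larger and the density correspondingly lower than in this toy example.

```python
def binary_vectorise(texts):
    """Represent each text as the sorted list of vocabulary indices it contains."""
    vocab = {}
    rows = []
    for text in texts:
        words = set(text.lower().split())   # presence only; repeated words are ignored
        for w in sorted(words):
            vocab.setdefault(w, len(vocab))  # assign the next free index to unseen words
        rows.append(sorted(vocab[w] for w in words))
    return vocab, rows

# Invented placeholder product names, for illustration only.
titles = ["usb c charging cable", "braided usb charging cable",
          "stainless steel water bottle", "insulated steel water bottle"]
vocab, rows = binary_vectorise(titles)
density = sum(len(r) for r in rows) / (len(rows) * len(vocab))
print(len(vocab), density)
```

    Storing only the indices of present terms avoids materialising the mostly zero vectors, which matters once the vocabulary grows with the corpus.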