415 research outputs found

    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment, where enforcing the positive-definiteness constraint can be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM), or parsimony and the use of covariates in low dimensions, and (2) regularization, or sparsity, for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is the sparse estimation of a precision matrix or a Gaussian graphical model, leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org
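
The regression-based Cholesky reparameterization mentioned in this abstract can be illustrated with a minimal Python sketch. It is not code from the survey: the function names, the ordering of the variables, and the example values are assumptions made for illustration. Each variable is regressed on its predecessors, y_j = sum_{k<j} phi[j][k] * y_k + eps_j, with innovation variance exp(log_d[j]); any real-valued phi and log_d then yield a positive-definite covariance matrix.

```python
import math


def covariance_from_regressions(phi, log_d):
    """Build a covariance matrix from unconstrained regression parameters.

    phi[j][k] (k < j) is the coefficient of variable k in the regression
    of variable j on its predecessors; log_d[j] is the log innovation
    variance. Both are hypothetical names used for this sketch.
    """
    n = len(log_d)
    # Row j of L expresses y_j as a linear combination of the innovations.
    L = []
    for j in range(n):
        row = [0.0] * n
        row[j] = 1.0
        for k in range(j):
            for m in range(k + 1):
                row[m] += phi[j][k] * L[k][m]
        L.append(row)
    d = [math.exp(v) for v in log_d]
    # Sigma = L D L^T, symmetric and positive definite by construction.
    return [[sum(L[i][m] * d[m] * L[j][m] for m in range(n))
             for j in range(n)] for i in range(n)]


def ldl_pivots(sigma):
    """Return the pivots d of the factorization sigma = L D L^T."""
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    return d


# Arbitrary unconstrained parameters for three ordered variables.
phi = [[], [0.5], [0.2, -0.3]]
log_d = [0.0, 0.0, 0.0]
sigma = covariance_from_regressions(phi, log_d)
pivots = ldl_pivots(sigma)
```

Positive pivots confirm positive-definiteness, and they recover the innovation variances that were fed in. The price, as the abstract notes, is that the result depends on the chosen order of the variables.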

    A statistical model of internet traffic.

    PhD. We present a method to extract a time series (Number of Active Requests (NAR)) from web cache logs which serves as a transport-level measurement of internet traffic. This series also reflects the performance or Quality of Service of a web cache. Using time series modelling, we interpret the properties of this kind of internet traffic and its effect on the performance perceived by the cache user. Our preliminary analysis of NAR concludes that this dataset is suggestive of a long-memory self-similar process but is not heavy-tailed. Having carried out more in-depth analysis, we propose a three-stage modelling process for the time series: (i) a power transformation to normalise the data, (ii) a polynomial fit to approximate the general trend and (iii) a modelling of the residuals from the polynomial fit. We analyse the polynomial and show that the residual dataset may be modelled as a FARIMA(p, d, q) process. Finally, we use Canonical Variate Analysis to determine the most significant defining properties of our measurements and draw conclusions to categorise the differences in traffic properties between the various caches studied. We show that the differences between the caches are most strongly reflected in the short-memory parameters of the FARIMA fit. We compare the differences revealed between our studied caches and draw conclusions on them. Several programs have been written in the Perl and S programming languages for this analysis, including totalqd.pl for NAR calculation, fullanalysis for general statistical analysis of the data and armamodel for FARIMA modelling.
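
The fractional differencing operator (1 - B)^d that distinguishes a FARIMA(p, d, q) model from an ordinary ARIMA model can be sketched in a few lines of Python. This is an illustrative sketch only, not the Perl or S code named in the abstract; the function names are hypothetical, and the filter is simply truncated at the start of the series.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k.

    Uses the recursion pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the filter at t = 0."""
    w = frac_diff_weights(d, len(series))
    return [sum(w[k] * series[t - k] for k in range(t + 1))
            for t in range(len(series))]


# d = 1 reduces to ordinary first differencing (first value kept as is).
first_diff = frac_diff([1.0, 3.0, 6.0, 10.0], 1.0)
# A fractional d gives slowly decaying weights, the long-memory signature.
half_weights = frac_diff_weights(0.5, 3)
```

For 0 < d < 0.5 the weights decay hyperbolically rather than geometrically, which is what lets the model capture long memory while the ARMA part handles the short-memory structure.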

    Health monitoring of civil infrastructures by subspace system identification method: an overview

    Structural health monitoring (SHM) is a main contributor to future smart cities, addressing the need for safety, lower maintenance costs, and reliable condition assessment of structures. Among the algorithms used in SHM to identify the system parameters of structures, subspace system identification (SSI) is a reliable time-domain method that takes advantage of extended observability matrices. A considerable number of studies have concentrated specifically on practical applications of SSI in recent years. To the best of the author's knowledge, no study has been undertaken to review and investigate the application of SSI in the monitoring of civil engineering structures. This paper aims to review studies that have used the SSI algorithm for the damage identification and modal analysis of structures. The fundamental focus is on data-driven and covariance-driven SSI algorithms. In this review, we consider the subspace algorithm to resolve the problem of a real-world application for SHM. With regard to performance, a comparison between SSI and other methods is provided in order to investigate its advantages and disadvantages. The applied methods of SHM in civil engineering structures are categorized into three classes, from simple one-dimensional (1D) to very complex structures, and the detectability of the SSI for different damage scenarios is reported. Finally, the available software packages that incorporate SSI as their system identification technique are investigated.

    Covariance estimation for multivariate conditionally Gaussian dynamic linear models

    In multivariate time series, the estimation of the covariance matrix of the observation innovations plays an important role in forecasting, as it enables the computation of both the standardized forecast error vectors and the confidence bounds of the forecasts. We develop an on-line, non-iterative Bayesian algorithm for estimation and forecasting. It is empirically found that, for a range of simulated time series, the proposed covariance estimator performs well, converging to the true values of the unknown observation covariance matrix. Over a simulated time series, the new method approximates the correct estimates, produced by a non-sequential Monte Carlo simulation procedure, which is used here as the gold standard. The special, but important, vector autoregressive (VAR) and time-varying VAR models are illustrated by considering London metal exchange data consisting of spot prices of aluminium, copper, lead and zinc. Comment: 21 pages, 2 figures, 6 tables

    ESTIMATING THE SYSTEM ORDER BY SUBSPACE METHODS

    This paper discusses how to determine the order of a state-space model. To do so, we start by reviewing existing approaches and find in them three basic shortcomings: i) some of them perform poorly in short samples, ii) most of them are not robust and iii) none of them can accommodate seasonality. We tackle the first two issues by proposing new and refined criteria. The third issue is dealt with by decomposing the system into regular and seasonal sub-systems. The performance of all the procedures considered is analyzed through Monte Carlo simulations.

    Identification of Civil Engineering Structures using Vector ARMA Models


    Forecasting Time Series with VARMA Recursions on Graphs

    Graph-based techniques have emerged as a choice for dealing with the dimensionality issues in modeling multivariate time series. However, there is as yet no complete understanding of how the underlying structure could be exploited to ease this task. This work provides contributions in this direction by considering the forecasting of a process evolving over a graph. We make use of the (approximate) time-vertex stationarity assumption, i.e., time-varying graph signals whose first and second order statistical moments are invariant over time and correlated to a known graph topology. The latter is combined with VAR and VARMA models to tackle the dimensionality issues present in predicting the temporal evolution of multivariate time series. We find that by projecting the data to the graph spectral domain: (i) the multivariate model estimation reduces to that of fitting a number of uncorrelated univariate ARMA models and (ii) an optimal low-rank data representation can be exploited so as to further reduce the estimation costs. In the case that the multivariate process can be observed at a subset of nodes, the proposed models extend naturally to Kalman filtering on graphs, allowing for optimal tracking. Numerical experiments with both synthetic and real data validate the proposed approach and highlight its benefits over state-of-the-art alternatives. Comment: submitted to the IEEE Transactions on Signal Processing

    Comparative review of methods for stability monitoring in electrical power systems and vibrating structures

    This study provides a review of methods used for stability monitoring in two different fields, electrical power systems and vibration analysis, with the aim of increasing awareness of and highlighting opportunities for cross-fertilisation. The nature of the problems that require stability monitoring in both fields is discussed here, as well as the approaches that have been taken. The review of power systems methods is presented in two parts: methods for ambient or normal operation and methods for transient or post-fault operation. Similarly, the review of methods for vibration analysis is presented in two parts: methods for stationary or linear time-invariant data and methods for non-stationary or non-linear time-variant data. Some observations and comments are made regarding methods that have already been applied in both fields, including recommendations for the use of different sets of algorithms that have not been utilised to date. Additionally, methods that have been applied to vibration analysis and have potential for power systems stability monitoring are discussed and recommended. © 2010 The Institution of Engineering and Technology

    Surrogate time series

    Before we apply nonlinear techniques, for example those inspired by chaos theory, to dynamical phenomena occurring in nature, it is necessary to first ask if the use of such advanced techniques is justified "by the data". While many processes in nature seem very unlikely a priori to be linear, the possible nonlinear nature might not be evident in specific aspects of their dynamics. The method of surrogate data has become a very popular tool to address such a question. However, while it was meant to provide a statistically rigorous, foolproof framework, some limitations and caveats have shown up in its practical use. In this paper, recent efforts to understand the caveats, avoid the pitfalls, and overcome some of the limitations are reviewed and augmented by new material. In particular, we will discuss specific as well as more general approaches to constrained randomisation, providing a full range of examples. New algorithms will be introduced for unevenly sampled and multivariate data and for surrogate spike trains. The main limitation, which lies in the interpretability of the test results, will be illustrated through instructive case studies. We will also discuss some implementational aspects of the realisation of these methods in the TISEAN (http://www.mpipks-dresden.mpg.de/~tisean) software package. Comment: 28 pages, 23 figures, software at http://www.mpipks-dresden.mpg.de/~tisea
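
The general logic of a surrogate data test can be sketched in Python for the simplest possible null hypothesis: that the series consists of independent draws from a fixed distribution. Random permutation preserves the amplitude distribution while destroying all temporal structure. This is only an illustration under that assumption; it is not one of the constrained-randomisation algorithms of the paper or of TISEAN, and the function names and the choice of lag-1 autocorrelation as the discriminating statistic are hypothetical.

```python
import math
import random


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation, used here as the test statistic."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return cov / var


def shuffle_surrogate_test(x, n_surrogates=99, seed=0):
    """Rank-based test of the null hypothesis of independent draws.

    Returns the observed statistic and a two-sided p-value estimate.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = lag1_autocorrelation(x)
    exceed = 0
    for _ in range(n_surrogates):
        surrogate = list(x)
        rng.shuffle(surrogate)  # same values, temporal order destroyed
        if abs(lag1_autocorrelation(surrogate)) >= abs(observed):
            exceed += 1
    return observed, (1 + exceed) / (n_surrogates + 1)


# A smooth oscillation is strongly autocorrelated, so the null is rejected.
series = [math.sin(0.1 * t) for t in range(200)]
observed, p_value = shuffle_surrogate_test(series)
```

Rejecting this null says only that the data have some temporal structure, not that they are nonlinear. Testing for nonlinearity requires surrogates that also preserve linear correlations, which is where the methods reviewed in the paper come in.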