Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
solving a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.
Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
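The regression-based Cholesky reparameterization highlighted above can be sketched in a few lines: each variable is regressed on its predecessors, the negated regression coefficients fill a unit lower-triangular matrix T, and the residual variances fill a diagonal D, so that T Σ T' = D. Both T and D are unconstrained, and any such pair maps back to a positive-definite matrix. The following numpy sketch is illustrative (the function name and test matrices are ours, not from the paper):

```python
import numpy as np

def modified_cholesky(cov):
    """Reparameterize a covariance matrix via sequential regressions:
    T @ cov @ T.T = D, with T unit lower triangular (negated coefficients
    of the regression of variable j on variables 1..j-1) and D diagonal
    (the innovation variances)."""
    p = cov.shape[0]
    T = np.eye(p)
    D = np.zeros(p)
    D[0] = cov[0, 0]
    for j in range(1, p):
        # regression coefficients of y_j on y_1 .. y_{j-1}
        phi = np.linalg.solve(cov[:j, :j], cov[:j, j])
        T[j, :j] = -phi
        D[j] = cov[j, j] - cov[:j, j] @ phi
    return T, D

# round-trip check on a random positive-definite matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T + 4 * np.eye(4)
T, D = modified_cholesky(S)
back = np.linalg.inv(T) @ np.diag(D) @ np.linalg.inv(T).T
assert np.allclose(back, S)
```

Because each D[j] is a residual variance, positive-definiteness of the reconstructed matrix is automatic, which is exactly the advantage the survey attributes to this decomposition.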
A statistical model of internet traffic.
PhD thesis. We present a method to extract a time series (Number of Active Requests (NAR))
from web cache logs which serves as a transport-level measurement of internet traffic.
This series also reflects the performance or Quality of Service of a web cache. Using
time series modelling, we interpret the properties of this kind of internet traffic and
its effect on the performance perceived by the cache user.
Our preliminary analysis of NAR concludes that this dataset is suggestive of a
long-memory self-similar process but is not heavy-tailed. Having carried out more
in-depth analysis, we propose a three-stage modelling process for the time series: (i)
a power transformation to normalise the data, (ii) a polynomial fit to approximate
the general trend and (iii) a model for the residuals from the polynomial fit. We
analyse the polynomial and show that the residual dataset may be modelled as a
FARIMA(p, d, q) process.
Finally, we use Canonical Variate Analysis to determine the most significant defining
properties of our measurements and draw conclusions to categorise the differences
in traffic properties between the various caches studied. We show that the differences
between the caches are most clearly captured by the short-memory parameters of the
FARIMA fit. Several programs have been written in the Perl and
S programming languages for this analysis, including totalqd.pl for NAR calculation,
fullanalysis for general statistical analysis of the data and armamodel for FARIMA
modelling.
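The three-stage pipeline above can be sketched compactly. As a simplification, stage (iii) below fits a short-memory AR(p) by Yule-Walker rather than a full FARIMA(p, d, q) (no fractional differencing); the function and parameter names are illustrative and are not the thesis's totalqd.pl or armamodel code:

```python
import numpy as np

def three_stage_fit(y, power=0.5, deg=3, ar_order=2):
    """Sketch of the three-stage modelling process: (i) a power
    transform to normalise the data, (ii) a polynomial fit for the
    general trend, (iii) a model for the residuals.  Stage (iii) uses
    a plain AR(p) via Yule-Walker as a stand-in for FARIMA fitting."""
    z = np.power(y, power)                      # (i) power transform
    t = np.arange(len(z))
    coeffs = np.polyfit(t, z, deg)              # (ii) polynomial trend
    resid = z - np.polyval(coeffs, t)           # (iii) residual series
    # sample autocovariances r_0 .. r_p of the residuals
    n = len(resid)
    r = np.array([resid[:n - k] @ resid[k:] / n for k in range(ar_order + 1)])
    # Yule-Walker: solve the Toeplitz system R phi = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(ar_order)]
                  for i in range(ar_order)])
    phi = np.linalg.solve(R, r[1:])
    return coeffs, phi, resid
```

A full FARIMA fit would additionally estimate the fractional differencing parameter d to capture the long-memory behaviour the thesis reports.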
Health monitoring of civil infrastructures by subspace system identification method: an overview
Structural health monitoring (SHM) is a key enabler of the future smart city, meeting the need for safety, lower maintenance costs, and reliable condition assessment of structures. Among the algorithms used in SHM to identify the system parameters of structures, subspace system identification (SSI) is a reliable time-domain method that takes advantage of extended observability matrices. A considerable number of studies have concentrated specifically on practical applications of SSI in recent years. To the best of the authors' knowledge, however, no study has been undertaken to review and investigate the application of SSI in the monitoring of civil engineering structures. This paper aims to review studies that have used the SSI algorithm for the damage identification and modal analysis of structures. The fundamental focus is on data-driven and covariance-driven SSI algorithms. In this review, we consider the subspace algorithm as a solution to the problems of real-world SHM applications. With regard to performance, a comparison between SSI and other methods is provided in order to investigate its advantages and disadvantages. The applied methods of SHM in civil engineering structures are categorized into three classes, from simple one-dimensional (1D) to very complex structures, and the detectability of SSI for different damage scenarios is reported. Finally, the available software packages incorporating SSI as their system identification technique are surveyed.
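The covariance-driven variant mentioned above admits a compact sketch: the output covariance sequence Λ_i = C A^(i-1) G is stacked into a block Hankel matrix, an SVD yields the extended observability matrix, and (A, C) follow from its shift structure. The sketch below assumes exact covariances are available; the function name and dimensions are illustrative:

```python
import numpy as np

def cov_driven_ssi(lams, order):
    """Minimal covariance-driven stochastic subspace identification
    (SSI-cov) sketch: build a block Hankel matrix from the output
    covariances Lambda_1, Lambda_2, ... (each p x p, passed in `lams`),
    factor it by SVD, and recover (A, C) from the shift invariance of
    the extended observability matrix."""
    p = lams[0].shape[0]
    m = len(lams) // 2
    H = np.block([[lams[i + j] for j in range(m)] for i in range(m)])
    U, s, Vt = np.linalg.svd(H)
    O = U[:, :order] * np.sqrt(s[:order])      # extended observability matrix
    C = O[:p, :]                               # first block row
    A = np.linalg.pinv(O[:-p, :]) @ O[p:, :]   # shift-invariance estimate
    return A, C
```

With exact covariances of a known state-space model, the eigenvalues of the recovered A match the true system poles (the modal parameters SHM is after); with sample covariances they become estimates.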
Covariance estimation for multivariate conditionally Gaussian dynamic linear models
In multivariate time series, the estimation of the covariance matrix of the
observation innovations plays an important role in forecasting, as it enables
the computation of standardized forecast error vectors as well as confidence
bounds for the forecasts. We develop an
on-line, non-iterative Bayesian algorithm for estimation and forecasting. It is
empirically found that, for a range of simulated time series, the proposed
covariance estimator has good performance converging to the true values of the
unknown observation covariance matrix. Over a simulated time series, the new
method approximates the correct estimates, produced by a non-sequential Monte
Carlo simulation procedure, which is used here as the gold standard. The
special, but important, vector autoregressive (VAR) and time-varying VAR models
are illustrated by considering London metal exchange data consisting of spot
prices of aluminium, copper, lead and zinc.
Comment: 21 pages, 2 figures, 6 tables
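The paper's estimator is Bayesian, on-line and non-iterative; as a hedged stand-in rather than the authors' actual algorithm, the sketch below shows a generic conjugate-style sequential update of a covariance estimate from forecast-error vectors, which illustrates why no iteration is needed at each time step:

```python
import numpy as np

class SequentialCovariance:
    """Generic on-line, non-iterative covariance estimator: an
    inverse-Wishart-style recursion that adds each new forecast-error
    outer product to a running scale matrix.  This is an illustrative
    stand-in, not the specific algorithm of the paper."""
    def __init__(self, dim, prior_scale=1.0, prior_df=3):
        self.S = prior_scale * np.eye(dim)   # prior scale matrix
        self.n = prior_df                    # prior degrees of freedom
    def update(self, e):
        e = np.asarray(e).reshape(-1, 1)     # forecast-error vector
        self.S = self.S + e @ e.T            # one rank-1 update per step
        self.n += 1
        return self.estimate()
    def estimate(self):
        return self.S / self.n
```

Fed a stream of innovations, the estimate converges to the true observation covariance, mirroring the convergence behaviour the paper reports for its estimator.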
ESTIMATING THE SYSTEM ORDER BY SUBSPACE METHODS
This paper discusses how to determine the order of a state-space model. To do so, we start by reviewing existing approaches and find in them three basic shortcomings: i) some of them perform poorly in short samples, ii) most of them are not robust and iii) none of them can accommodate seasonality. We tackle the first two issues by proposing new and refined criteria. The third issue is dealt with by decomposing the system into regular and seasonal sub-systems. The performance of all the procedures considered is analysed through Monte Carlo simulations.
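A common baseline that the paper's refined criteria improve upon is the singular-value gap of a data Hankel matrix: the model order is read off where the singular values drop most sharply. A minimal sketch (the gap heuristic is the standard starting point, not the paper's proposal):

```python
import numpy as np

def estimate_order(H):
    """Baseline subspace order selection: compute the singular values
    of a (block) Hankel matrix built from the data and choose the order
    at the largest drop in log singular values.  Refined criteria are
    needed in short samples, under noise, and with seasonality."""
    s = np.linalg.svd(H, compute_uv=False)
    s = np.maximum(s, np.finfo(float).tiny)   # guard against log(0)
    gaps = np.diff(np.log(s))                 # most negative = biggest drop
    return int(np.argmin(gaps)) + 1
```

On a nearly rank-deficient Hankel matrix the heuristic is reliable; its fragility in short, noisy samples is precisely the first shortcoming the paper identifies.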
Forecasting Time Series with VARMA Recursions on Graphs
Graph-based techniques have emerged as a way to deal with the dimensionality
issues in modeling multivariate time series. However, there is as yet no complete
understanding of how the underlying structure can be exploited to ease this
task. This work provides contributions in this direction by considering the
forecasting of a process evolving over a graph. We make use of the
(approximate) time-vertex stationarity assumption, i.e., time-varying graph
signals whose first and second order statistical moments are invariant over
time and correlated to a known graph topology. The latter is combined with VAR
and VARMA models to tackle the dimensionality issues present in predicting the
temporal evolution of multivariate time series. We find out that by projecting
the data to the graph spectral domain: (i) the multivariate model estimation
reduces to that of fitting a number of uncorrelated univariate ARMA models and
(ii) an optimal low-rank data representation can be exploited so as to further
reduce the estimation costs. In the case that the multivariate process can be
observed at a subset of nodes, the proposed models extend naturally to Kalman
filtering on graphs allowing for optimal tracking. Numerical experiments with
both synthetic and real data validate the proposed approach and highlight its
benefits over state-of-the-art alternatives.
Comment: submitted to the IEEE Transactions on Signal Processing
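The key reduction described above, projecting the data to the graph spectral domain so that the multivariate fit splits into uncorrelated univariate models, can be sketched as follows. For brevity the per-mode model here is AR(1) rather than a general ARMA, and the function name is illustrative:

```python
import numpy as np

def graph_ar1_forecast(Y, L):
    """Sketch of the graph-spectral reduction: project an N-node series
    Y (T x N) onto the eigenbasis of the graph Laplacian L, fit one
    univariate AR(1) per graph frequency (the modes are treated as
    uncorrelated), then map the one-step forecast back to the vertex
    domain."""
    w, U = np.linalg.eigh(L)           # graph Fourier basis
    Z = Y @ U                          # T x N spectral coefficients
    phi = np.zeros(Z.shape[1])
    for k, z in enumerate(Z.T):        # least-squares AR(1) per mode
        denom = z[:-1] @ z[:-1]
        phi[k] = (z[:-1] @ z[1:]) / denom if denom > 0 else 0.0
    z_next = phi * Z[-1]               # uncorrelated one-step forecasts
    return U @ z_next                  # back to the vertex domain
```

Each mode is fitted independently, so the cost grows linearly in the number of graph frequencies, and truncating to the dominant modes gives the low-rank saving the paper exploits.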
Comparative review of methods for stability monitoring in electrical power systems and vibrating structures
This study provides a review of methods used for stability monitoring in two different fields, electrical power systems and vibration analysis, with the aim of increasing awareness of and highlighting opportunities for cross-fertilisation. The nature of the problems that require stability monitoring in both fields is discussed here, as well as the approaches that have been taken. The review of power systems methods is presented in two parts: methods for ambient or normal operation and methods for transient or post-fault operation. Similarly, the review of methods for vibration analysis is presented in two parts: methods for stationary or linear time-invariant data and methods for non-stationary or non-linear time-variant data. Some observations and comments are made regarding methods that have already been applied in both fields, including recommendations for the use of different sets of algorithms that have not been utilised to date. Additionally, methods that have been applied to vibration analysis and have potential for power systems stability monitoring are discussed and recommended. © 2010 The Institution of Engineering and Technology
Surrogate time series
Before we apply nonlinear techniques, for example those inspired by chaos
theory, to dynamical phenomena occurring in nature, it is necessary to first
ask if the use of such advanced techniques is justified "by the data". While
many processes in nature seem very unlikely a priori to be linear, the possible
nonlinear nature might not be evident in specific aspects of their dynamics.
The method of surrogate data has become a very popular tool to address such a
question. However, while it was meant to provide a statistically rigorous,
foolproof framework, some limitations and caveats have shown up in its
practical use. In this paper, recent efforts to understand the caveats, avoid
the pitfalls, and to overcome some of the limitations, are reviewed and
augmented by new material. In particular, we will discuss specific as well as
more general approaches to constrained randomisation, providing a full range of
examples. New algorithms will be introduced for unevenly sampled and
multivariate data and for surrogate spike trains. The main limitation, which
lies in the interpretability of the test results, will be illustrated through
instructive case studies. We will also discuss some implementational aspects of
the realisation of these methods in the TISEAN
(http://www.mpipks-dresden.mpg.de/~tisean) software package.
Comment: 28 pages, 23 figures, software at http://www.mpipks-dresden.mpg.de/~tisean
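The simplest member of the surrogate family discussed above is the FT (phase-randomised) surrogate: it preserves the amplitude spectrum, and hence the linear autocorrelation, of the data while drawing the Fourier phases at random, destroying any nonlinear structure. A minimal numpy sketch (this is the basic unconstrained recipe, not the constrained-randomisation schemes or the TISEAN implementation):

```python
import numpy as np

def fourier_surrogate(x, rng=None):
    """FT surrogate of a real series x: keep |X(f)| (and thus the
    linear autocorrelation), randomise all other Fourier phases."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = np.angle(X[0])            # keep the DC term (mean) intact
    if len(x) % 2 == 0:
        phases[-1] = np.angle(X[-1])      # Nyquist bin must stay real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))
```

A test statistic computed on the original series is then compared against its distribution over an ensemble of such surrogates; a significant discrepancy is evidence against the linear null hypothesis, with the interpretability caveats the paper discusses.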