Dynamic modeling of mean-reverting spreads for statistical arbitrage
Statistical arbitrage strategies, such as pairs trading and its
generalizations, rely on the construction of mean-reverting spreads enjoying a
certain degree of predictability. Gaussian linear state-space processes have
recently been proposed as a model for such spreads under the assumption that
the observed process is a noisy realization of some hidden states. Real-time
estimation of the unobserved spread process can reveal temporary market
inefficiencies which can then be exploited to generate excess returns. Building
on previous work, we embrace the state-space framework for modeling spread
processes and extend this methodology along three different directions. First,
we introduce time-dependency in the model parameters, which allows for quick
adaptation to changes in the data generating process. Second, we provide an
on-line estimation algorithm that can be constantly run in real-time. Being
computationally fast, the algorithm is particularly suitable for building
aggressive trading strategies based on high-frequency data and may be used as a
monitoring device for mean-reversion. Finally, our framework naturally provides
informative uncertainty measures of all the estimated parameters. Experimental
results based on Monte Carlo simulations and historical equity data are
discussed, including a co-integration relationship involving two exchange-traded funds. (Comment: 34 pages, 6 figures. Submitted.)
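The real-time estimation described above can be sketched with a scalar Kalman filter for a hidden mean-reverting AR(1) spread observed in noise. This is a minimal illustration of the state-space idea, not the authors' algorithm: the function name, the constant-parameter model, and all numerical values are assumptions.

```python
import numpy as np

def kalman_spread(y, phi, q, r, mu=0.0):
    """Online filtering of a hidden mean-reverting spread.

    Hidden state:  x_t = mu + phi * (x_{t-1} - mu) + w_t,  w_t ~ N(0, q)
    Observation:   y_t = x_t + v_t,                        v_t ~ N(0, r)
    Returns filtered means and variances, one pair per observation.
    """
    x, P = y[0], r                       # crude initialization from the first tick
    means, variances = [], []
    for obs in y:
        # predict step
        x_pred = mu + phi * (x - mu)
        P_pred = phi ** 2 * P + q
        # update step
        K = P_pred / (P_pred + r)        # Kalman gain
        x = x_pred + K * (obs - x_pred)
        P = (1.0 - K) * P_pred
        means.append(x)
        variances.append(P)
    return np.array(means), np.array(variances)
```

Each update costs a handful of scalar operations, which is why this kind of recursion can be run tick-by-tick on high-frequency data; the filtered variance `P` also gives the uncertainty measure mentioned in the abstract.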
Estimating Mean and Covariance Structure with Reweighted Least Squares
Does Reweighted Least Squares (RLS) perform better in small samples than maximum likelihood (ML) for mean and covariance structure models? ML statistics in covariance structure analysis are based on the asymptotic normality assumption; however, actual applications of structural equation modeling (SEM) in social and behavioral science research usually involve small samples. It has been found that chi-square tests often incorrectly over-reject the null hypothesis Σ = Σ(θ), because when the sample is small the sample covariance matrix becomes ill-conditioned and yields unstable estimates. In certain SEM models, the parameter vector must contain means as well as variances and covariances. Yet whether RLS also works for mean and covariance structure models remains unexamined. This research is an extended examination of reweighted least squares in mean and covariance structure models. Specifically, we replace the biased covariance matrix in the traditional GLS function (Browne, 1974) with the unbiased sample covariance matrix that derives from ML estimation. Under the assumption of multivariate normality, a Monte Carlo simulation study was carried out to compare the statistical performance of the two methods at different sample sizes. Based on empirical rejection frequencies and empirical averages of the test statistics, this study shows that RLS performs much better than ML in mean and covariance structure models when sample sizes are small.
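To fix ideas, the GLS discrepancy function of the Browne (1974) form can be sketched as below; the choice of weight matrix `W` is exactly where a reweighted variant differs from classical GLS (which takes `W = S`). The function name and the code itself are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def gls_discrepancy(S, Sigma_theta, W):
    """Generalized least squares fit function (Browne, 1974 form):

        F = 0.5 * tr{ [ (S - Sigma(theta)) W^{-1} ]^2 }

    S            -- sample covariance matrix
    Sigma_theta  -- model-implied covariance Sigma(theta)
    W            -- weight matrix; classical GLS uses W = S, and
                    reweighting amounts to swapping in a different W.
    """
    D = (S - Sigma_theta) @ np.linalg.inv(W)
    return 0.5 * np.trace(D @ D)
```

For symmetric residuals and a positive-definite `W`, this quantity is nonnegative and vanishes exactly when the model reproduces the sample covariance.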
Nonlinear maximum likelihood estimation of autoregressive time series
Includes bibliographical references. In this paper, we describe an algorithm for finding the exact, nonlinear, maximum likelihood (ML) estimators of the parameters of an autoregressive time series. We demonstrate that the ML normal equations can be written as an interdependent set of cubic and quadratic equations in the AR polynomial coefficients. We present an algorithm that algebraically solves this set of nonlinear equations for low-order problems. For high-order problems, we describe iterative algorithms for obtaining an ML solution. This work was supported by the Bonneville Power Administration under Contract #DEBI7990BPO7346 and by the Office of Naval Research, Statistics and Probability Branch, under Contract N00014-89-J-1070.
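For the AR(1) case the source of the nonlinearity is easy to see: the exact likelihood includes the stationary variance of the first observation, which depends on the AR coefficient. A crude sketch, with sigma-squared profiled out and the coefficient found by grid search (a stand-in for the paper's algebraic and iterative solvers, which it does not reproduce):

```python
import numpy as np

def ar1_exact_loglik(x, a, sigma2):
    """Exact Gaussian log-likelihood of x_t = a*x_{t-1} + e_t, e_t ~ N(0, sigma2).

    The first observation uses the stationary variance sigma2 / (1 - a^2);
    this term is what makes the ML normal equations nonlinear in a.
    """
    n = len(x)
    v0 = sigma2 / (1.0 - a ** 2)                   # stationary variance
    ll = -0.5 * (np.log(2 * np.pi * v0) + x[0] ** 2 / v0)
    resid = x[1:] - a * x[:-1]
    ll += -0.5 * (n - 1) * np.log(2 * np.pi * sigma2) \
          - 0.5 * np.sum(resid ** 2) / sigma2
    return ll

def ar1_ml_grid(x, grid=np.linspace(-0.99, 0.99, 199)):
    """Crude exact ML: profile sigma2 in closed form for each candidate a,
    then pick the a with the highest exact log-likelihood."""
    best = None
    for a in grid:
        resid = x[1:] - a * x[:-1]
        sigma2 = (x[0] ** 2 * (1 - a ** 2) + np.sum(resid ** 2)) / len(x)
        ll = ar1_exact_loglik(x, a, sigma2)
        if best is None or ll > best[0]:
            best = (ll, a, sigma2)
    return best
```

Dropping the first-observation term recovers the conditional (linear least-squares) estimator; keeping it is what distinguishes the exact ML problem treated in the paper.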
Maximum Entropy Vector Kernels for MIMO system identification
Recent contributions have framed linear system identification as a
nonparametric regularized inverse problem. Relying on kernel-based
regularization which accounts for the stability and smoothness of the impulse
response to be estimated, these approaches have been shown to be competitive
w.r.t. classical parametric methods. In this paper, adopting Maximum Entropy
arguments, we derive a new penalty induced by a vector-valued
kernel; to do so we exploit the structure of the Hankel matrix, thus
controlling at the same time the complexity (measured by the McMillan degree),
stability, and smoothness of the identified models. As a special case we recover
the nuclear norm penalty on the squared block Hankel matrix. In contrast with
previous literature on reweighted nuclear norm penalties, our kernel is
described by a small number of hyper-parameters, which are iteratively updated
through marginal likelihood maximization; constraining the structure of the
kernel acts as a (hyper)regularizer which helps controlling the effective
degrees of freedom of our estimator. To optimize the marginal likelihood we
adapt a Scaled Gradient Projection (SGP) algorithm, which is shown to be
significantly cheaper computationally than other first- and second-order
off-the-shelf optimization methods. The paper also contains an extensive
comparison with many state-of-the-art methods on several Monte Carlo studies,
which confirms the effectiveness of our procedure.
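The link between the Hankel matrix and model complexity can be illustrated directly: the rank of the block Hankel matrix built from the impulse response equals the McMillan degree of the system, so the nuclear norm (sum of singular values) is a convex surrogate for model order. A minimal sketch, with assumed function names, not the paper's kernel construction:

```python
import numpy as np

def block_hankel(h, rows):
    """Stack impulse-response samples h[0], h[1], ... (scalars or p x m
    blocks) into a block Hankel matrix with `rows` block rows. Its rank
    equals the McMillan degree of the underlying linear system."""
    cols = len(h) - rows + 1
    return np.block([[h[i + j] for j in range(cols)] for i in range(rows)])

def nuclear_norm(M):
    """Sum of singular values -- the convex penalty that encourages a
    low-rank Hankel matrix, i.e. a low-order identified model."""
    return np.linalg.svd(M, compute_uv=False).sum()
```

For a first-order system the Hankel matrix has rank one, and the nuclear norm collapses to the single nonzero singular value; penalizing it during identification trades data fit against McMillan degree.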
Handling of Missing Values in Static and Dynamic Data Sets
This thesis makes three contributions. First, it conducts a comparative study of traditional and modern classification methods, highlighting the differences in their performance. Second, it presents an algorithm to enhance the prediction of values used for data imputation with nonlinear models. Third, it presents a novel model-selection algorithm to enhance prediction performance in the presence of missing data. The thesis includes an overview of nonlinear model selection with complete data, and provides summary descriptions of the Box-Tidwell and fractional polynomial methods for model selection. In particular, it focuses on the fractional polynomial method for nonlinear modelling in cases of missing data. An analysis example is presented to illustrate the performance of this method.
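The gain from model-based imputation over a naive baseline can be shown in a few lines. This is a deliberately simple sketch, a linear predictor rather than the thesis's nonlinear fractional polynomial models, and every name and modeling choice in it is an assumption:

```python
import numpy as np

def mean_impute(x):
    """Baseline: replace missing entries (NaN) with the observed mean."""
    x = x.copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x

def regression_impute(x, z):
    """Predict missing x from a fully observed covariate z, using a
    linear fit on the complete cases. A minimal stand-in for the
    model-based imputation discussed above; the linear form is an
    assumption, where the thesis uses nonlinear (fractional
    polynomial) models."""
    x = x.copy()
    obs = ~np.isnan(x)
    slope, intercept = np.polyfit(z[obs], x[obs], 1)
    x[~obs] = slope * z[~obs] + intercept
    return x
```

When the missing variable is strongly related to an observed covariate, the regression-based fill recovers the lost values far more accurately than the mean fill, which is the basic motivation for imputing with a predictive model.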
Models and methods for computationally efficient analysis of large spatial and spatio-temporal data
With the development of technology, massive amounts of data are often observed at a large number of spatial locations (n). However, statistical analysis is usually not feasible or not computationally efficient for such large datasets. This is the so-called "big n" problem.
The goal of this dissertation is to contribute solutions to the "big n" problem. The dissertation is devoted to computationally efficient methods and models for large spatial and spatio-temporal data. Several approximation methods for the "big n" problem are reviewed, and an extended autoregressive model, called the EAR model, is proposed as a parsimonious model that accounts for the smoothness of a process collected over space. It is an extension of the Pettitt et al. as well as the Czado and Prokopenko parameterizations of the spatial conditional autoregressive (CAR) model. To complement the computational advantage, a structure-removing orthonormal transformation, named pre-whitening, is described. This transformation is based on a singular value decomposition and results in the removal of spatial structure from the data. A circulant embedding technique further simplifies the calculation of eigenvalues and eigenvectors for the pre-whitening procedure.
The EAR model is shown to have connections to the Matérn class covariance structure in geostatistics, as well as to the integrated nested Laplace approximation (INLA) approach that is based on a stochastic partial differential equation (SPDE) framework. To model geostatistical data, a latent spatial Gaussian Markov random field (GMRF) with an EAR model prior is applied. The GMRF is defined on a fine grid, which enables the posterior precision matrix to be diagonal through the introduction of a missing-data scheme. This allows parameter estimation and spatial interpolation to be carried out simultaneously within the Bayesian Markov chain Monte Carlo (MCMC) framework.
The EAR model is naturally extended to spatio-temporal models. In particular, a spatio-temporal model with spatially varying temporal trend parameters is discussed.
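The computational payoff of circulant embedding mentioned above comes from a classical fact: the eigenvalues of a circulant matrix are the discrete Fourier transform of its first row, so a spectral decomposition costs O(n log n) via the FFT instead of O(n³) for a dense eigensolver. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix C with C[i, j] =
    first_row[(j - i) mod n], obtained as the DFT of the first row.
    This is the step that makes pre-whitening cheap after a spatial
    covariance has been embedded in a circulant matrix."""
    return np.fft.fft(first_row)
```

For a symmetric first row the matrix is symmetric and the eigenvalues come out (numerically) real, matching what a dense eigendecomposition would return at a fraction of the cost.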