Rademacher complexity of stationary sequences
We show how to control the generalization error of time series models wherein
past values of the outcome are used to predict future values. The results are
based on a generalization of standard i.i.d. concentration inequalities to
dependent data without the mixing assumptions common in the time series
setting. Our proof and result are simpler than previous analyses of dependent
data or stochastic adversaries, which use sequential Rademacher complexities
rather than the expected Rademacher complexity for i.i.d. processes. We also
derive empirical Rademacher results without mixing assumptions, resulting in
fully calculable upper bounds.
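As a rough illustration of the quantity involved, the following sketch estimates
an empirical Rademacher complexity by Monte Carlo over random sign vectors; the
hypothesis-by-sample loss matrix and the number of draws are illustrative
assumptions, not a construction from the paper.

    import numpy as np

    def empirical_rademacher(losses, n_draws=1000, rng=None):
        # losses: array of shape (n_hypotheses, n_samples); entry [h, i] is
        # hypothesis h's loss on the i-th observation of the series.
        # Returns a Monte Carlo estimate of
        #   E_sigma[ max_h (1/n) * sum_i sigma_i * losses[h, i] ].
        rng = np.random.default_rng() if rng is None else rng
        n_hyp, n = losses.shape
        total = 0.0
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=n)   # i.i.d. Rademacher signs
            total += losses.dot(sigma).max() / n      # sup over the (finite) class
        return total / n_draws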
Theory and Algorithms for Forecasting Time Series
We present data-dependent learning bounds for the general scenario of
non-stationary non-mixing stochastic processes. Our learning guarantees are
expressed in terms of a data-dependent measure of sequential complexity and a
discrepancy measure that can be estimated from data under some mild
assumptions. We also provide a novel analysis of a stable time-series
forecasting algorithm using the new notion of discrepancy that we introduce.
We use our learning bounds to devise new algorithms for non-stationary time
series forecasting for which we report some preliminary experimental results.
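For orientation, discrepancy measures in this line of work typically compare the
conditional loss at the target time to a weighted average of conditional losses
over the observed sample; the display below is an illustrative general form
rather than the paper's exact definition:

    \operatorname{disc}(\mathbf{q}) \;=\; \sup_{h \in H}
      \Bigl( \mathbb{E}\bigl[\ell(h(X_{T+1}), Y_{T+1}) \mid Z_1^{T}\bigr]
      \;-\; \sum_{t=1}^{T} q_t \,
      \mathbb{E}\bigl[\ell(h(X_t), Y_t) \mid Z_1^{t-1}\bigr] \Bigr),

where q = (q_1, ..., q_T) is a weight vector over the observed sample.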
Foundations of Sequence-to-Sequence Modeling for Time Series
The availability of large amounts of time series data, paired with the
performance of deep-learning algorithms on a broad class of problems, has
recently led to significant interest in the use of sequence-to-sequence models
for time series forecasting. We provide the first theoretical analysis of this
time series forecasting framework. We include a comparison of
sequence-to-sequence modeling to classical time series models, and as such our
theory can serve as a quantitative guide for practitioners choosing between
different modeling methodologies.
Nonparametric risk bounds for time-series forecasting
We derive generalization error bounds for traditional time-series forecasting
models. Our results hold for many standard forecasting tools including
autoregressive models, moving average models, and, more generally, linear
state-space models. These non-asymptotic bounds need only weak assumptions on
the data-generating process, yet allow forecasters to select among competing
models and to guarantee, with high probability, that their chosen model will
perform well. We motivate and apply our techniques using standard economic
and financial forecasting tools: a GARCH model for predicting equity
volatility and a dynamic stochastic general equilibrium (DSGE) model, the
standard tool in macroeconomic forecasting. We demonstrate in particular how
our techniques can aid forecasters and policy makers in choosing models that
behave well under uncertainty and mis-specification.
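For readers unfamiliar with the equity-volatility example, a minimal sketch of
the GARCH(1,1) conditional-variance recursion is given below; the parameter
names and the sample-variance initialization are illustrative assumptions, and
this is not the paper's estimation code.

    import numpy as np

    def garch11_volatility(returns, omega, alpha, beta):
        # Conditional variance recursion of a GARCH(1,1) model:
        #   sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2
        r = np.asarray(returns, dtype=float)
        sigma2 = np.empty(len(r) + 1)
        sigma2[0] = r.var()                 # a common initialization choice
        for t in range(1, len(r) + 1):
            sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        return sigma2                       # sigma2[-1] is the one-step-ahead variance forecast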
Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks
Deep neural networks are considered state-of-the-art models for many offline
machine learning tasks. However, their performance and generalization
abilities in online learning tasks are much less understood. We therefore
focus on online learning and tackle the challenging setting in which the
underlying process is stationary and ergodic, thereby removing the i.i.d.
assumption and allowing observations to depend on each other arbitrarily. We
prove generalization guarantees for Lipschitz-regularized deep neural
networks and show that, by using such networks, convergence to the best
possible prediction strategy is guaranteed.
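As one concrete way to picture Lipschitz regularization (not necessarily the
regularizer used in the paper), a feed-forward network's Lipschitz constant can
be upper-bounded by the product of its layers' spectral norms, and that bound
penalized during training; the function names and penalty weight below are
assumptions for illustration.

    import numpy as np

    def lipschitz_upper_bound(weight_matrices):
        # Product of layer spectral norms: an upper bound on the Lipschitz
        # constant of a feed-forward net with 1-Lipschitz activations.
        return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

    def regularized_loss(prediction_loss, weight_matrices, lam=1e-3):
        # Illustrative penalized objective: data-fit term plus Lipschitz penalty.
        return prediction_loss + lam * lipschitz_upper_bound(weight_matrices)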
Generic Variance Bounds on Estimation and Prediction Errors in Time Series Analysis: An Entropy Perspective
In this paper, we obtain generic bounds on the variances of estimation and
prediction errors in time series analysis via an information-theoretic
approach. It is seen in general that the error bounds are determined by the
conditional entropy of the data point to be estimated or predicted given the
side information or past observations. Additionally, we show that the
prediction error bounds are achieved asymptotically if and only if the
"innovation" is asymptotically white Gaussian. When restricted to Gaussian
processes and one-step prediction, our bounds reduce to the Kolmogorov-Szegő
and Wiener-Masani formulas known from linear prediction theory.
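For reference, the Kolmogorov-Szegő formula mentioned above gives the minimal
one-step mean-squared prediction error of a stationary process in terms of its
spectral density f. Under the convention that the autocovariance is the integral
of f over [-π, π], it reads (other normalizations move the 2π factor):

    \sigma^2 \;=\; 2\pi \exp\!\Bigl( \frac{1}{2\pi}
      \int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda \Bigr).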
Bootstrapping Generalization Error Bounds for Time Series
We consider the problem of finding confidence intervals for the risk of
forecasting the future of a stationary, ergodic stochastic process, using a
model estimated from the past of the process. We show that a bootstrap
procedure provides valid confidence intervals for the risk, when the data
source is sufficiently mixing, and the loss function and the estimator are
suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey
the necessary regularity conditions, even when mis-specified, and simulations
show that the finite-sample coverage of our bounds quickly converges to the
theoretical, asymptotic level. As an intermediate step, we derive sufficient
conditions for asymptotic independence between empirical distribution functions
formed by splitting a realization of a stochastic process, a result of
independent interest.
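The sketch below illustrates the flavor of such a bootstrap confidence interval
using a moving-block bootstrap of out-of-sample forecast losses; the block
length, number of replicates, and percentile construction are illustrative
assumptions and not the paper's exact procedure.

    import numpy as np

    def block_bootstrap_risk_ci(losses, block_len=20, n_boot=2000,
                                level=0.95, rng=None):
        # Percentile confidence interval for the forecast risk (mean loss),
        # using a moving-block bootstrap to respect serial dependence.
        # losses: 1-D array of forecast losses, with len(losses) > block_len.
        rng = np.random.default_rng() if rng is None else rng
        losses = np.asarray(losses, dtype=float)
        n = len(losses)
        n_blocks = int(np.ceil(n / block_len))
        means = np.empty(n_boot)
        for b in range(n_boot):
            starts = rng.integers(0, n - block_len + 1, size=n_blocks)
            resample = np.concatenate([losses[s:s + block_len] for s in starts])[:n]
            means[b] = resample.mean()
        alpha = 1.0 - level
        return np.quantile(means, [alpha / 2, 1.0 - alpha / 2])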
Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular (β-mixing) processes, which is of independent probability-theoretic
interest.
Conditional Risk Minimization for Stochastic Processes
We study the task of learning from non-i.i.d. data. In particular, we aim at
learning predictors that minimize the conditional risk for a stochastic
process, i.e. the expected loss of the predictor on the next point conditioned
on the set of training samples observed so far. For non-i.i.d. data, the
training set contains information about the upcoming samples, so learning with
respect to the conditional distribution can be expected to yield better
predictors than one obtains from the classical setting of minimizing the
marginal risk. Our main contribution is a practical estimator for the
conditional risk based on the theory of non-parametric time-series prediction,
and a finite-sample concentration bound that establishes uniform convergence of
the estimator to the true conditional risk under certain regularity assumptions
on the process.
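As a rough illustration of what a conditional-risk estimator can look like (a
generic kernel-weighted sketch, not the estimator proposed in the paper), past
losses can be weighted by how similar their preceding context is to the current
one; the context construction and Gaussian kernel bandwidth below are
assumptions.

    import numpy as np

    def conditional_risk_estimate(contexts, losses, current_context, bandwidth=1.0):
        # Kernel-weighted (Nadaraya-Watson style) estimate of the conditional risk.
        # contexts: (n, d) lagged windows preceding each observed point.
        # losses:   (n,) the predictor's loss at each of those points.
        contexts = np.asarray(contexts, dtype=float)
        losses = np.asarray(losses, dtype=float)
        d2 = np.sum((contexts - np.asarray(current_context, dtype=float)) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))     # Gaussian kernel weights
        return float(np.sum(w * losses) / np.sum(w))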
High dimensional VAR with low rank transition
We propose a vector auto-regressive (VAR) model with a low-rank constraint on
the transition matrix. This new model is well suited to predict
high-dimensional series that are highly correlated, or that are driven by a
small number of hidden factors. We study estimation, prediction, and rank
selection for this model in a very general setting. Our method shows excellent
performance on a wide variety of simulated datasets. On macroeconomic data
from Giannone et al. (2015), our method is competitive with state-of-the-art
methods in low dimensions, and even improves on them in high dimensions.
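To make the model concrete, the sketch below fits a VAR(1) transition matrix by
least squares and then truncates its SVD to a given rank; this crude
reduced-rank fit is only an illustration of the low-rank constraint, not the
estimator studied in the paper, and the function name and rank argument are
assumptions.

    import numpy as np

    def low_rank_var1(X, rank):
        # X: (T, d) observed multivariate series; returns a (d, d) transition
        # matrix A of rank at most `rank` with x_t approximately A @ x_{t-1}.
        past, future = X[:-1], X[1:]
        B, *_ = np.linalg.lstsq(past, future, rcond=None)   # future ~= past @ B
        A_ols = B.T                                          # so x_t ~= A_ols @ x_{t-1}
        U, s, Vt = np.linalg.svd(A_ols, full_matrices=False)
        s[rank:] = 0.0                                       # keep the top singular values
        return (U * s) @ Vt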