Rademacher complexity of stationary sequences
We show how to control the generalization error of time series models wherein
past values of the outcome are used to predict future values. The results are
based on a generalization of standard i.i.d. concentration inequalities to
dependent data without the mixing assumptions common in the time series
setting. Our proof and result are simpler than previous analyses of dependent
data or stochastic adversaries, which use sequential Rademacher complexities
rather than the expected Rademacher complexity for i.i.d. processes. We also
derive empirical Rademacher results without mixing assumptions, resulting in
fully calculable upper bounds.
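As a rough illustration of the quantity involved, the following sketch estimates
an empirical Rademacher complexity by Monte Carlo over random sign vectors; the
hypothesis-by-sample loss matrix and the number of draws are illustrative
assumptions, not a construction from the paper.

    import numpy as np

    def empirical_rademacher(losses, n_draws=1000, rng=None):
        # losses: array of shape (n_hypotheses, n_samples); entry [h, i] is
        # hypothesis h's loss on the i-th observation of the series.
        # Returns a Monte Carlo estimate of
        #   E_sigma[ max_h (1/n) * sum_i sigma_i * losses[h, i] ].
        rng = np.random.default_rng() if rng is None else rng
        n_hyp, n = losses.shape
        total = 0.0
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=n)   # i.i.d. Rademacher signs
            total += losses.dot(sigma).max() / n      # sup over the (finite) class
        return total / n_draws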
Theory and Algorithms for Forecasting Time Series
We present data-dependent learning bounds for the general scenario of
non-stationary non-mixing stochastic processes. Our learning guarantees are
expressed in terms of a data-dependent measure of sequential complexity and a
discrepancy measure that can be estimated from data under some mild
assumptions. We also provide a novel analysis of a stable time-series
forecasting algorithm using the new notion of discrepancy that we introduce.
We use our learning bounds to devise new algorithms for non-stationary time
series forecasting for which we report some preliminary experimental results.
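For orientation, discrepancy measures in this line of work typically compare the
conditional loss at the target time to a weighted average of conditional losses
over the observed sample; the display below is an illustrative general form
rather than the paper's exact definition:

    \operatorname{disc}(\mathbf{q}) \;=\; \sup_{h \in H}
      \Bigl( \mathbb{E}\bigl[\ell(h(X_{T+1}), Y_{T+1}) \mid Z_1^{T}\bigr]
      \;-\; \sum_{t=1}^{T} q_t \,
      \mathbb{E}\bigl[\ell(h(X_t), Y_t) \mid Z_1^{t-1}\bigr] \Bigr),

where q = (q_1, ..., q_T) is a weight vector over the observed sample.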
Foundations of Sequence-to-Sequence Modeling for Time Series
The availability of large amounts of time series data, paired with the
performance of deep-learning algorithms on a broad class of problems, has
recently led to significant interest in the use of sequence-to-sequence models
for time series forecasting. We provide the first theoretical analysis of this
time series forecasting framework. We include a comparison of
sequence-to-sequence modeling to classical time series models, and as such our
theory can serve as a quantitative guide for practitioners choosing between
different modeling methodologies.
Nonparametric risk bounds for time-series forecasting
We derive generalization error bounds for traditional time-series forecasting
models. Our results hold for many standard forecasting tools including
autoregressive models, moving average models, and, more generally, linear
state-space models. These non-asymptotic bounds need only weak assumptions on
the data-generating process, yet allow forecasters to select among competing
models and to guarantee, with high probability, that their chosen model will
perform well. We motivate and apply our techniques using standard economic
and financial forecasting tools: a GARCH model for predicting equity
volatility and a dynamic stochastic general equilibrium (DSGE) model, the
standard tool in macroeconomic forecasting. We demonstrate in particular how
our techniques can aid forecasters and policy makers in choosing models that
behave well under uncertainty and mis-specification.
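For readers unfamiliar with the equity-volatility example, a minimal sketch of
the GARCH(1,1) conditional-variance recursion is given below; the parameter
names and the sample-variance initialization are illustrative assumptions, and
this is not the paper's estimation code.

    import numpy as np

    def garch11_volatility(returns, omega, alpha, beta):
        # Conditional variance recursion of a GARCH(1,1) model:
        #   sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2
        r = np.asarray(returns, dtype=float)
        sigma2 = np.empty(len(r) + 1)
        sigma2[0] = r.var()                 # a common initialization choice
        for t in range(1, len(r) + 1):
            sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        return sigma2                       # sigma2[-1] is the one-step-ahead variance forecast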
Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks
Deep neural networks are considered state-of-the-art models for many offline
machine learning tasks. However, their performance and generalization
abilities in online learning tasks are much less understood. We therefore
focus on online learning and tackle the challenging setting in which the
underlying process is stationary and ergodic, thereby removing the i.i.d.
assumption and allowing observations to depend on each other arbitrarily. We
prove generalization guarantees for Lipschitz-regularized deep neural
networks and show that, by using such networks, convergence to the best
possible prediction strategy is guaranteed.
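As one concrete way to picture Lipschitz regularization (not necessarily the
regularizer used in the paper), a feed-forward network's Lipschitz constant can
be upper-bounded by the product of its layers' spectral norms, and that bound
penalized during training; the function names and penalty weight below are
assumptions for illustration.

    import numpy as np

    def lipschitz_upper_bound(weight_matrices):
        # Product of layer spectral norms: an upper bound on the Lipschitz
        # constant of a feed-forward net with 1-Lipschitz activations.
        return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

    def regularized_loss(prediction_loss, weight_matrices, lam=1e-3):
        # Illustrative penalized objective: data-fit term plus Lipschitz penalty.
        return prediction_loss + lam * lipschitz_upper_bound(weight_matrices)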
Generic Variance Bounds on Estimation and Prediction Errors in Time Series Analysis: An Entropy Perspective
In this paper, we obtain generic bounds on the variances of estimation and
prediction errors in time series analysis via an information-theoretic
approach. It is seen in general that the error bounds are determined by the
conditional entropy of the data point to be estimated or predicted given the
side information or past observations. Additionally, we show that the
prediction error bounds are achieved asymptotically if and only if the
"innovation" is asymptotically white Gaussian. When restricted to Gaussian
processes and one-step prediction, our bounds reduce to the Kolmogorov-Szegő
and Wiener-Masani formulas known from linear prediction theory.
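For reference, the Kolmogorov-Szegő formula mentioned above gives the minimal
one-step mean-squared prediction error of a stationary process in terms of its
spectral density f. Under the convention that the autocovariance is the integral
of f over [-π, π], it reads (other normalizations move the 2π factor):

    \sigma^2 \;=\; 2\pi \exp\!\Bigl( \frac{1}{2\pi}
      \int_{-\pi}^{\pi} \log f(\lambda)\, d\lambda \Bigr).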
Bootstrapping Generalization Error Bounds for Time Series
We consider the problem of finding confidence intervals for the risk of
forecasting the future of a stationary, ergodic stochastic process, using a
model estimated from the past of the process. We show that a bootstrap
procedure provides valid confidence intervals for the risk, when the data
source is sufficiently mixing, and the loss function and the estimator are
suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey
the necessary regularity conditions, even when mis-specified, and simulations
show that the finite-sample coverage of our bounds quickly converges to the
theoretical, asymptotic level. As an intermediate step, we derive sufficient
conditions for asymptotic independence between empirical distribution functions
formed by splitting a realization of a stochastic process, a result of
independent interest.
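The sketch below illustrates the flavor of such a bootstrap confidence interval
using a moving-block bootstrap of out-of-sample forecast losses; the block
length, number of replicates, and percentile construction are illustrative
assumptions and not the paper's exact procedure.

    import numpy as np

    def block_bootstrap_risk_ci(losses, block_len=20, n_boot=2000,
                                level=0.95, rng=None):
        # Percentile confidence interval for the forecast risk (mean loss),
        # using a moving-block bootstrap to respect serial dependence.
        # losses: 1-D array of forecast losses, with len(losses) > block_len.
        rng = np.random.default_rng() if rng is None else rng
        losses = np.asarray(losses, dtype=float)
        n = len(losses)
        n_blocks = int(np.ceil(n / block_len))
        means = np.empty(n_boot)
        for b in range(n_boot):
            starts = rng.integers(0, n - block_len + 1, size=n_blocks)
            resample = np.concatenate([losses[s:s + block_len] for s in starts])[:n]
            means[b] = resample.mean()
        alpha = 1.0 - level
        return np.quantile(means, [alpha / 2, 1.0 - alpha / 2])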
Predictive PAC Learning and Process Decompositions
We informally call a stochastic process learnable if it admits a
generalization error approaching zero in probability for any concept class with
finite VC-dimension (IID processes are the simplest example). A mixture of
learnable processes need not be learnable itself, and certainly its
generalization error need not decay at the same rate. In this paper, we argue
that it is natural in predictive PAC to condition not on the past observations
but on the mixture component of the sample path. This definition not only
matches what a realistic learner might demand, but also allows us to sidestep
several otherwise grave problems in learning from dependent data. In
particular, we give a novel PAC generalization bound for mixtures of learnable
processes with a generalization error that is not worse than that of each
mixture component. We also provide a characterization of mixtures of absolutely
regular (β-mixing) processes, which is of independent probability-theoretic
interest.
Conditional Risk Minimization for Stochastic Processes
We study the task of learning from non-i.i.d. data. In particular, we aim at
learning predictors that minimize the conditional risk for a stochastic
process, i.e. the expected loss of the predictor on the next point conditioned
on the set of training samples observed so far. For non-i.i.d. data, the
training set contains information about the upcoming samples, so learning with
respect to the conditional distribution can be expected to yield better
predictors than one obtains from the classical setting of minimizing the
marginal risk. Our main contribution is a practical estimator for the
conditional risk based on the theory of non-parametric time-series prediction,
and a finite-sample concentration bound that establishes uniform convergence of
the estimator to the true conditional risk under certain regularity assumptions
on the process.
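As a rough illustration of what a conditional-risk estimator can look like (a
generic kernel-weighted sketch, not the estimator proposed in the paper), past
losses can be weighted by how similar their preceding context is to the current
one; the context construction and Gaussian kernel bandwidth below are
assumptions.

    import numpy as np

    def conditional_risk_estimate(contexts, losses, current_context, bandwidth=1.0):
        # Kernel-weighted (Nadaraya-Watson style) estimate of the conditional risk.
        # contexts: (n, d) lagged windows preceding each observed point.
        # losses:   (n,) the predictor's loss at each of those points.
        contexts = np.asarray(contexts, dtype=float)
        losses = np.asarray(losses, dtype=float)
        d2 = np.sum((contexts - np.asarray(current_context, dtype=float)) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))     # Gaussian kernel weights
        return float(np.sum(w * losses) / np.sum(w))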
High dimensional VAR with low rank transition
We propose a vector auto-regressive (VAR) model with a low-rank constraint on
the transition matrix. This new model is well suited to predict
high-dimensional series that are highly correlated, or that are driven by a
small number of hidden factors. We study estimation, prediction, and rank
selection for this model in a very general setting. Our method shows excellent
performance on a wide variety of simulated datasets. On macroeconomic data
from Giannone et al. (2015), our method is competitive with state-of-the-art
methods in low dimensions, and even improves on them in high dimensions.
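To make the model concrete, the sketch below fits a VAR(1) transition matrix by
least squares and then truncates its SVD to a given rank; this crude
reduced-rank fit is only an illustration of the low-rank constraint, not the
estimator studied in the paper, and the function name and rank argument are
assumptions.

    import numpy as np

    def low_rank_var1(X, rank):
        # X: (T, d) observed multivariate series; returns a (d, d) transition
        # matrix A of rank at most `rank` with x_t approximately A @ x_{t-1}.
        past, future = X[:-1], X[1:]
        B, *_ = np.linalg.lstsq(past, future, rcond=None)   # future ~= past @ B
        A_ols = B.T                                          # so x_t ~= A_ols @ x_{t-1}
        U, s, Vt = np.linalg.svd(A_ols, full_matrices=False)
        s[rank:] = 0.0                                       # keep the top singular values
        return (U * s) @ Vt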