Rademacher complexity of stationary sequences
We show how to control the generalization error of time series models wherein
past values of the outcome are used to predict future values. The results are
based on a generalization of standard i.i.d. concentration inequalities to
dependent data without the mixing assumptions common in the time series
setting. Our proof and the result are simpler than previous analyses with
dependent data or stochastic adversaries which use sequential Rademacher
complexities rather than the expected Rademacher complexity for i.i.d.
processes. We also derive empirical Rademacher results without mixing
assumptions, resulting in fully calculable upper bounds. Comment: 15 pages, 1 figure.
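To make "fully calculable" concrete, here is a minimal sketch of a Monte Carlo estimate of the standard empirical Rademacher complexity of a finite function class. It illustrates only the textbook quantity, not the paper's bound for stationary sequences; the function name `empirical_rademacher`, the threshold class, and the synthetic sample are assumptions made for illustration.

```python
import random

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    values[j][i] holds f_j(x_i): the j-th function of a finite class
    evaluated at the i-th sample point.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent random signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the class of the sign-weighted sample average.
        total += max(sum(s * v for s, v in zip(sigma, row)) / n for row in values)
    return total / n_draws

# Hypothetical sample and class: threshold functions and their negations.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
full_class = []
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    f = [1.0 if x > t else -1.0 for x in xs]
    full_class.append(f)
    full_class.append([-v for v in f])
small_class = full_class[:2]  # one threshold function and its negation

r_small = empirical_rademacher(small_class)
r_full = empirical_rademacher(full_class)
print(r_small, r_full)
```

Because both calls use the same seed and sample size, they see identical sign vectors, so the richer class can never report a smaller value than the class it contains.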
Bootstrapping Generalization Error Bounds for Time Series
We consider the problem of finding confidence intervals for the risk of
forecasting the future of a stationary, ergodic stochastic process, using a
model estimated from the past of the process. We show that a bootstrap
procedure provides valid confidence intervals for the risk, when the data
source is sufficiently mixing, and the loss function and the estimator are
suitably smooth. Autoregressive (AR(d)) models estimated by least squares obey
the necessary regularity conditions, even when mis-specified, and simulations
show that the finite-sample coverage of our bounds quickly converges to the
theoretical, asymptotic level. As an intermediate step, we derive sufficient
conditions for asymptotic independence between empirical distribution functions
formed by splitting a realization of a stochastic process, which is of
independent interest.
Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator
Reinforcement learning (RL) has been successfully used to solve many
continuous control tasks. Despite its impressive results, however, fundamental
questions regarding the sample complexity of RL on continuous problems remain
open. We study the performance of RL in this setting by considering the
behavior of the Least-Squares Temporal Difference (LSTD) estimator on the
classic Linear Quadratic Regulator (LQR) problem from optimal control. We give
the first finite-time analysis of the number of samples needed to estimate the
value function for a fixed static state-feedback policy to within
$\epsilon$-relative error. In the process of deriving our result, we give a
general characterization for when the minimum eigenvalue of the empirical
covariance matrix formed along the sample path of a fast-mixing stochastic
process concentrates above zero, extending a result by Koltchinskii and
Mendelson in the independent covariates setting. Finally, we provide
experimental evidence indicating that our analysis correctly captures the
qualitative behavior of LSTD on several LQR instances.
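As a rough illustration of the estimator discussed above, the following sketch runs LSTD on a scalar, discounted LQR instance with a fixed feedback gain. Everything specific here is an assumption for illustration and is not taken from the paper: the scalar dynamics, unit-variance Gaussian noise, the discount factor, the feature map (x^2, 1), and the name `lstd_scalar_lqr`.

```python
import random

def lstd_scalar_lqr(a, b, k, q, r, gamma, T, seed=0):
    """LSTD estimate of V(x) = p*x^2 + d under the fixed policy u = k*x.

    Assumed dynamics: x' = a*x + b*u + w with w ~ N(0, 1).
    Assumed features: phi(x) = (x^2, 1).
    """
    rng = random.Random(seed)
    A = [[0.0, 0.0], [0.0, 0.0]]
    rhs = [0.0, 0.0]
    x = 0.0
    for _ in range(T):
        u = k * x
        cost = q * x * x + r * u * u
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        phi = (x * x, 1.0)
        # phi(x) - gamma * phi(x'), computed componentwise.
        diff = (phi[0] - gamma * x_next * x_next, 1.0 - gamma)
        for i in range(2):
            rhs[i] += phi[i] * cost
            for j in range(2):
                A[i][j] += phi[i] * diff[j]
        x = x_next
    # Solve the 2x2 system A @ (p, d) = rhs by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    p = (rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det
    d = (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det
    return p, d

# Hypothetical instance with closed-loop gain a + b*k = 0.5.
a, b, k, q, r, gamma = 1.0, 1.0, -0.5, 1.0, 1.0, 0.9
p_hat, d_hat = lstd_scalar_lqr(a, b, k, q, r, gamma, T=20000)
# Closed form of the quadratic coefficient for this scalar case.
p_true = (q + r * k * k) / (1.0 - gamma * (a + b * k) ** 2)
print(p_hat, p_true)
```

In this scalar setting the true value function lies in the span of the chosen features, so the estimate of `p` should approach the closed-form value as `T` grows.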
Generalisation in fully-connected neural networks for time series forecasting
In this paper we study the generalization capabilities of fully-connected
neural networks trained in the context of time series forecasting. Time series
do not satisfy the typical assumption in statistical learning theory of the
data being i.i.d. samples from some data-generating distribution. We use the
input and weight Hessians, that is the smoothness of the learned function with
respect to the input and the width of the minimum in weight space, to quantify
a network's ability to generalize to unseen data. While such generalization
metrics have been studied extensively in the i.i.d. setting of, for example,
image recognition, here we empirically validate their use in the task of time
series forecasting. Furthermore, we discuss how one can control the
generalization capability of the network by means of the training process using
the learning rate, batch size and the number of training iterations as
controls. Using these hyperparameters one can efficiently control the
complexity of the output function without imposing explicit constraints.
Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification
We prove that the ordinary least-squares (OLS) estimator attains nearly
minimax optimal performance for the identification of linear dynamical systems
from a single observed trajectory. Our upper bound relies on a generalization
of Mendelson's small-ball method to dependent data, eschewing the use of
standard mixing-time arguments. Our lower bounds reveal that these upper bounds
match up to logarithmic factors. In particular, we capture the correct
signal-to-noise behavior of the problem, showing that more unstable linear
systems are easier to estimate. This behavior is qualitatively different from
arguments which rely on mixing-time calculations that suggest that unstable
systems are more difficult to estimate. We generalize our technique to provide
bounds for a more general class of linear response time-series.
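The single-trajectory OLS setting can be sketched for a scalar system as follows. This is a toy illustration under assumed settings (scalar state, unit-variance Gaussian noise, zero initial state, trajectory length 200); the helper names `simulate` and `ols_scalar` are hypothetical and do not come from the paper.

```python
import random

def simulate(a, T, rng, noise_std=1.0):
    """Roll out x_{t+1} = a * x_t + w_t from x_0 = 0 for T steps."""
    x = [0.0]
    for _ in range(T):
        x.append(a * x[-1] + rng.gauss(0.0, noise_std))
    return x

def ols_scalar(x):
    """Least-squares estimate of a from one trajectory x_0, ..., x_T."""
    num = sum(x[t + 1] * x[t] for t in range(len(x) - 1))
    den = sum(x[t] ** 2 for t in range(len(x) - 1))
    return num / den

rng = random.Random(0)
errors = {}
for a in (0.5, 1.1):  # one stable and one unstable system
    trajectory = simulate(a, T=200, rng=rng)
    errors[a] = abs(ols_scalar(trajectory) - a)
print(errors)
```

Comparing the two entries of `errors` gives a quick feel for the signal-to-noise effect described in the abstract: in the unstable case the state grows, so the denominator of the estimator grows much faster than the noise term in the numerator.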
Finite Time Identification in Unstable Linear Systems
Identification of the parameters of stable linear dynamical systems is a
well-studied problem in the literature, both in the low and high-dimensional
settings. However, there are hardly any results for the unstable case,
especially regarding finite time bounds. For this setting, classical results on
least-squares estimation of the dynamics parameters are not applicable and
therefore new concepts and technical approaches need to be developed to address
the issue. Unstable linear systems arise in key real applications in control
theory, econometrics, and finance. This study establishes finite time bounds
for the identification error of the least-squares estimates for a fairly large
class of heavy-tailed noise distributions, and transition matrices of such
systems. The results relate the time length (samples) required for estimation
to a function of the problem dimension and key characteristics of the true
underlying transition matrix and the noise distribution. To establish them,
appropriate concentration inequalities for random matrices and for sequences of
martingale differences are leveraged.
Hypothesis Set Stability and Generalization
We present a study of generalization for data-dependent hypothesis sets. We
give a general learning guarantee for data-dependent hypothesis sets based on a
notion of transductive Rademacher complexity. Our main result is a
generalization bound for data-dependent hypothesis sets expressed in terms of a
notion of hypothesis set stability and a notion of Rademacher complexity for
data-dependent hypothesis sets that we introduce. This bound admits as special
cases both standard Rademacher complexity bounds and algorithm-dependent
uniform stability bounds. We also illustrate the use of these learning bounds
in the analysis of several scenarios. Comment: Published in NeurIPS 2019. This version is equivalent to the
camera-ready version but also includes the supplementary material.
Theory and Algorithms for Forecasting Time Series
We present data-dependent learning bounds for the general scenario of
non-stationary non-mixing stochastic processes. Our learning guarantees are
expressed in terms of a data-dependent measure of sequential complexity and a
discrepancy measure that can be estimated from data under some mild
assumptions. We also provide a novel analysis of stable time series
forecasting algorithms using this new notion of discrepancy that we introduce.
We use our learning bounds to devise new algorithms for non-stationary time
series forecasting for which we report some preliminary experimental results. Comment: An extended abstract has appeared in (Kuznetsov and Mohri, 2015).
On Learnability under General Stochastic Processes
Statistical learning theory under independent and identically distributed
(iid) sampling and online learning theory for worst case individual sequences
are two of the best developed branches of learning theory. Statistical learning
under general non-iid stochastic processes is less mature. We provide two
natural notions of learnability of a function class under a general stochastic
process. We are able to sandwich the first one between iid and online
learnability. We show that the second one is in fact equivalent to online
learnability. Our results are sharpest in the binary classification setting but
we also show that similar results continue to hold in the regression setting.
On the Sample Complexity of the Linear Quadratic Regulator
This paper addresses the optimal control problem known as the Linear
Quadratic Regulator in the case when the dynamics are unknown. We propose a
multi-stage procedure, called Coarse-ID control, that estimates a model from a
few experimental trials, estimates the error in that model with respect to the
truth, and then designs a controller using both the model and uncertainty
estimate. Our technique uses contemporary tools from random matrix theory to
bound the error in the estimation procedure. We also employ a recently
developed approach to control synthesis called System Level Synthesis that
enables robust control design by solving a convex optimization problem. We
provide end-to-end bounds on the relative error in control cost that are nearly
optimal in the number of parameters and that highlight salient properties of
the system to be controlled such as closed-loop sensitivity and optimal control
magnitude. We show experimentally that the Coarse-ID approach enables efficient
computation of a stabilizing controller in regimes where simple control schemes
that do not take the model uncertainty into account fail to stabilize the true
system. Comment: Contains a new analysis of finite-dimensional truncation, a new
data-dependent estimation bound, and an expanded exposition on necessary
background in control theory and System Level Synthesis.