An Exponential Efron-Stein Inequality for Lq Stable Learning Rules
There is accumulating evidence in the literature that stability of learning
algorithms is a key characteristic that permits a learning algorithm to
generalize. Despite various insightful results in this direction, there seems
to be an overlooked dichotomy in the type of stability-based generalization
bounds we have in the literature. On one hand, the literature seems to suggest
that exponential generalization bounds for the estimated risk, which are
optimal, can only be obtained through stringent, distribution-independent and
computationally intractable notions of stability such as uniform stability. On
the other hand, it seems that weaker notions of stability such as hypothesis
stability, although distribution-dependent and more amenable to
computation, can only yield polynomial generalization bounds for the estimated
risk, which are suboptimal.
In this paper, we address the gap between these two regimes of results. In
particular, the main question we address here is \emph{whether it is possible
to derive exponential generalization bounds for the estimated risk using a
notion of stability that is computationally tractable and distribution
dependent, but weaker than uniform stability}. Using recent advances in
concentration inequalities, and using a notion of stability that is weaker than
uniform stability but distribution dependent and amenable to computation, we
derive an exponential tail bound for the concentration of the estimated risk of
a hypothesis returned by a general learning rule, where the estimated risk is
expressed in terms of either the resubstitution estimate (empirical error), or
the deleted (or, leave-one-out) estimate. As an illustration, we derive
exponential tail bounds for ridge regression with unbounded responses, where we
show how stability changes with the tail behavior of the response variables.
Comment: Additional text and appendices that were not included in the PMLR
(ALT'19) proceedings are now included in this version.
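To make the two risk estimates named above concrete, here is a minimal sketch contrasting the resubstitution (empirical) estimate with the deleted (leave-one-out) estimate for ridge regression; the closed-form solver, the synthetic heavy-tailed data, and all names are illustrative assumptions, not the paper's construction.

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def resubstitution_estimate(X, y, lam):
    # Empirical error: score the hypothesis trained on the full sample.
    w = ridge_fit(X, y, lam)
    return float(np.mean((X @ w - y) ** 2))

def deleted_estimate(X, y, lam):
    # Leave-one-out estimate: each point is scored by a model trained without it.
    n = X.shape[0]
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        w_i = ridge_fit(X[mask], y[mask], lam)
        errs.append((X[i] @ w_i - y[i]) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.standard_t(df=3, size=200)  # heavy-tailed responses
print(resubstitution_estimate(X, y, lam=1.0), deleted_estimate(X, y, lam=1.0))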
Generalization Bounds for Uniformly Stable Algorithms
Uniform stability of a learning algorithm is a classical notion of
algorithmic stability introduced to derive high-probability bounds on the
generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss
function with range bounded in $[0,1]$, the generalization error of a
$\gamma$-uniformly stable learning algorithm on $n$ samples is known to be
within $O\big((\gamma + 1/n)\sqrt{n \log(1/\delta)}\big)$ of the empirical error with
probability at least $1-\delta$. Unfortunately, this bound does not lead to
meaningful generalization bounds in many common settings where $\gamma \geq 1/\sqrt{n}$.
At the same time, the bound is known to be tight only when $\gamma = O(1/n)$.
We substantially improve generalization bounds for uniformly stable
algorithms without making any additional assumptions. First, we show that the
bound in this setting is $O\big(\sqrt{(\gamma + 1/n)\log(1/\delta)}\big)$ with
probability at least $1-\delta$. In addition, we prove a tight bound of
$O(\gamma^2 + 1/n)$ on the second moment of the estimation error. The best
previous bound on the second moment is $O(\gamma + 1/n)$. Our proofs are based
on new analysis techniques and our results imply substantially stronger
generalization guarantees for several well-studied algorithms.
Comment: Appeared in Neural Information Processing Systems (NeurIPS), 2018.
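Taking the two high-probability rates quoted above at face value and ignoring constants, a short numerical sketch shows why the improvement matters in the regime gamma ~ 1/sqrt(n), where the classical term stays bounded away from zero while the new one still vanishes; the choice of regime and of delta is an illustrative assumption.

import math

def classical_term(gamma, n, delta):
    # (gamma + 1/n) * sqrt(n * log(1/delta)): the classical high-probability rate.
    return (gamma + 1.0 / n) * math.sqrt(n * math.log(1.0 / delta))

def improved_term(gamma, n, delta):
    # sqrt((gamma + 1/n) * log(1/delta)): the improved rate quoted above.
    return math.sqrt((gamma + 1.0 / n) * math.log(1.0 / delta))

for n in (10**3, 10**4, 10**5):
    gamma = 1.0 / math.sqrt(n)  # a common stability level where the old bound is vacuous
    print(n, classical_term(gamma, n, 0.01), improved_term(gamma, n, 0.01))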
Optimism-Based Adaptive Regulation of Linear-Quadratic Systems
The main challenge for adaptive regulation of linear-quadratic systems is the
trade-off between identification and control. An adaptive policy needs to
address both the estimation of unknown dynamics parameters (exploration), as
well as the regulation of the underlying system (exploitation). To this end,
optimism-based methods which bias the identification in favor of optimistic
approximations of the true parameter are employed in the literature. A number
of asymptotic results have been established, but their finite time counterparts
are few, with important restrictions.
This study establishes results for the worst-case regret of optimism-based
adaptive policies. The presented high probability upper bounds are optimal up
to logarithmic factors. The non-asymptotic analysis of this work requires very
mild assumptions: (i) stabilizability of the system's dynamics, and (ii)
limiting the degree of heaviness of the noise distribution. To establish such
bounds, certain novel techniques are developed to comprehensively address the
probabilistic behavior of dependent random matrices with heavy-tailed
distributions.
Comment: 28 pages.
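For readers unfamiliar with the setting, the standard linear-quadratic formulation and the regret notion behind such bounds can be written as follows; this is the generic textbook formulation, and the paper's exact assumptions and notation may differ.

\begin{align*}
  x_{t+1} &= A\,x_t + B\,u_t + w_t, \qquad
  c_t = x_t^\top Q\,x_t + u_t^\top R\,u_t, \quad Q, R \succ 0,\\
  \mathrm{Regret}(T) &= \sum_{t=0}^{T-1} c_t \;-\; T\,J^*(A,B),
\end{align*}

where $J^*(A,B)$ denotes the optimal average cost attainable when the true dynamics $(A,B)$ are known.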
A PTAS for $\ell_p$-Low Rank Approximation
A number of recent works have studied algorithms for entrywise $\ell_p$-low
rank approximation, namely, algorithms which, given an $n \times d$ matrix $A$
(with $n \ge d$), output a rank-$k$ matrix $\hat{A}$ minimizing the entrywise error
$\|A-\hat{A}\|_p^p = \sum_{i,j} |A_{i,j}-\hat{A}_{i,j}|^p$ when $p > 0$, and
$\|A-\hat{A}\|_0 = \#\{(i,j) : A_{i,j} \neq \hat{A}_{i,j}\}$ for $p = 0$.
On the algorithmic side, for $0 < p < 2$, we give the first
$(1+\epsilon)$-approximation algorithm running in time
$n^{\mathrm{poly}(k/\epsilon)}$. Further, for $p = 0$, we give the first
almost-linear time approximation scheme for what we call the Generalized Binary
$\ell_0$-Rank-$k$ problem. Our algorithm computes a $(1+\epsilon)$-approximation
in almost-linear time.
On the hardness of approximation side, for $p \in (1,2)$, assuming the Small
Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show
that there exists a constant $\delta > 0$ such that the entrywise
$\ell_p$-Rank-$k$ problem admits no constant-factor approximation algorithm running in
time $2^{k^{\delta}}$.
Comment: Accepted at SODA'19, 61 pages.
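The entrywise objectives referred to above are easy to spell out in code; the sketch below is purely illustrative, and the truncated-SVD candidate it evaluates is optimal only for p = 2, not for general p.

import numpy as np

def entrywise_error(A, B, p):
    # sum_{ij} |A_ij - B_ij|^p for p > 0; number of mismatched entries for p = 0.
    if p == 0:
        return int(np.sum(A != B))
    return float(np.sum(np.abs(A - B) ** p))

def truncated_svd(A, k):
    # Rank-k candidate from the top-k singular triples (optimal for p = 2 only).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 20))
B = truncated_svd(A, k=3)
print(entrywise_error(A, B, 1.5), entrywise_error(A, B, 2))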
Make Up Your Mind: The Price of Online Queries in Differential Privacy
We consider the problem of answering queries about a sensitive dataset
subject to differential privacy. The queries may be chosen adversarially from a
larger set Q of allowable queries in one of three ways, which we list in order
from easiest to hardest to answer:
Offline: The queries are chosen all at once and the differentially private
mechanism answers the queries in a single batch.
Online: The queries are chosen all at once, but the mechanism only receives
the queries in a streaming fashion and must answer each query before seeing the
next query.
Adaptive: The queries are chosen one at a time and the mechanism must answer
each query before the next query is chosen. In particular, each query may
depend on the answers given to previous queries.
Many differentially private mechanisms are just as efficient in the adaptive
model as they are in the offline model. Meanwhile, most lower bounds for
differential privacy hold in the offline setting. This suggests that the three
models may be equivalent.
We prove that these models are all, in fact, distinct. Specifically, we show
that there is a family of statistical queries such that exponentially more
queries from this family can be answered in the offline model than in the
online model. We also exhibit a family of search queries such that
exponentially more queries from this family can be answered in the online model
than in the adaptive model. We also investigate whether such separations might
hold for simple queries like threshold queries over the real line.
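The three interaction models can be summarized by the interfaces they expose to the query-issuing analyst. The sketch below is schematic only: the mechanism and analyst objects are hypothetical placeholders, not an API of any differential-privacy library.

from typing import Callable, Iterable, Iterator, List

Query = Callable[[list], float]  # a statistical or search query evaluated on the dataset

def answer_offline(mechanism, queries: List[Query]) -> List[float]:
    # Offline: the whole batch is visible at once, so the mechanism may optimize over it.
    return mechanism.answer_batch(queries)

def answer_online(mechanism, queries: Iterable[Query]) -> Iterator[float]:
    # Online: queries are fixed in advance but revealed one at a time; each answer
    # must be produced before the next query is seen.
    for q in queries:
        yield mechanism.answer_one(q)

def answer_adaptive(mechanism, analyst) -> List[float]:
    # Adaptive: the analyst picks each query after seeing all previous answers.
    answers: List[float] = []
    while (q := analyst.next_query(answers)) is not None:
        answers.append(mechanism.answer_one(q))
    return answers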
Input Perturbations for Adaptive Control and Learning
This paper studies adaptive algorithms for simultaneous regulation (i.e.,
control) and estimation (i.e., learning) of Multiple Input Multiple Output
(MIMO) linear dynamical systems. It proposes practical, easy-to-implement
control policies based on perturbations of input signals. Such policies are
shown to achieve a worst-case regret bound that scales as the square root of the
time horizon and holds uniformly over time. Further, it discusses specific settings
where such greedy policies attain the information theoretic lower bound of
logarithmic regret. To establish the results, recent advances on
self-normalized martingales together with a novel method of policy
decomposition are leveraged.
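A schematic of the perturbed-greedy idea is sketched below: apply the certainty-equivalent feedback for the current least-squares estimate of the dynamics plus a small random input perturbation, then refresh the estimate. The dimensions, noise scales, gain computation, and the omission of stabilization safeguards are all illustrative assumptions, not the paper's algorithm.

import numpy as np

def lqr_gain(A, B, Q, R, iters=200):
    # Certainty-equivalent LQR gain u = -K x via Riccati value iteration.
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def run(A_true, B_true, Q, R, T=500, sigma_u=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B_true.shape
    x = np.zeros(n)
    Z, X_next = [], []                                # regressors [x; u] and next states
    A_hat, B_hat = np.zeros((n, n)), np.eye(n, m)     # crude initial estimates
    for _ in range(T):
        K = lqr_gain(A_hat, B_hat, Q, R)
        u = -K @ x + sigma_u * rng.normal(size=m)     # greedy input plus perturbation
        x_new = A_true @ x + B_true @ u + 0.1 * rng.normal(size=n)
        Z.append(np.concatenate([x, u])); X_next.append(x_new)
        theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
        A_hat, B_hat = theta[:n].T, theta[n:].T       # refreshed least-squares estimate
        x = x_new
    return A_hat, B_hat

# Example call (stable, stabilizable toy system):
#   run(np.array([[0.9, 0.1], [0.0, 0.8]]), np.eye(2), Q=np.eye(2), R=np.eye(2))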
Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$
regret for learning in Linear Quadratic Control systems with unknown dynamics.
By that, we resolve an open question of Abbasi-Yadkori and Szepesv\'ari (2011)
and Dean, Mania, Matni, Recht, and Tu (2018).
Cross-validation Confidence Intervals for Test Error
This work develops central limit theorems for cross-validation and consistent
estimators of its asymptotic variance under weak stability conditions on the
learning algorithm. Together, these results provide practical,
asymptotically-exact confidence intervals for $k$-fold test error and valid,
powerful hypothesis tests of whether one learning algorithm has smaller
$k$-fold test error than another. These results are also the first of their
kind for the popular choice of leave-one-out cross-validation. In our real-data
experiments with diverse learning algorithms, the resulting intervals and tests
outperform the most popular alternative methods from the literature.
Comment: 34th Conference on Neural Information Processing Systems (NeurIPS
2020); 40 pages, 15 figures.
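A simplified version of the kind of interval described above can be sketched as follows: a k-fold cross-validation error with a normal-approximation confidence interval built from the per-example hold-out losses. The paper's consistent variance estimator is more careful; this naive plug-in version, and the choice of model and data, are illustrative assumptions only.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def kfold_ci(X, y, k=10, seed=0):
    # k-fold cross-validated 0-1 loss with a naive 95% normal-approximation interval.
    losses = np.empty(len(y))
    for train, test in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        losses[test] = (model.predict(X[test]) != y[test]).astype(float)
    err = losses.mean()
    half = 1.96 * losses.std(ddof=1) / np.sqrt(len(losses))
    return err - half, err + half

X, y = make_classification(n_samples=500, random_state=0)
print(kfold_ci(X, y))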
Toward Better Generalization Bounds with Locally Elastic Stability
Algorithmic stability is a key characteristic to ensure the generalization
ability of a learning algorithm. Among different notions of stability,
\emph{uniform stability} is arguably the most popular one, which yields
exponential generalization bounds. However, uniform stability only considers
the worst-case loss change (or so-called sensitivity) by removing a single data
point, which is distribution-independent and therefore undesirable. In many
cases, the worst-case sensitivity of the loss is much larger than the average
sensitivity over the choice of the removed data point,
especially in some advanced models such as random feature models or neural
networks. Many previous works try to mitigate this distribution-independence
issue by proposing weaker notions of stability; however, they either only yield
polynomial bounds or the derived bounds do not vanish as the sample size goes to
infinity. Given that, we propose \emph{locally elastic stability} as a weaker
and distribution-dependent stability notion, which still yields exponential
generalization bounds. We further demonstrate that locally elastic stability
implies tighter generalization bounds than those derived based on uniform
stability in many situations by revisiting the examples of bounded support
vector machines, regularized least squares regression, and stochastic gradient
descent.
Comment: Published in ICML 2021.
Asymptotic equivalence of regularization methods in thresholded parameter space
High-dimensional data analysis has motivated a spectrum of regularization
methods for variable selection and sparse modeling, with two popular classes:
convex penalties and concave penalties. There has been a long debate on whether
one class dominates the other, an important question both in theory and to
practitioners.
In this paper, we characterize the asymptotic equivalence of regularization
methods, with general penalty functions, in a thresholded parameter space under
the generalized linear model setting, where the dimensionality can grow up to
exponentially with the sample size. To assess their performance, we establish
the oracle inequalities, as in Bickel, Ritov and Tsybakov (2009), of the global
minimizer for these methods under various prediction and variable selection
losses. These results reveal an interesting phase transition phenomenon. For
polynomially growing dimensionality, the $L_1$-regularization method of Lasso
and concave methods are asymptotically equivalent, having the same convergence
rates in the oracle inequalities. For exponentially growing dimensionality,
concave methods are asymptotically equivalent but have faster convergence rates
than the Lasso. We also establish a stronger property of the oracle risk
inequalities of the regularization methods, as well as the sampling properties
of computable solutions. Our new theoretical results are illustrated and
justified by simulation and real data examples.
Comment: 39 pages, 3 figures.
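For reference, the two classes compared above fit the generic penalized-likelihood template below; the notation is illustrative, and the paper's exact setup (thresholded parameter space, generalized linear model likelihood) is more specific.

\begin{equation*}
  \widehat{\beta} \in \arg\min_{\beta}\;
    -\frac{1}{n}\,\ell_n(\beta) \;+\; \sum_{j=1}^{p} p_{\lambda}\!\big(|\beta_j|\big),
\end{equation*}

where the Lasso corresponds to the convex penalty $p_{\lambda}(t) = \lambda t$, while concave choices such as SCAD and MCP apply less shrinkage to large coefficients.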