    An Exponential Efron-Stein Inequality for $L_q$ Stable Learning Rules

    There is accumulating evidence in the literature that stability of learning algorithms is a key characteristic that permits a learning algorithm to generalize. Despite various insightful results in this direction, there seems to be an overlooked dichotomy in the type of stability-based generalization bounds we have in the literature. On one hand, the literature seems to suggest that exponential generalization bounds for the estimated risk, which are optimal, can only be obtained through stringent, distribution-independent, and computationally intractable notions of stability such as uniform stability. On the other hand, it seems that weaker notions of stability such as hypothesis stability, although distribution-dependent and more amenable to computation, can only yield polynomial generalization bounds for the estimated risk, which are suboptimal. In this paper, we address the gap between these two regimes of results. In particular, the main question we address here is \emph{whether it is possible to derive exponential generalization bounds for the estimated risk using a notion of stability that is computationally tractable and distribution dependent, but weaker than uniform stability}. Using recent advances in concentration inequalities, and using a notion of stability that is weaker than uniform stability but distribution dependent and amenable to computation, we derive an exponential tail bound for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of either the resubstitution estimate (empirical error) or the deleted (or, leave-one-out) estimate. As an illustration, we derive exponential tail bounds for ridge regression with unbounded responses, where we show how stability changes with the tail behavior of the response variables. Comment: Additional text and appendices that were not included in the PMLR (ALT'19) proceedings are now included in this version.
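    The two risk estimates contrasted in this abstract are easy to make concrete. Below is a minimal sketch (plain numpy; the squared loss and all names are illustrative, not taken from the paper) that computes both the resubstitution and the deleted (leave-one-out) estimates for ridge regression with heavy-tailed responses:

        import numpy as np

        def ridge_fit(X, y, lam):
            # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y.
            d = X.shape[1]
            return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

        rng = np.random.default_rng(0)
        n, d, lam = 200, 5, 1.0
        X = rng.normal(size=(n, d))
        y = X @ rng.normal(size=d) + rng.standard_t(df=3, size=n)  # heavy-tailed noise

        w = ridge_fit(X, y, lam)
        resub = np.mean((X @ w - y) ** 2)  # resubstitution (empirical) error

        # Deleted (leave-one-out) estimate: refit without each point i in turn.
        loo = np.mean([
            (X[i] @ ridge_fit(np.delete(X, i, 0), np.delete(y, i), lam) - y[i]) ** 2
            for i in range(n)
        ])
        print(f"resubstitution: {resub:.3f}  leave-one-out: {loo:.3f}")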

    Generalization Bounds for Uniformly Stable Algorithms

    Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss function with range bounded in $[0,1]$, the generalization error of a $\gamma$-uniformly stable learning algorithm on $n$ samples is known to be within $O((\gamma + 1/n)\sqrt{n \log(1/\delta)})$ of the empirical error with probability at least $1-\delta$. Unfortunately, this bound does not lead to meaningful generalization bounds in many common settings where $\gamma \geq 1/\sqrt{n}$. At the same time, the bound is known to be tight only when $\gamma = O(1/n)$. We substantially improve generalization bounds for uniformly stable algorithms without making any additional assumptions. First, we show that the bound in this setting is $O(\sqrt{(\gamma + 1/n)\log(1/\delta)})$ with probability at least $1-\delta$. In addition, we prove a tight bound of $O(\gamma^2 + 1/n)$ on the second moment of the estimation error. The best previous bound on the second moment is $O(\gamma + 1/n)$. Our proofs are based on new analysis techniques, and our results imply substantially stronger generalization guarantees for several well-studied algorithms. Comment: Appeared in Neural Information Processing Systems (NeurIPS), 2018.
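    To see numerically why the new bound is an improvement in the regime $\gamma \geq 1/\sqrt{n}$, one can compare the two rates directly (a minimal sketch with all constants set to 1, purely for illustration):

        import numpy as np

        def classical_bound(gamma, n, delta=0.05):
            # Bousquet-Elisseeff style rate: (gamma + 1/n) * sqrt(n * log(1/delta)).
            return (gamma + 1 / n) * np.sqrt(n * np.log(1 / delta))

        def improved_bound(gamma, n, delta=0.05):
            # The rate shown in this paper: sqrt((gamma + 1/n) * log(1/delta)).
            return np.sqrt((gamma + 1 / n) * np.log(1 / delta))

        for n in (10**3, 10**5):
            gamma = 1 / np.sqrt(n)  # regime where the classical bound is vacuous
            print(n, classical_bound(gamma, n), improved_bound(gamma, n))

    For $\gamma = 1/\sqrt{n}$ the classical rate stays of constant order while the improved rate decays like $n^{-1/4}$.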

    Optimism-Based Adaptive Regulation of Linear-Quadratic Systems

    The main challenge for adaptive regulation of linear-quadratic systems is the trade-off between identification and control. An adaptive policy needs to address both the estimation of unknown dynamics parameters (exploration) and the regulation of the underlying system (exploitation). To this end, optimism-based methods, which bias the identification in favor of optimistic approximations of the true parameter, are employed in the literature. A number of asymptotic results have been established, but their finite-time counterparts are few, with important restrictions. This study establishes results for the worst-case regret of optimism-based adaptive policies. The presented high-probability upper bounds are optimal up to logarithmic factors. The non-asymptotic analysis of this work requires very mild assumptions: (i) stabilizability of the system's dynamics, and (ii) limiting the degree of heaviness of the noise distribution. To establish such bounds, certain novel techniques are developed to comprehensively address the probabilistic behavior of dependent random matrices with heavy-tailed distributions. Comment: 28 pages.
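    For reference, the regret being bounded is the standard notion for adaptive LQ control (a standard definition, not quoted from the paper): with instantaneous cost $c_t = x_t^\top Q x_t + u_t^\top R u_t$ and $J^*(\theta)$ the optimal average cost of the true system $\theta$,

        \mathcal{R}(T) = \sum_{t=0}^{T-1} c_t - T \, J^*(\theta),
        \qquad
        J^*(\theta) = \min_{\pi} \limsup_{T \to \infty} \frac{1}{T} \, \mathbb{E}\left[ \sum_{t=0}^{T-1} c_t \right],

    so a sublinear worst-case bound on $\mathcal{R}(T)$ means the per-round excess cost of the adaptive policy vanishes.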

    A PTAS for $\ell_p$-Low Rank Approximation

    A number of recent works have studied algorithms for entrywise $\ell_p$-low rank approximation, namely, algorithms which, given an $n \times d$ matrix $A$ (with $n \geq d$), output a rank-$k$ matrix $B$ minimizing $\|A-B\|_p^p = \sum_{i,j} |A_{i,j}-B_{i,j}|^p$ when $p > 0$, and $\|A-B\|_0 = \sum_{i,j} [A_{i,j} \neq B_{i,j}]$ for $p = 0$. On the algorithmic side, for $p \in (0,2)$, we give the first $(1+\epsilon)$-approximation algorithm running in time $n^{\text{poly}(k/\epsilon)}$. Further, for $p = 0$, we give the first almost-linear time approximation scheme for what we call the Generalized Binary $\ell_0$-Rank-$k$ problem. Our algorithm computes a $(1+\epsilon)$-approximation in time $(1/\epsilon)^{2^{O(k)}/\epsilon^{2}} \cdot nd^{1+o(1)}$. On the hardness of approximation side, for $p \in (1,2)$, assuming the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show that there exists $\delta := \delta(\alpha) > 0$ such that the entrywise $\ell_p$-Rank-$k$ problem has no $\alpha$-approximation algorithm running in time $2^{k^{\delta}}$. Comment: Accepted at SODA'19, 61 pages.
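    The objective here is easy to state in code. A minimal numpy sketch (using a truncated SVD, which optimizes only the $p=2$ case, merely as a baseline on which to evaluate the entrywise losses):

        import numpy as np

        def entrywise_loss(A, B, p):
            # ||A - B||_p^p for p > 0; number of disagreeing entries for p = 0.
            if p == 0:
                return np.sum(A != B)
            return np.sum(np.abs(A - B) ** p)

        def rank_k_svd(A, k):
            # Best rank-k approximation in Frobenius norm (optimal for p = 2 only).
            U, s, Vt = np.linalg.svd(A, full_matrices=False)
            return (U[:, :k] * s[:k]) @ Vt[:k]

        rng = np.random.default_rng(1)
        A = rng.normal(size=(50, 20))
        B = rank_k_svd(A, k=3)
        for p in (1.0, 1.5, 2.0):
            print(p, entrywise_loss(A, B, p))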

    Make Up Your Mind: The Price of Online Queries in Differential Privacy

    We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set $Q$ of allowable queries, in one of three ways, which we list in order from easiest to hardest to answer:
        Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.
        Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.
        Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.
    Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries, like threshold queries, over the real line.
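    The three models differ only in what the mechanism sees and when. A minimal sketch of the three interaction patterns (illustrative interfaces only; mechanism and adversary are stand-ins for any differentially private algorithm and any query-choosing strategy, not objects from the paper):

        from typing import Callable, Iterable, List

        Query = Callable[[list], float]

        def offline(mechanism, queries: List[Query]) -> List[float]:
            # The whole batch is visible before any answer is produced.
            return mechanism.answer_batch(queries)

        def online(mechanism, queries: Iterable[Query]) -> List[float]:
            # Queries are fixed in advance but arrive one at a time;
            # each must be answered before the next is seen.
            return [mechanism.answer(q) for q in queries]

        def adaptive(mechanism, adversary) -> List[float]:
            # Each query may depend on all previous answers.
            answers: List[float] = []
            while (q := adversary.next_query(answers)) is not None:
                answers.append(mechanism.answer(q))
            return answers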

    Input Perturbations for Adaptive Control and Learning

    This paper studies adaptive algorithms for simultaneous regulation (i.e., control) and estimation (i.e., learning) of Multiple Input Multiple Output (MIMO) linear dynamical systems. It proposes practical, easy-to-implement control policies based on perturbations of input signals. Such policies are shown to achieve a worst-case regret that scales as the square root of the time horizon and holds uniformly over time. Further, it discusses specific settings where such greedy policies attain the information-theoretic lower bound of logarithmic regret. To establish the results, recent advances on self-normalized martingales are leveraged, together with a novel method of policy decomposition.
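    A minimal sketch of the input-perturbation idea for a scalar system $x_{t+1} = a x_t + b u_t + w_t$ (illustrative only; the paper treats general MIMO systems, and the perturbation decay rate and controller below are assumptions of this sketch, not the paper's algorithm):

        import numpy as np

        rng = np.random.default_rng(2)
        a_true, b_true, T = 0.9, 0.5, 5000
        a_hat, b_hat = 0.0, 1.0        # initial parameter guesses
        G, h = np.eye(2), np.zeros(2)  # least-squares sufficient statistics
        x = 0.0
        for t in range(1, T + 1):
            # Greedy certainty-equivalent input plus a decaying random perturbation;
            # the max() guards against tiny estimates of b (illustration only).
            u = -(a_hat / max(b_hat, 0.1)) * x + t ** (-0.25) * rng.normal()
            x_next = a_true * x + b_true * u + 0.1 * rng.normal()
            z = np.array([x, u])
            G += np.outer(z, z)                   # accumulate z z'
            h += z * x_next                       # accumulate z x_{t+1}
            a_hat, b_hat = np.linalg.solve(G, h)  # least-squares estimate
            x = x_next
        print(f"a_hat={a_hat:.3f} (true {a_true}), b_hat={b_hat:.3f} (true {b_true})")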

    Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

    We present the first computationally efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. This resolves an open question of Abbasi-Yadkori and Szepesv\'ari (2011) and of Dean, Mania, Matni, Recht, and Tu (2018).

    Cross-validation Confidence Intervals for Test Error

    This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for $k$-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller $k$-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature. Comment: 34th Conference on Neural Information Processing Systems (NeurIPS 2020); 40 pages, 15 figures.
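    As a rough illustration of the kind of interval these results justify, one can pair the $k$-fold error estimate with a normal approximation built from the per-observation errors (a simplified sketch using scikit-learn and scipy; the paper's variance estimator and conditions are more careful than this):

        import numpy as np
        from scipy.stats import norm
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import KFold

        def kfold_ci(X, y, model, k=10, alpha=0.05):
            # Naive CLT-style confidence interval for the k-fold test error.
            errs = np.empty(len(y))
            for train, test in KFold(k, shuffle=True, random_state=0).split(X):
                model.fit(X[train], y[train])
                errs[test] = model.predict(X[test]) != y[test]  # 0/1 loss per held-out point
            m, se = errs.mean(), errs.std(ddof=1) / np.sqrt(len(y))
            z = norm.ppf(1 - alpha / 2)
            return m, (m - z * se, m + z * se)

        rng = np.random.default_rng(3)
        X = rng.normal(size=(500, 5))
        y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
        print(kfold_ci(X, y, LogisticRegression()))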

    Toward Better Generalization Bounds with Locally Elastic Stability

    Algorithmic stability is a key characteristic for ensuring the generalization ability of a learning algorithm. Among different notions of stability, \emph{uniform stability} is arguably the most popular one, and it yields exponential generalization bounds. However, uniform stability only considers the worst-case loss change (the so-called sensitivity) under removal of a single data point, which is distribution-independent and therefore undesirable. In many cases the worst-case sensitivity of the loss is much larger than its average over the choice of the removed data point, especially in advanced models such as random feature models or neural networks. Many previous works try to mitigate this distribution-independence issue by proposing weaker notions of stability; however, these either yield only polynomial bounds, or yield bounds that do not vanish as the sample size goes to infinity. In light of this, we propose \emph{locally elastic stability} as a weaker, distribution-dependent stability notion that still yields exponential generalization bounds. We further demonstrate that locally elastic stability implies tighter generalization bounds than those derived from uniform stability in many situations, by revisiting the examples of bounded support vector machines, regularized least squares regression, and stochastic gradient descent. Comment: Published in ICML 2021.
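    The gap between worst-case and average sensitivity is easy to probe empirically. A minimal numpy sketch for ridge regression (the quantity measured here is an empirical proxy, not the paper's formal stability definition):

        import numpy as np

        def fit(X, y, lam=1.0):
            # Closed-form ridge solution.
            d = X.shape[1]
            return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

        def loss(w, x, y):
            return (x @ w - y) ** 2

        rng = np.random.default_rng(4)
        n, d = 100, 5
        X = rng.normal(size=(n, d))
        y = X @ rng.normal(size=d) + rng.normal(size=n)
        w_full = fit(X, y)
        x0, y0 = rng.normal(size=d), 0.0  # a fixed evaluation point

        # Loss change at (x0, y0) when each training point in turn is removed.
        deltas = np.array([
            abs(loss(fit(np.delete(X, i, 0), np.delete(y, i)), x0, y0) - loss(w_full, x0, y0))
            for i in range(n)
        ])
        print(f"worst-case: {deltas.max():.4f}  average: {deltas.mean():.4f}")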

    Asymptotic equivalence of regularization methods in thresholded parameter space

    High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular classes: convex ones and concave ones. There has been a long debate on whether one class dominates the other, an important question both in theory and to practitioners. In this paper, we characterize the asymptotic equivalence of regularization methods, with general penalty functions, in a thresholded parameter space under the generalized linear model setting, where the dimensionality can grow up to exponentially fast with the sample size. To assess their performance, we establish the oracle inequalities, as in Bickel, Ritov and Tsybakov (2009), of the global minimizer for these methods under various prediction and variable selection losses. These results reveal an interesting phase transition phenomenon. For polynomially growing dimensionality, the $L_1$-regularization method of the Lasso and concave methods are asymptotically equivalent, having the same convergence rates in the oracle inequalities. For exponentially growing dimensionality, concave methods are asymptotically equivalent but have faster convergence rates than the Lasso. We also establish a stronger property of the oracle risk inequalities of the regularization methods, as well as the sampling properties of computable solutions. Our new theoretical results are illustrated and justified by simulation and real data examples. Comment: 39 pages, 3 figures.
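    One way to see the convex/concave distinction concretely is through the penalties' thresholding operators: the $L_1$ penalty of the Lasso soft-thresholds, biasing surviving coefficients toward zero, while concave penalties behave in the limit like hard thresholding, leaving survivors unshrunk. A minimal sketch (illustrative only, not the paper's estimators):

        import numpy as np

        def soft_threshold(z, t):
            # Lasso (L1) proximal map: shrink toward zero, zero out small entries.
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def hard_threshold(z, t):
            # Limiting behavior of concave penalties: keep large entries unshrunk.
            return np.where(np.abs(z) > t, z, 0.0)

        beta = np.array([4.0, 2.0, 0.0, 0.0])                    # sparse truth
        z = beta + 0.3 * np.random.default_rng(5).normal(size=4)  # noisy observation
        print(soft_threshold(z, 1.0))  # survivors biased toward zero by t
        print(hard_threshold(z, 1.0))  # survivors left at observed values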