27 research outputs found

    PAC-Bayes Un-Expected Bernstein Inequality

    We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \operatorname{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.

    PAC-Bayes Unexpected Bernstein Inequality

    We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \operatorname{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.

    Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

    We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance

    Lipschitz and comparator-norm adaptivity in online learning

    We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reducti

    Lipschitz and comparator-norm adaptivity in online learning

    We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions t

    Mu Insertions Are Repaired by the Double-Strand Break Repair Pathway of Escherichia coli

    Mu is both a transposable element and a temperate bacteriophage. During lytic growth, it amplifies its genome by replicative transposition. During infection, it integrates into the Escherichia coli chromosome through a mechanism not requiring extensive DNA replication. In the latter pathway, the transposition intermediate is repaired by transposase-mediated resecting of the 5′ flaps attached to the ends of the incoming Mu genome, followed by filling the remaining 5 bp gaps at each end of the Mu insertion. It is widely assumed that the gaps are repaired by a gap-filling host polymerase. Using the E. coli Keio Collection to screen for mutants defective in recovery of stable Mu insertions, we show in this study that the gaps are repaired by the machinery responsible for the repair of double-strand breaks in E. coli: the replication restart proteins PriA-DnaT and homologous recombination proteins RecABC. We discuss alternate models for recombinational repair of the Mu gaps.

    PAC-Bayesian Bound for the Conditional Value at Risk

    Conditional Value at Risk (CVAR) is a family of “coherent risk measures” which generalize the traditional mathematical expectation. Widely used in mathematical finance, it is garnering increasing interest in machine learning, e.g., as an alternate approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the CVAR of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical CVAR is small. We achieve this by reducing the problem of estimating CVAR to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for CVAR even when the random variable in question is unbounded.
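
To make the quantity in the abstract above concrete, here is a minimal sketch of an empirical CVAR computed through the standard Rockafellar-Uryasev variational form, which writes the tail risk as a minimum over plain expectations. The abstract does not say which reduction the paper uses, so this is an illustration of the general idea rather than the authors' construction. The function name `empirical_cvar` and the convention that `alpha` is the fraction of worst losses are assumptions made for this example.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVAR at tail level alpha (hypothetical helper, not from the paper).

    Uses the variational form  min_mu { mu + mean(max(z - mu, 0)) / alpha }.
    The objective is convex and piecewise linear in mu with kinks at the
    sample values, so it is enough to try each sample as a candidate mu.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)

    def objective(mu):
        # Average excess over mu, rescaled by the tail fraction alpha.
        excess = sum(max(z - mu, 0.0) for z in losses)
        return mu + excess / (alpha * n)

    return min(objective(mu) for mu in losses)


# With alpha = 0.5 this is the mean of the worst half of the losses.
print(empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5))  # 3.5
```

Setting `alpha = 1.0` recovers the ordinary mean, which matches the abstract's remark that CVAR generalizes the mathematical expectation. The brute-force search over sample points costs quadratic time; sorting the losses would be faster but is left out to keep the sketch short.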