27 research outputs found
PAC-Bayes Un-Expected Bernstein Inequality
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to 0 faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
PAC-Bayes Unexpected Bernstein Inequality
We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to 0 faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein's but with $X^2$ taken outside its expectation.
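To make the role of the empirical error in the complexity term concrete, the following minimal sketch evaluates $\sqrt{L_n \cdot \mathrm{KL}/n}$ for a few values of $L_n$ and compares it with a $\mathrm{KL}/n$ term. The function name `complexity_terms`, the omission of all constants and confidence terms, and the use of $\mathrm{KL}/n$ as the comparison rate are assumptions made for illustration; the abstract only gives the shape of the square-root term.

```python
import math


def complexity_terms(emp_error: float, kl: float, n: int) -> tuple[float, float]:
    """Return (sqrt(L_n * KL / n), KL / n) with all constants omitted.

    emp_error plays the role of L_n, kl the KL divergence between posterior
    and prior, and n the sample size. This is an illustration of the rates
    only, not the bound from the paper.
    """
    if n <= 0 or kl < 0 or emp_error < 0:
        raise ValueError("need n > 0, kl >= 0 and emp_error >= 0")
    slow = math.sqrt(emp_error * kl / n)  # dominates unless emp_error is tiny
    fast = kl / n                         # assumed lower-order comparison term
    return slow, fast


# Hypothetical values: the square-root term shrinks only as L_n shrinks.
for emp_error in (0.2, 0.05, 0.0):
    slow, fast = complexity_terms(emp_error, kl=10.0, n=10_000)
    print(f"L_n={emp_error:<5} sqrt(L_n*KL/n)={slow:.4f}  KL/n={fast:.4f}")
```

With a fixed KL and sample size, the square-root term exceeds the $\mathrm{KL}/n$ term whenever $L_n > \mathrm{KL}/n$, which is why replacing $L_n$ by a quantity that vanishes more often can tighten the bound.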
Lipschitz Adaptivity with Multiple Learning Rates in Online Learning
We aim to design adaptive online learning algorithms that take advantage of any special structure
that might be present in the learning task at hand, with as little manual tuning by the user as possible.
A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate
a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc.
A recent technique promises to overcome this difficulty by maintaining multiple learning rates in
parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization
and the Squint algorithm for prediction with expert advice. However, in both cases the user still has
to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although
this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set
too small, the methods may fail completely; but if it is taken too large, performance deteriorates
significantly. In the present work we remove this Lipschitz hyperparameter by designing new
versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this
by dynamically updating the set of active learning rates. For MetaGrad, we further improve the
computational efficiency of handling constraints on the domain of prediction, and we remove the
need to specify the number of rounds in advance.
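The idea of maintaining multiple learning rates in parallel can be sketched generically as follows. This is not MetaGrad or Squint: it is a simplified illustration, under stated assumptions, in which several one-dimensional online gradient descent workers (one per learning rate on an exponential grid) run side by side and are combined by exponential weights. The names `multi_rate_ogd`, `meta_eta`, and the quadratic test loss are hypothetical.

```python
import numpy as np


def multi_rate_ogd(loss_fn, grad_fn, etas, rounds, x0=0.0, meta_eta=1.0):
    """Run one gradient-descent worker per learning rate and aggregate them.

    loss_fn(x, t) and grad_fn(x, t) give the round-t loss and its derivative
    at the scalar point x. Returns the last aggregated prediction and the
    weights that were placed on the workers in that round.
    """
    etas = np.asarray(etas, dtype=float)
    xs = np.full(len(etas), x0, dtype=float)  # one iterate per learning rate
    log_w = np.zeros(len(etas))               # log-weights over the workers
    x, w = x0, np.full(len(etas), 1.0 / len(etas))
    for t in range(rounds):
        # Normalise the weights in a numerically stable way.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        x = float(w @ xs)                     # aggregated prediction
        # Exponential-weights update: penalise each worker by its own loss.
        worker_losses = np.array([loss_fn(xi, t) for xi in xs])
        log_w -= meta_eta * worker_losses
        # Each worker takes a gradient step with its own learning rate.
        xs = xs - etas * np.array([grad_fn(xi, t) for xi in xs])
    return x, w


# Hypothetical example: a fixed quadratic loss with minimiser 3.
etas = 2.0 ** -np.arange(1, 8)  # exponential grid 1/2, 1/4, ..., 1/128
x_last, weights = multi_rate_ogd(
    loss_fn=lambda x, t: (x - 3.0) ** 2,
    grad_fn=lambda x, t: 2.0 * (x - 3.0),
    etas=etas,
    rounds=200,
)
```

The aggregation step means the user does not have to pick a single learning rate in advance; the weights shift toward whichever worker has suffered the least loss. Note that the grid itself still has to be chosen here, which is the kind of dependence on prior knowledge that the abstract's dynamic update of the active learning rates is meant to remove.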
Lipschitz and comparator-norm adaptivity in online learning
We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reducti
Lipschitz and comparator-norm adaptivity in online learning
We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions t
Mu Insertions Are Repaired by the Double-Strand Break Repair Pathway of Escherichia coli
Mu is both a transposable element and a temperate bacteriophage. During lytic growth, it amplifies its genome by replicative transposition. During infection, it integrates into the Escherichia coli chromosome through a mechanism not requiring extensive DNA replication. In the latter pathway, the transposition intermediate is repaired by transposase-mediated resecting of the 5′ flaps attached to the ends of the incoming Mu genome, followed by filling the remaining 5 bp gaps at each end of the Mu insertion. It is widely assumed that the gaps are repaired by a gap-filling host polymerase. Using the E. coli Keio Collection to screen for mutants defective in recovery of stable Mu insertions, we show in this study that the gaps are repaired by the machinery responsible for the repair of double-strand breaks in E. coli: the replication restart proteins PriA-DnaT and homologous recombination proteins RecABC. We discuss alternate models for recombinational repair of the Mu gaps.
Green synthesis of manganese oxide nanoparticles for the electrochemical sensing of p-nitrophenol
PAC-Bayesian Bound for the Conditional Value at Risk
Conditional Value at Risk (CVAR) is a family of “coherent risk measures” which
generalize the traditional mathematical expectation. Widely used in mathematical
finance, it is garnering increasing interest in machine learning, e.g., as an alternate
approach to regularization, and as a means for ensuring fairness. This paper
presents a generalization bound for learning algorithms that minimize the CVAR
of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be
small when the empirical CVAR is small. We achieve this by reducing the problem
of estimating CVAR to that of merely estimating an expectation. This then enables
us, as a by-product, to obtain concentration inequalities for CVAR even when the
random variable in question is unbounded.
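For readers unfamiliar with CVAR, a minimal sketch of the empirical quantity may help. It uses the standard Rockafellar-Uryasev variational form, in which CVAR at level alpha is the minimum over mu of mu + E[max(Z - mu, 0)] / alpha, so that the risk measure is expressed through an ordinary expectation. Whether the paper uses exactly this parametrisation is an assumption, and the function name `empirical_cvar` is hypothetical.

```python
import numpy as np


def empirical_cvar(losses, alpha):
    """Empirical CVAR at level alpha in (0, 1] of a sample of losses.

    Minimises mu + mean(max(losses - mu, 0)) / alpha over mu. The objective
    is convex and piecewise linear with kinks at the sample points, so it is
    enough to evaluate it at the distinct sample values.
    """
    z = np.asarray(losses, dtype=float)
    if z.size == 0:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    candidates = np.unique(z)
    values = [mu + np.maximum(z - mu, 0.0).mean() / alpha for mu in candidates]
    return float(min(values))


# Hypothetical sample: with alpha = 0.5 this averages the worst half of the
# losses, and with alpha = 1 it reduces to the plain mean.
sample = [1.0, 2.0, 3.0, 4.0]
cvar_half = empirical_cvar(sample, alpha=0.5)
cvar_full = empirical_cvar(sample, alpha=1.0)
```

Setting alpha to 1 recovers the ordinary expectation, which is the sense in which CVAR generalizes it; smaller alpha puts all the emphasis on the largest losses.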