7,282 research outputs found

    Online Multiple Kernel Learning for Structured Prediction

    Full text link
    Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that drawback. We show regret, convergence, and generalization bounds for the proposed method. Experiments on handwriting recognition and dependency parsing testify for the successfulness of the approach

    Differentially Private Online Learning

    Full text link
    In this paper, we consider the problem of preserving privacy in the online learning setting. We study the problem in the online convex programming (OCP) framework---a popular online learning setting with several interesting theoretical and practical implications---while using differential privacy as the formal privacy measure. For this problem, we distill two critical attributes that a private OCP algorithm should have in order to provide reasonable privacy as well as utility guarantees: 1) linearly decreasing sensitivity, i.e., as new data points arrive their effect on the learning model decreases, 2) sub-linear regret bound---regret bound is a popular goodness/utility measure of an online learning algorithm. Given an OCP algorithm that satisfies these two conditions, we provide a general framework to convert the given algorithm into a privacy preserving OCP algorithm with good (sub-linear) regret. We then illustrate our approach by converting two popular online learning algorithms into their differentially private variants while guaranteeing sub-linear regret (O(T)O(\sqrt{T})). Next, we consider the special case of online linear regression problems, a practically important class of online learning problems, for which we generalize an approach by Dwork et al. to provide a differentially private algorithm with just O(log1.5T)O(\log^{1.5} T) regret. Finally, we show that our online learning framework can be used to provide differentially private algorithms for offline learning as well. For the offline learning problem, our approach obtains better error bounds as well as can handle larger class of problems than the existing state-of-the-art methods Chaudhuri et al

    The Case for Full-Matrix Adaptive Regularization

    Full text link
    Adaptive regularization methods come in diagonal and full-matrix variants. However, only the former have enjoyed widespread adoption in training large-scale deep models. This is due to the computational overhead of manipulating a full matrix in high dimension. In this paper, we show how to make full-matrix adaptive regularization practical and useful. We present GGT, a truly scalable full-matrix adaptive optimizer. At the heart of our algorithm is an efficient method for computing the inverse square root of a low-rank matrix. We show that GGT converges to first-order local minima, providing the first rigorous theoretical analysis of adaptive regularization in non-convex optimization. In preliminary experiments, GGT trains faster across a variety of synthetic tasks and standard deep learning benchmarks

    A Simple Analysis for Exp-concave Empirical Minimization with Arbitrary Convex Regularizer

    Full text link
    In this paper, we present a simple analysis of {\bf fast rates} with {\it high probability} of {\bf empirical minimization} for {\it stochastic composite optimization} over a finite-dimensional bounded convex set with exponential concave loss functions and an arbitrary convex regularization. To the best of our knowledge, this result is the first of its kind. As a byproduct, we can directly obtain the fast rate with {\it high probability} for exponential concave empirical risk minimization with and without any convex regularization, which not only extends existing results of empirical risk minimization but also provides a unified framework for analyzing exponential concave empirical risk minimization with and without {\it any} convex regularization. Our proof is very simple only exploiting the covering number of a finite-dimensional bounded set and a concentration inequality of random vectors

    On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions

    Full text link
    In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to provide Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al (COLT 2012) for the same problem. Using our decoupling technique, we are further able to obtain fast convergence rates for strongly convex pairwise loss functions. We are also able to analyze a class of memory efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, in order to complement our generalization bounds, we propose a novel memory efficient online learning algorithm for higher order learning problems with bounded regret guarantees.Comment: To appear in proceedings of the 30th International Conference on Machine Learning (ICML 2013

    Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions

    Full text link
    Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set. They have recently received increasing attention in the field of optimization for developing optimization algorithms with fast convergence. However, the studies of EBC in statistical learning are hitherto still limited. The main contributions of this paper are two-fold. First, we develop fast and intermediate rates of empirical risk minimization (ERM) under EBC for risk minimization with Lipschitz continuous, and smooth convex random functions. Second, we establish fast and intermediate rates of an efficient stochastic approximation (SA) algorithm for risk minimization with Lipschitz continuous random functions, which requires only one pass of nn samples and adapts to EBC. For both approaches, the convergence rates span a full spectrum between O~(1/n)\widetilde O(1/\sqrt{n}) and O~(1/n)\widetilde O(1/n) depending on the power constant in EBC, and could be even faster than O(1/n)O(1/n) in special cases for ERM. Moreover, these convergence rates are automatically adaptive without using any knowledge of EBC. Overall, this work not only strengthens the understanding of ERM for statistical learning but also brings new fast stochastic algorithms for solving a broad range of statistical learning problems

    Regularization Techniques for Learning with Matrices

    Full text link
    There is growing body of learning problems for which it is natural to organize the parameters into matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose some more sophisticated prior knowledge). This work describes and analyzes a systematic method for constructing such matrix-based, regularization methods. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on the known duality fact: that a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and kernel learning

    Exploiting Problem Structure in Optimization under Uncertainty via Online Convex Optimization

    Full text link
    In this paper, we consider two paradigms that are developed to account for uncertainty in optimization models: robust optimization (RO) and joint estimation-optimization (JEO). We examine recent developments on efficient and scalable iterative first-order methods for these problems, and show that these iterative methods can be viewed through the lens of online convex optimization (OCO). The standard OCO framework has seen much success for its ability to handle decision-making in dynamic, uncertain, and even adversarial environments. Nevertheless, our applications of interest present further flexibility in OCO via three simple modifications to standard OCO assumptions: we introduce two new concepts of weighted regret and online saddle point problems and study the possibility of making lookahead (anticipatory) decisions. Our analyses demonstrate that these flexibilities introduced into the OCO framework have significant consequences whenever they are applicable. For example, in the strongly convex case, minimizing unweighted regret has a proven optimal bound of O(log(T)/T)O(\log(T)/T), whereas we show that a bound of O(1/T)O(1/T) is possible when we consider weighted regret. Similarly, for the smooth case, considering 11-lookahead decisions results in a O(1/T)O(1/T) bound, compared to O(1/T)O(1/\sqrt{T}) in the standard OCO setting. Consequently, these OCO tools are instrumental in exploiting structural properties of functions and resulting in improved convergence rates for RO and JEO. In certain cases, our results for RO and JEO match the best known or optimal rates in the corresponding problem classes without data uncertainty

    Provable Guarantees for Gradient-Based Meta-Learning

    Full text link
    We study the problem of meta-learning through the lens of online convex optimization, developing a meta-algorithm bridging the gap between popular gradient-based meta-learning and classical regularization-based multi-task transfer methods. Our method is the first to simultaneously satisfy good sample efficiency guarantees in the convex setting, with generalization bounds that improve with task-similarity, while also being computationally scalable to modern deep learning architectures and the many-task setting. Despite its simplicity, the algorithm matches, up to a constant factor, a lower bound on the performance of any such parameter-transfer method under natural task similarity assumptions. We use experiments in both convex and deep learning settings to verify and demonstrate the applicability of our theory.Comment: ICML 201

    On Distributed Online Classification in the Midst of Concept Drifts

    Full text link
    In this work, we analyze the generalization ability of distributed online learning algorithms under stationary and non-stationary environments. We derive bounds for the excess-risk attained by each node in a connected network of learners and study the performance advantage that diffusion strategies have over individual non-cooperative processing. We conduct extensive simulations to illustrate the results.Comment: 19 pages, 14 figures, to appear in Neurocomputing, 201
    corecore