14 research outputs found

    Training Deep Networks without Learning Rates Through Coin Betting

    Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process remains one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
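    The snippet below is a hedged NumPy sketch of the coin-betting idea described in the abstract: each coordinate is treated as a bettor whose stake, rather than a tuned learning rate, determines the step size. The function and parameter names (cocob_style_optimizer, grad_fn, dim, alpha, eps) are illustrative assumptions, and details may differ from the paper's exact algorithm.

```python
import numpy as np

def cocob_style_optimizer(grad_fn, dim, n_steps=2000, alpha=100.0, eps=1e-8):
    """Per-coordinate coin-betting update in the spirit of the abstract.

    Hedged sketch only: names and defaults are assumptions, and the update
    may differ in details from the paper's algorithm.
    """
    w0 = np.zeros(dim)                 # starting point of each bettor
    w = w0.copy()
    L = np.full(dim, eps)              # running max gradient magnitude
    G = np.zeros(dim)                  # running sum of absolute gradients
    reward = np.zeros(dim)             # accumulated (non-negative) winnings
    theta = np.zeros(dim)              # running sum of negative gradients
    for _ in range(n_steps):
        g = grad_fn(w)                               # stochastic gradient
        L = np.maximum(L, np.abs(g))
        G += np.abs(g)
        reward = np.maximum(reward + (w - w0) * (-g), 0.0)
        theta += -g
        # Bet a fraction of the current wealth (L + reward); no learning rate.
        w = w0 + theta / (L * np.maximum(G + L, alpha * L)) * (L + reward)
    return w

# Example: minimize f(w) = (w - 3)^2 without tuning a learning rate.
w_star = cocob_style_optimizer(lambda w: 2.0 * (w - 3.0), dim=1)
```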

    Online Learning for Changing Environments using Coin Betting

    A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm to one that is adaptive to changing environments, where the adaptivity is analyzed in a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound for the first time, to our knowledge. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., sleeping experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios. (Comment: submitted to a journal. arXiv admin note: substantial text overlap with arXiv:1610.0457)
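    Below is a hedged sketch of the coin-betting reduction for learning with expert advice that the abstract builds on: one betting algorithm per expert, with weights proportional to the positive part of each bettor's stake. It omits the sleeping-experts and strongly-adaptive machinery, and the names (coin_betting_experts, loss_matrix, eps) are illustrative.

```python
import numpy as np

def coin_betting_experts(loss_matrix, eps=1.0):
    """Parameter-free learning with expert advice via per-expert coin betting.

    `loss_matrix` is a (T, N) array of expert losses in [0, 1].
    """
    T, N = loss_matrix.shape
    wealth = np.full(N, eps)     # each bettor starts with initial wealth eps
    coin_sum = np.zeros(N)       # running sum of past coin outcomes per bettor
    total_loss = 0.0
    for t in range(1, T + 1):
        beta = coin_sum / t                       # KT-style betting fraction
        bet = beta * wealth                       # signed bet of each bettor
        pos = np.maximum(bet, 0.0)
        p = pos / pos.sum() if pos.sum() > 0 else np.full(N, 1.0 / N)
        losses = loss_matrix[t - 1]
        mix_loss = float(p @ losses)              # learner's loss this round
        total_loss += mix_loss
        coin = mix_loss - losses                  # instantaneous regret as "coin"
        coin = np.where(bet > 0, coin, np.maximum(coin, 0.0))
        wealth += coin * bet                      # each bettor wins/loses its bet
        coin_sum += coin
    return total_loss

# Example: 3 experts, 500 rounds of random losses in [0, 1].
rng = np.random.default_rng(0)
print(coin_betting_experts(rng.random((500, 3))))
```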

    Learning via Wasserstein-Based High Probability Generalisation Bounds

    Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) -- this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), which might exhibit erratic behavior and fail to capture the underlying geometric structure of the learning problem -- hence restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance. Even though these bounds alleviated the aforementioned issues to a certain extent, they either hold in expectation, are for bounded losses, or are nontrivial to minimize in an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds for both batch learning with independent and identically distributed (i.i.d.) data, and online learning with potentially non-i.i.d. data. Contrary to previous art, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimizable training objectives that can be used in SRM. As a result, we derive novel Wasserstein-based PAC-Bayesian learning algorithms and illustrate their empirical advantage on a variety of experiments. (Comment: Accepted to NeurIPS 202)
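    As a rough illustration of how a Wasserstein term can replace the KL divergence in an optimizable SRM objective, the sketch below adds a closed-form 2-Wasserstein penalty between diagonal Gaussian "posterior" and "prior" weight distributions to an empirical risk. The additive form, the weight lam, and all names are assumptions for the example, not the bounds derived in the paper.

```python
import numpy as np

def wasserstein2_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form 2-Wasserstein distance between diagonal Gaussians."""
    return np.sqrt(np.sum((mu_q - mu_p) ** 2) + np.sum((sigma_q - sigma_p) ** 2))

def srm_objective(emp_risk, mu_q, sigma_q, mu_p, sigma_p, lam=0.1):
    """Illustrative SRM-style objective: empirical risk plus a Wasserstein
    penalty between the learned posterior and a fixed prior over weights."""
    return emp_risk + lam * wasserstein2_gaussians(mu_q, sigma_q, mu_p, sigma_p)

# Example: 10-dimensional weight vector, empirical risk 0.25.
obj = srm_objective(0.25, np.zeros(10), np.ones(10), np.zeros(10), 2 * np.ones(10))
```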

    To Each Optimizer a Norm, To Each Norm its Generalization

    We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and ask which norm is minimized by a given interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.
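    The following snippet is a hedged illustration of the quadratic-norm viewpoint in the abstract: for a positive-definite matrix P, the minimum ||w||_P interpolating solution of X w = y can be obtained by a change of variables followed by the ordinary minimum l2-norm (pseudo-inverse) solution. The function name and construction are assumptions for the example, not the paper's code.

```python
import numpy as np

def min_p_norm_interpolator(X, y, P):
    """Minimum ||w||_P interpolator of X w = y, with ||w||_P^2 = w^T P w
    and P positive-definite."""
    L = np.linalg.cholesky(P)         # P = L @ L.T
    M = np.linalg.inv(L).T            # change of variables: w = M @ v
    v = np.linalg.pinv(X @ M) @ y     # min l2-norm interpolator in v-coordinates
    return M @ v                      # map back to w

# Example: over-parameterized system with P = diag(1, 4, 9).
X = np.array([[1.0, 2.0, 3.0]])
y = np.array([6.0])
P = np.diag([1.0, 4.0, 9.0])
w = min_p_norm_interpolator(X, y, P)  # satisfies X @ w == y
```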

    Bayesian statistics and modelling

    Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement. We discuss the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection. Examples of successful applications of Bayesian analysis across various research fields are provided, including in social sciences, ecology, genetics, medicine and more. We propose strategies for reproducibility and reporting standards, outlining an updated WAMBS (when to Worry and how to Avoid the Misuse of Bayesian Statistics) checklist. Finally, we outline the impact of Bayesian analysis on artificial intelligence, a major goal in the next decade.
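    A minimal worked example of the prior-to-posterior updating the Primer describes, using a conjugate Beta-Bernoulli model (the model choice and the numbers are illustrative assumptions, not from the article):

```python
import numpy as np
from scipy import stats

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed successes/failures
a_prior, b_prior = 2.0, 2.0                  # Beta(2, 2) prior on the success rate

a_post = a_prior + data.sum()                # conjugate posterior update
b_post = b_prior + len(data) - data.sum()
posterior = stats.beta(a_post, b_post)

print("posterior mean:", posterior.mean())
# Posterior predictive probability that the next observation is a success
print("P(next = 1 | data):", a_post / (a_post + b_post))
```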

    An investigation of equine injuries in Thoroughbred flat racing in North America

    The aim of this research was to investigate and quantify the risk of fatal and fracture injury for Thoroughbreds participating in flat racing in the US and Canada so that horses at particular risk can be identified and the risk of fatal injury reduced. Risk factors associated with fatalities and fractures were identified, predictive models for both outcomes were developed, and their performance was evaluated. Our analysis was based on 188,269 Thoroughbreds that raced on 89 racecourses reporting injuries to the Equine Injury Database (EID) in the US and Canada from 1st January 2009 to 31st December 2015. This included 2,493,957 race starts and 4,592,162 exercise starts. The race starts reported to the EID represented the starts for 90.0% of all official Thoroughbred racing events in the United States and Canada during the 7-year observation period. The annual average risk of fatal and fracture equine injuries for the period 2009 - 2015 was estimated and a description of the different injury types that resulted in fatalities and fractures was given, based on the cases recorded in the EID. Possible risk factors were pre-screened using univariable logistic regression models; risk factors with an association indicated by p < 0.20 were then included in a stepwise logistic regression selection process. A forward bidirectional elimination approach using Akaike's Information Criterion was utilised for the stepwise selection. We identified more than 20 risk factors that were found to be significantly associated with fatal injury (p < 0.05) and more than 20 risk factors associated with fracture injury, across the final multivariable models. The risk factors identified relate to the horse's previous racing history, the trainer, the race and the horse's expected performance. Five different algorithms were used to develop predictive models based on the data available from the period 2009 - 2014 for both fatal and fracture injuries. Firstly, we used Multivariable Logistic Regression, commonly used in risk factor analysis. Secondly, we developed Improved Balanced Random Forests, a machine learning algorithm based on a modification of the random forests algorithm. Because fatal injuries are extremely rare events, fewer than 2 instances per 1,000 starts on average, balanced samples were used to develop the Random Forest model to deal with the class-imbalance problem. Furthermore, we trained an Artificial Neural Network with a single layer and two networks with deep architectures: a Deep Belief Network and a Stacked Denoising Autoencoder. As artificial neural networks and deep learning models have been successfully used to solve complex problems in a diverse range of domains, we wanted to explore the possibility of using them to predict equine injuries. The performance of each classifier was evaluated by calculating the Area Under the Receiver Operating Characteristic Curve (AUC), using the data available from 2015 for validation. AUC results for the best-performing algorithm ranged from 0.62 to 0.64, and similar predictive results were obtained across the wide array of different models created. This is the first study to make use of the extensive information contained in the EID to identify risk factors associated with equine fatal and fracture injuries in the US and Canada for this period.
To our knowledge, this is the largest retrospective observational study in the literature investigating the risk of equine fatal and fracture injuries during flat racing. This is also the first study to train logistic regression and machine learning models to predict equine injuries using such an extensive amount of data and a full year of horse racing events for prediction and evaluation. We believe the results could help identify horses at high risk of (fatal) injury on entering a race and inform the design and implementation of preventive measures aimed at minimising the number of Thoroughbreds sustaining fatal injuries during racing in North America.
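    The sketch below is a hedged illustration of the class-imbalance strategy and AUC evaluation described above: a random forest is fit on a balanced subsample (all injury cases plus an equal-sized random draw of non-injury starts) and scored on a held-out year. It is a simplified stand-in, not the study's Improved Balanced Random Forests (which balance samples per tree); data shapes and names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def balanced_forest_auc(X_train, y_train, X_test, y_test, n_trees=100, seed=0):
    """Fit a random forest on one balanced subsample and report AUC."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y_train == 1)        # e.g. starts ending in injury
    majority = np.flatnonzero(y_train == 0)
    drawn = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, drawn])        # balanced training indices
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_train[idx], y_train[idx])
    scores = model.predict_proba(X_test)[:, 1]     # predicted injury risk
    return roc_auc_score(y_test, scores)           # AUC on the held-out year
```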