14 research outputs found

    Training Deep Networks without Learning Rates Through Coin Betting

    Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process remains one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
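    The snippet below is a hedged NumPy sketch of the coin-betting idea described in the abstract: each coordinate is treated as a bettor whose stake, rather than a tuned learning rate, determines the step size. The function and parameter names (cocob_style_optimizer, grad_fn, dim, alpha, eps) are illustrative assumptions, and details may differ from the paper's exact algorithm.

```python
import numpy as np

def cocob_style_optimizer(grad_fn, dim, n_steps=2000, alpha=100.0, eps=1e-8):
    """Per-coordinate coin-betting update in the spirit of the abstract.

    Hedged sketch only: names and defaults are assumptions, and the update
    may differ in details from the paper's algorithm.
    """
    w0 = np.zeros(dim)                 # starting point of each bettor
    w = w0.copy()
    L = np.full(dim, eps)              # running max gradient magnitude
    G = np.zeros(dim)                  # running sum of absolute gradients
    reward = np.zeros(dim)             # accumulated (non-negative) winnings
    theta = np.zeros(dim)              # running sum of negative gradients
    for _ in range(n_steps):
        g = grad_fn(w)                               # stochastic gradient
        L = np.maximum(L, np.abs(g))
        G += np.abs(g)
        reward = np.maximum(reward + (w - w0) * (-g), 0.0)
        theta += -g
        # Bet a fraction of the current wealth (L + reward); no learning rate.
        w = w0 + theta / (L * np.maximum(G + L, alpha * L)) * (L + reward)
    return w

# Example: minimize f(w) = (w - 3)^2 without tuning a learning rate.
w_star = cocob_style_optimizer(lambda w: 2.0 * (w - 3.0), dim=1)
```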

    Online Learning for Changing Environments using Coin Betting

    A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm to one that is adaptive to changing environments, where the adaptivity is analyzed in a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound for the first time, to our knowledge. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., sleeping experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios. (Comment: submitted to a journal. arXiv admin note: substantial text overlap with arXiv:1610.0457)
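    Below is a hedged sketch of the coin-betting reduction for learning with expert advice that the abstract builds on: one betting algorithm per expert, with weights proportional to the positive part of each bettor's stake. It omits the sleeping-experts and strongly-adaptive machinery, and the names (coin_betting_experts, loss_matrix, eps) are illustrative.

```python
import numpy as np

def coin_betting_experts(loss_matrix, eps=1.0):
    """Parameter-free learning with expert advice via per-expert coin betting.

    `loss_matrix` is a (T, N) array of expert losses in [0, 1].
    """
    T, N = loss_matrix.shape
    wealth = np.full(N, eps)     # each bettor starts with initial wealth eps
    coin_sum = np.zeros(N)       # running sum of past coin outcomes per bettor
    total_loss = 0.0
    for t in range(1, T + 1):
        beta = coin_sum / t                       # KT-style betting fraction
        bet = beta * wealth                       # signed bet of each bettor
        pos = np.maximum(bet, 0.0)
        p = pos / pos.sum() if pos.sum() > 0 else np.full(N, 1.0 / N)
        losses = loss_matrix[t - 1]
        mix_loss = float(p @ losses)              # learner's loss this round
        total_loss += mix_loss
        coin = mix_loss - losses                  # instantaneous regret as "coin"
        coin = np.where(bet > 0, coin, np.maximum(coin, 0.0))
        wealth += coin * bet                      # each bettor wins/loses its bet
        coin_sum += coin
    return total_loss

# Example: 3 experts, 500 rounds of random losses in [0, 1].
rng = np.random.default_rng(0)
print(coin_betting_experts(rng.random((500, 3))))
```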

    Learning via Wasserstein-Based High Probability Generalisation Bounds

    Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) -- this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), which might exhibit erratic behavior and fail to capture the underlying geometric structure of the learning problem -- hence restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in the PAC-Bayesian bounds with the Wasserstein distance. Even though these bounds alleviated the aforementioned issues to a certain extent, they either hold in expectation, are for bounded losses, or are nontrivial to minimize in an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds for both batch learning with independent and identically distributed (i.i.d.) data, and online learning with potentially non-i.i.d. data. Contrary to previous art, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimizable training objectives that can be used in SRM. As a result, we derive novel Wasserstein-based PAC-Bayesian learning algorithms and illustrate their empirical advantage on a variety of experiments. (Comment: Accepted to NeurIPS 202)
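    As a rough illustration of how a Wasserstein term can replace the KL divergence in an optimizable SRM objective, the sketch below adds a closed-form 2-Wasserstein penalty between diagonal Gaussian "posterior" and "prior" weight distributions to an empirical risk. The additive form, the weight lam, and all names are assumptions for the example, not the bounds derived in the paper.

```python
import numpy as np

def wasserstein2_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form 2-Wasserstein distance between diagonal Gaussians."""
    return np.sqrt(np.sum((mu_q - mu_p) ** 2) + np.sum((sigma_q - sigma_p) ** 2))

def srm_objective(emp_risk, mu_q, sigma_q, mu_p, sigma_p, lam=0.1):
    """Illustrative SRM-style objective: empirical risk plus a Wasserstein
    penalty between the learned posterior and a fixed prior over weights."""
    return emp_risk + lam * wasserstein2_gaussians(mu_q, sigma_q, mu_p, sigma_p)

# Example: 10-dimensional weight vector, empirical risk 0.25.
obj = srm_objective(0.25, np.zeros(10), np.ones(10), np.zeros(10), 2 * np.ones(10))
```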

    To Each Optimizer a Norm, To Each Norm its Generalization

    We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and ask which norm is minimized by a given interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.
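    The following snippet is a hedged illustration of the quadratic-norm viewpoint in the abstract: for a positive-definite matrix P, the minimum ||w||_P interpolating solution of X w = y can be obtained by a change of variables followed by the ordinary minimum l2-norm (pseudo-inverse) solution. The function name and construction are assumptions for the example, not the paper's code.

```python
import numpy as np

def min_p_norm_interpolator(X, y, P):
    """Minimum ||w||_P interpolator of X w = y, with ||w||_P^2 = w^T P w
    and P positive-definite."""
    L = np.linalg.cholesky(P)         # P = L @ L.T
    M = np.linalg.inv(L).T            # change of variables: w = M @ v
    v = np.linalg.pinv(X @ M) @ y     # min l2-norm interpolator in v-coordinates
    return M @ v                      # map back to w

# Example: over-parameterized system with P = diag(1, 4, 9).
X = np.array([[1.0, 2.0, 3.0]])
y = np.array([6.0])
P = np.diag([1.0, 4.0, 9.0])
w = min_p_norm_interpolator(X, y, P)  # satisfies X @ w == y
```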

    Bayesian statistics and modelling

    Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement. We discuss the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection. Examples of successful applications of Bayesian analysis across various research fields are provided, including in social sciences, ecology, genetics, medicine and more. We propose strategies for reproducibility and reporting standards, outlining an updated WAMBS (when to Worry and how to Avoid the Misuse of Bayesian Statistics) checklist. Finally, we outline the impact of Bayesian analysis on artificial intelligence, a major goal in the next decade.
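    A minimal worked example of the prior-to-posterior updating the Primer describes, using a conjugate Beta-Bernoulli model (the model choice and the numbers are illustrative assumptions, not from the article):

```python
import numpy as np
from scipy import stats

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed successes/failures
a_prior, b_prior = 2.0, 2.0                  # Beta(2, 2) prior on the success rate

a_post = a_prior + data.sum()                # conjugate posterior update
b_post = b_prior + len(data) - data.sum()
posterior = stats.beta(a_post, b_post)

print("posterior mean:", posterior.mean())
# Posterior predictive probability that the next observation is a success
print("P(next = 1 | data):", a_post / (a_post + b_post))
```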

    An investigation of equine injuries in Thoroughbred flat racing in North America

    The aim of this research was to investigate and quantify the risk of fatal and fracture injury for Thoroughbreds participating in flat racing in the US and Canada so that horses at particular risk can be identified and the risk of fatal injury reduced. Risk factors associated with fatalities and fractures were identified, predictive models for both outcomes were developed, and their performance was evaluated. Our analysis was based on 188,269 Thoroughbreds that raced on 89 racecourses reporting injuries to the Equine Injury Database (EID) in the US and Canada from 1st January 2009 to 31st December 2015. This included 2,493,957 race starts and 4,592,162 exercise starts. The race starts reported to the EID represented the starts for 90.0% of all official Thoroughbred racing events in the United States and Canada during the 7-year observation period. The annual average risk of fatal and fracture equine injuries for the period 2009 - 2015 was estimated and a description of the different injury types that resulted in fatalities and fractures was given, based on the cases recorded in the EID. Possible risk factors were pre-screened using univariable logistic regression models; risk factors with an association indicated by p < 0.20 were then included in a stepwise logistic regression selection process. A forward bidirectional elimination approach using Akaike's Information Criterion was utilised for the stepwise selection. We identified more than 20 risk factors that were found to be significantly associated with fatal injury (p < 0.05) and more than 20 risk factors associated with fracture injury, across the final multivariable models. The risk factors identified relate to the horse's previous racing history, the trainer, the race and the horse's expected performance. Five different algorithms were used to develop predictive models based on the data available from the period 2009 - 2014 for both fatal and fracture injuries. Firstly, we used Multivariable Logistic Regression, commonly used in risk factor analysis. Secondly, we developed Improved Balanced Random Forests, a machine learning algorithm based on a modification of the random forests algorithm. Because fatal injuries are extremely rare events, fewer than 2 instances per 1,000 starts on average, balanced samples were used to develop the Random Forest model to deal with the class-imbalance problem. Furthermore, we trained an Artificial Neural Network with a single layer and two networks with deep architectures: a Deep Belief Network and a Stacked Denoising Autoencoder. As artificial neural networks and deep learning models have been successfully used to solve complex problems in a diverse range of domains, we wanted to explore the possibility of using them to predict equine injuries. The performance of each classifier was evaluated by calculating the Area Under the Receiver Operating Characteristic Curve (AUC), using the data available from 2015 for validation. AUC results for the best-performing algorithm ranged from 0.62 to 0.64, and similar predictive results were obtained across the wide array of different models created. This is the first study to make use of the extensive information contained in the EID to identify risk factors associated with equine fatal and fracture injuries in the US and Canada for this period.
To our knowledge, this is the largest retrospective observational study in the literature investigating the risk of equine fatal and fracture injuries during flat racing. This is also the first study to train logistic regression and machine learning models to predict equine injuries using such an extensive amount of data and a full year of horse racing events for prediction and evaluation. We believe the results could help identify horses at high risk of (fatal) injury on entering a race and inform the design and implementation of preventive measures aimed at minimising the number of Thoroughbreds sustaining fatal injuries during racing in North America.
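    The sketch below is a hedged illustration of the class-imbalance strategy and AUC evaluation described above: a random forest is fit on a balanced subsample (all injury cases plus an equal-sized random draw of non-injury starts) and scored on a held-out year. It is a simplified stand-in, not the study's Improved Balanced Random Forests (which balance samples per tree); data shapes and names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def balanced_forest_auc(X_train, y_train, X_test, y_test, n_trees=100, seed=0):
    """Fit a random forest on one balanced subsample and report AUC."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y_train == 1)        # e.g. starts ending in injury
    majority = np.flatnonzero(y_train == 0)
    drawn = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, drawn])        # balanced training indices
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_train[idx], y_train[idx])
    scores = model.predict_proba(X_test)[:, 1]     # predicted injury risk
    return roc_auc_score(y_test, scores)           # AUC on the held-out year
```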