Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
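The coin-betting reduction described above can be illustrated with a minimal sketch. The code below runs a Krichevsky-Trofimov-style bettor on a 1-D convex problem: each iterate is a bet of a fraction of the current wealth, and the (sign-flipped, clipped) gradient plays the role of the coin outcome, so no learning rate ever appears. This is a simplification for illustration, not the paper's exact algorithm for deep networks; the gradient clipping, the `eps` endowment, and the use of the averaged iterate are assumptions of this sketch.

```python
import numpy as np

def coin_betting_minimize(grad, steps=10_000, eps=1.0, clip=1.0):
    """Parameter-free 1-D minimization via a KT coin-betting sketch."""
    wealth = eps        # initial endowment
    s = 0.0             # running sum of past coin outcomes
    x_sum = 0.0
    for t in range(1, steps + 1):
        beta = s / t                                # KT betting fraction
        x = beta * wealth                           # current bet = current iterate
        c = float(np.clip(-grad(x), -clip, clip))   # coin outcome from the gradient
        wealth += c * x                             # settle the bet
        s += c
        x_sum += x
    return x_sum / steps                            # averaged iterate

# minimize f(x) = (x - 3)^2 with no step size to tune
x_bar = coin_betting_minimize(lambda x: 2.0 * (x - 3.0))
```

The averaged iterate approaches the minimizer at 3; the individual bets oscillate around it, which is the usual picture for betting-based online-to-batch reductions.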
Second-Order Stochastic Optimization for Machine Learning in Linear Time
First-order stochastic methods are the state of the art in large-scale machine learning optimization owing to their efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods, and in certain settings improve upon the overall running time of popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
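One way to see how second-order information can be exploited at first-order cost: the Newton direction H⁻¹g can be approximated by a truncated Neumann series evaluated purely through Hessian-vector products, so the Hessian is never formed. The sketch below shows only this core idea on a dense quadratic; the spectral rescaling and the test problem are assumptions of the sketch, and the paper's actual estimator additionally samples the Hessian stochastically.

```python
import numpy as np

def neumann_newton_direction(hvp, g, depth=100):
    """Approximate H^{-1} g via the Neumann series sum_k (I - H)^k g,
    valid when the eigenvalues of H lie in (0, 2) (rescale beforehand).
    Only Hessian-vector products are used, so each term costs about
    as much as one gradient-like pass."""
    v = g.copy()
    for _ in range(depth):
        v = g + v - hvp(v)   # v_{k+1} = g + (I - H) v_k
    return v

# quadratic test: f(x) = 0.5 x^T A x - b^T x, Hessian A
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T
A = A / (2.0 * np.linalg.norm(A, 2)) + 0.5 * np.eye(5)  # eigenvalues in [0.5, 1]
b = rng.normal(size=5)
hvp = lambda v: A @ v        # stand-in for a matrix-free Hessian-vector product
d = neumann_newton_direction(hvp, b)   # d approximates A^{-1} b
```

In a real machine learning setting the `hvp` closure would be implemented by automatic differentiation over a minibatch, which is where the "linear in the sparsity of the input" per-iteration cost comes from.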
Kernel Analog Forecasting: Multiscale Test Problems
Data-driven prediction is becoming increasingly widespread as the volume of available data grows and as algorithmic development matches this growth. The nature of the predictions made, and the manner in which they should be interpreted, depend crucially on the extent to which the variables chosen for prediction are Markovian, or approximately Markovian. Multiscale systems provide a framework in which this issue can be analyzed. In this work, kernel analog forecasting methods are studied from the perspective of data generated by multiscale dynamical systems. The problems chosen exhibit a variety of different Markovian closures, using both averaging and homogenization; furthermore, settings where scale separation is not present and the predicted variables are non-Markovian are also considered. The studies provide guidance for the interpretation of data-driven prediction methods when used in practice.
Comment: 30 pages, 14 figures; clarified several ambiguous parts, added references, and a comparison with Lorenz' original method (Sec. 4.5).
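The basic mechanism of kernel analog forecasting can be sketched in a few lines: historical states close to the current state ("analogs") are weighted by a Gaussian kernel, and their observed futures are averaged. The rotation-on-the-circle test system, the lead time, and the bandwidth below are illustrative assumptions, not the multiscale test problems of the paper.

```python
import numpy as np

def kernel_analog_forecast(X_hist, Y_future, x_now, bandwidth=0.2):
    """Kernel-weighted average over historical analogs: states similar
    to x_now get larger weight, and their observed futures are averaged."""
    d2 = np.sum((X_hist - x_now) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    return w @ Y_future

# toy Markovian system: uniform rotation on the circle, lead-time-tau forecast
theta = np.linspace(0.0, 20.0 * np.pi, 4000, endpoint=False)
states = np.stack([np.cos(theta), np.sin(theta)], axis=1)
lead = 25
X_hist, Y_future = states[:-lead], states[lead:, 0]   # forecast the cos component
x_now = np.array([np.cos(0.3), np.sin(0.3)])
pred = kernel_analog_forecast(X_hist, Y_future, x_now)
truth = np.cos(0.3 + lead * (theta[1] - theta[0]))
```

Because this toy state is fully Markovian, the kernel forecast is close to the truth up to a small smoothing bias; the interesting regimes studied in the work are precisely those where the chosen variables are only approximately Markovian and this picture degrades.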
Convenient Multiple Directions of Stratification
This paper investigates the use of multiple directions of stratification as a variance reduction technique for Monte Carlo simulations of path-dependent options driven by Gaussian vectors. The precision of the method depends on the choice of the directions of stratification and the allocation rule within each stratum. Several choices have been proposed but, even if they provide variance reduction, their implementation is computationally intensive and not applicable to realistic payoffs, in particular not to Asian options with a barrier. Moreover, all these previously published methods employ orthogonal directions for multiple stratification. In this work we investigate the use of algorithms producing convenient, generally non-orthogonal, directions that combine a lower computational cost with a comparable variance reduction. In addition, we study the accuracy of optimal allocation in terms of variance reduction compared to Latin Hypercube Sampling. We consider the directions obtained by the Linear Transformation and the Principal Component Analysis. We introduce a new procedure based on the Linear Approximation of the explained variance of the payoff using the law of total variance. In addition, we exhibit a novel algorithm that correctly generates normal vectors stratified along non-orthogonal directions. Finally, we illustrate the efficiency of these algorithms in computing the price of different path-dependent options with and without barriers in the Black-Scholes and in the Cox-Ingersoll-Ross markets.
Comment: 21 pages, 11 tables.
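The basic ingredient, generating Gaussian vectors stratified along a chosen direction, can be sketched for a single unit direction u: the projection onto u is drawn by inverse-CDF sampling within equiprobable strata, and the orthogonal complement is left i.i.d. normal. The sketch then prices a discrete arithmetic Asian call in Black-Scholes. The direction u = (1, ..., 1)/sqrt(d), the strata counts, and the market parameters are illustrative assumptions, and this covers only one orthogonal direction, not the non-orthogonal multi-direction algorithm of the paper.

```python
import numpy as np
from statistics import NormalDist

def stratified_gaussians(u, n_strata, per_stratum, rng):
    """Sample N(0, I_d) vectors stratified along the unit direction u:
    Z = xi * u + (I - u u^T) Y, with xi a stratified standard normal."""
    d = len(u)
    inv = NormalDist().inv_cdf
    samples = []
    for k in range(n_strata):
        # uniforms confined to the k-th equiprobable stratum (clipped away from 0/1)
        p = np.clip((k + rng.random(per_stratum)) / n_strata, 1e-12, 1 - 1e-12)
        xi = np.array([inv(pi) for pi in p])
        Y = rng.normal(size=(per_stratum, d))
        Y_perp = Y - np.outer(Y @ u, u)          # project out the u component
        samples.append(xi[:, None] * u + Y_perp)
    return np.vstack(samples)

rng = np.random.default_rng(1)
d, S0, K, r, sigma, T = 16, 100.0, 100.0, 0.05, 0.2, 1.0
dt = T / d
u = np.ones(d) / np.sqrt(d)   # direction loading equally on all increments

def asian_payoff(Z):
    """Discounted arithmetic Asian call payoff from Gaussian increments Z."""
    logS = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt
                                  + sigma * np.sqrt(dt) * Z, axis=1)
    A = np.exp(logS).mean(axis=1)
    return np.exp(-r * T) * np.maximum(A - K, 0.0)

Z = stratified_gaussians(u, n_strata=50, per_stratum=20, rng=rng)
price = asian_payoff(Z).mean()
```

The equal-weight direction is a natural first choice here because the arithmetic average loads roughly equally on all Brownian increments, so stratifying along it removes a large share of the payoff variance at essentially no extra cost per sample.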