
    Training Deep Networks without Learning Rates Through Coin Betting

    Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
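The reduction of optimization to coin betting mentioned in this abstract can be illustrated in one dimension. The sketch below is not the algorithm proposed in the paper; it uses the Krichevsky-Trofimov bettor from the wider coin-betting literature as a stand-in, and assumes subgradients bounded in [-1, 1]. The function name `kt_coin_betting`, the initial wealth of 1, and the test objective |x - 3| are all illustrative choices.

```python
def kt_coin_betting(subgradient, steps, initial_wealth=1.0):
    """Minimize a 1-D convex function without a learning rate.

    Illustrative sketch (assumption: every subgradient lies in [-1, 1]).
    The iterate is a bet: a signed fraction of the current wealth, where
    the fraction is the running average of past "coin outcomes"
    (negative subgradients), as in the Krichevsky-Trofimov estimator.
    """
    outcome_sum = 0.0        # sum of past coin outcomes c_i = -g_i
    wealth = initial_wealth  # money available to the bettor
    x_avg = 0.0              # running average of the iterates
    for t in range(1, steps + 1):
        x = (outcome_sum / t) * wealth  # bet = betting fraction * wealth
        c = -subgradient(x)             # coin outcome observed at the bet
        wealth += c * x                 # win or lose the amount that was bet
        outcome_sum += c
        x_avg += (x - x_avg) / t        # the averaged iterate is the output
    return x_avg


def abs_subgradient(x, target=3.0):
    """Subgradient of |x - target|, which always lies in [-1, 1]."""
    return float((x > target) - (x < target))


x_hat = kt_coin_betting(abs_subgradient, steps=10_000)
print(x_hat)  # should land near the minimizer 3.0
```

No step size appears anywhere: the magnitude of each move is set by the accumulated wealth, which grows when past bets agreed with the direction of descent.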

    Second-Order Stochastic Optimization for Machine Learning in Linear Time

    First-order stochastic methods are the state of the art in large-scale machine learning optimization owing to their efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods and, in certain settings, improve upon the overall running time of popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.

    Kernel Analog Forecasting: Multiscale Test Problems

    Data-driven prediction is becoming increasingly widespread as the volume of data available grows and as algorithmic development matches this growth. The nature of the predictions made, and the manner in which they should be interpreted, depend crucially on the extent to which the variables chosen for prediction are Markovian, or approximately Markovian. Multiscale systems provide a framework in which this issue can be analyzed. In this work, kernel analog forecasting methods are studied from the perspective of data generated by multiscale dynamical systems. The problems chosen exhibit a variety of different Markovian closures, using both averaging and homogenization; furthermore, settings where scale separation is not present and the predicted variables are non-Markovian are also considered. The studies provide guidance for the interpretation of data-driven prediction methods when used in practice. Comment: 30 pages, 14 figures; clarified several ambiguous parts, added references, and a comparison with Lorenz' original method (Sec. 4.5)

    Convenient Multiple Directions of Stratification

    This paper investigates the use of multiple directions of stratification as a variance reduction technique for Monte Carlo simulations of path-dependent options driven by Gaussian vectors. The precision of the method depends on the choice of the directions of stratification and the allocation rule within each stratum. Several choices have been proposed but, even if they provide variance reduction, their implementation is computationally intensive and not applicable to realistic payoffs, in particular not to Asian options with barrier. Moreover, all these previously published methods employ orthogonal directions for multiple stratification. In this work we investigate the use of algorithms producing convenient directions, generally non-orthogonal, combining a lower computational cost with a comparable variance reduction. In addition, we study the accuracy of optimal allocation in terms of variance reduction compared to Latin Hypercube Sampling. We consider the directions obtained by the Linear Transformation and the Principal Component Analysis. We introduce a new procedure based on the Linear Approximation of the explained variance of the payoff using the law of total variance. In addition, we exhibit a novel algorithm that makes it possible to correctly generate normal vectors stratified along non-orthogonal directions. Finally, we illustrate the efficiency of these algorithms in the computation of the price of different path-dependent options with and without barriers in the Black-Scholes and in the Cox-Ingersoll-Ross markets. Comment: 21 pages, 11 tables
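As background for the stratification idea in this abstract, the sketch below shows the standard textbook construction for a single direction only: a standard normal vector is generated so that its projection onto a unit vector falls in a chosen equiprobable stratum. It does not reproduce the paper's algorithm for multiple non-orthogonal directions. The function name `stratified_normal_along`, the proportional allocation (same number of draws per stratum), and the clamping constant are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def stratified_normal_along(u, n_strata, per_stratum, rng):
    """Draw standard normal vectors stratified along one direction u.

    Illustrative sketch: the projection of each vector onto the
    normalized u is forced into one of n_strata equiprobable strata;
    the components orthogonal to u are left as plain Gaussian draws.
    Returns a list of (stratum_index, vector) pairs.
    """
    length = math.sqrt(sum(c * c for c in u))
    u = [c / length for c in u]          # work with a unit direction
    std = NormalDist()                   # standard normal N(0, 1)
    samples = []
    for k in range(n_strata):
        for _ in range(per_stratum):
            # Uniform draw inside stratum k of (0, 1), kept off 0 and 1
            # because inv_cdf is undefined at the endpoints.
            v = (k + rng.random()) / n_strata
            v = min(max(v, 1e-12), 1.0 - 1e-12)
            x = std.inv_cdf(v)           # stratified projection value
            y = [rng.gauss(0.0, 1.0) for _ in u]
            proj = sum(a * b for a, b in zip(u, y))
            # Replace the component of y along u with x.
            z = [x * ui + yi - proj * ui for ui, yi in zip(u, y)]
            samples.append((k, z))
    return samples


rng = random.Random(0)
draws = stratified_normal_along([3.0, 4.0], n_strata=4, per_stratum=2, rng=rng)
for stratum, z in draws:
    print(stratum, z)
```

With several directions that are not orthogonal, the projections are correlated, so this one-direction recipe cannot simply be repeated per direction; that is the gap the abstract says its novel algorithm addresses.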