Training Deep Networks without Learning Rates Through Coin Betting
Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
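The coin-betting reduction described above can be illustrated with a minimal sketch. The code below runs a Krichevsky-Trofimov-style bettor on a 1-D convex problem: each iterate is a bet of a fraction of the current wealth, and the (sign-flipped, clipped) gradient plays the role of the coin outcome, so no learning rate ever appears. This is a simplification for illustration, not the paper's exact algorithm for deep networks; the gradient clipping, the `eps` endowment, and the use of the averaged iterate are assumptions of this sketch.

```python
import numpy as np

def coin_betting_minimize(grad, steps=10_000, eps=1.0, clip=1.0):
    """Parameter-free 1-D minimization via a KT coin-betting sketch."""
    wealth = eps        # initial endowment
    s = 0.0             # running sum of past coin outcomes
    x_sum = 0.0
    for t in range(1, steps + 1):
        beta = s / t                                # KT betting fraction
        x = beta * wealth                           # current bet = current iterate
        c = float(np.clip(-grad(x), -clip, clip))   # coin outcome from the gradient
        wealth += c * x                             # settle the bet
        s += c
        x_sum += x
    return x_sum / steps                            # averaged iterate

# minimize f(x) = (x - 3)^2 with no step size to tune
x_bar = coin_betting_minimize(lambda x: 2.0 * (x - 3.0))
```

The averaged iterate approaches the minimizer at 3; the individual bets oscillate around it, which is the usual picture for betting-based online-to-batch reductions.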
Second-Order Stochastic Optimization for Machine Learning in Linear Time
First-order stochastic methods are the state of the art in large-scale machine learning optimization owing to their efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods, and in certain settings improve upon the overall running time of popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
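One way to see how second-order information can be exploited at first-order cost: the Newton direction H⁻¹g can be approximated by a truncated Neumann series evaluated purely through Hessian-vector products, so the Hessian is never formed. The sketch below shows only this core idea on a dense quadratic; the spectral rescaling and the test problem are assumptions of the sketch, and the paper's actual estimator additionally samples the Hessian stochastically.

```python
import numpy as np

def neumann_newton_direction(hvp, g, depth=100):
    """Approximate H^{-1} g via the Neumann series sum_k (I - H)^k g,
    valid when the eigenvalues of H lie in (0, 2) (rescale beforehand).
    Only Hessian-vector products are used, so each term costs about
    as much as one gradient-like pass."""
    v = g.copy()
    for _ in range(depth):
        v = g + v - hvp(v)   # v_{k+1} = g + (I - H) v_k
    return v

# quadratic test: f(x) = 0.5 x^T A x - b^T x, Hessian A
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T
A = A / (2.0 * np.linalg.norm(A, 2)) + 0.5 * np.eye(5)  # eigenvalues in [0.5, 1]
b = rng.normal(size=5)
hvp = lambda v: A @ v        # stand-in for a matrix-free Hessian-vector product
d = neumann_newton_direction(hvp, b)   # d approximates A^{-1} b
```

In a real machine learning setting the `hvp` closure would be implemented by automatic differentiation over a minibatch, which is where the "linear in the sparsity of the input" per-iteration cost comes from.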
Kernel Analog Forecasting: Multiscale Test Problems
Data-driven prediction is becoming increasingly widespread as the volume of available data grows and as algorithmic development matches this growth. The nature of the predictions made, and the manner in which they should be interpreted, depend crucially on the extent to which the variables chosen for prediction are Markovian, or approximately Markovian. Multiscale systems provide a framework in which this issue can be analyzed. In this work, kernel analog forecasting methods are studied from the perspective of data generated by multiscale dynamical systems. The problems chosen exhibit a variety of different Markovian closures, using both averaging and homogenization; furthermore, settings where scale separation is not present and the predicted variables are non-Markovian are also considered. The studies provide guidance for the interpretation of data-driven prediction methods when used in practice.
Comment: 30 pages, 14 figures; clarified several ambiguous parts, added references, and a comparison with Lorenz' original method (Sec. 4.5).
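The basic mechanism of kernel analog forecasting can be sketched in a few lines: historical states close to the current state ("analogs") are weighted by a Gaussian kernel, and their observed futures are averaged. The rotation-on-the-circle test system, the lead time, and the bandwidth below are illustrative assumptions, not the multiscale test problems of the paper.

```python
import numpy as np

def kernel_analog_forecast(X_hist, Y_future, x_now, bandwidth=0.2):
    """Kernel-weighted average over historical analogs: states similar
    to x_now get larger weight, and their observed futures are averaged."""
    d2 = np.sum((X_hist - x_now) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    return w @ Y_future

# toy Markovian system: uniform rotation on the circle, lead-time-tau forecast
theta = np.linspace(0.0, 20.0 * np.pi, 4000, endpoint=False)
states = np.stack([np.cos(theta), np.sin(theta)], axis=1)
lead = 25
X_hist, Y_future = states[:-lead], states[lead:, 0]   # forecast the cos component
x_now = np.array([np.cos(0.3), np.sin(0.3)])
pred = kernel_analog_forecast(X_hist, Y_future, x_now)
truth = np.cos(0.3 + lead * (theta[1] - theta[0]))
```

Because this toy state is fully Markovian, the kernel forecast is close to the truth up to a small smoothing bias; the interesting regimes studied in the work are precisely those where the chosen variables are only approximately Markovian and this picture degrades.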
Convenient Multiple Directions of Stratification
This paper investigates the use of multiple directions of stratification as a variance reduction technique for Monte Carlo simulations of path-dependent options driven by Gaussian vectors. The precision of the method depends on the choice of the directions of stratification and the allocation rule within each stratum. Several choices have been proposed but, even if they provide variance reduction, their implementation is computationally intensive and not applicable to realistic payoffs, in particular not to Asian options with a barrier. Moreover, all these previously published methods employ orthogonal directions for multiple stratification. In this work we investigate the use of algorithms producing convenient, generally non-orthogonal, directions that combine a lower computational cost with a comparable variance reduction. In addition, we study the accuracy of optimal allocation in terms of variance reduction compared to Latin Hypercube Sampling. We consider the directions obtained by the Linear Transformation and the Principal Component Analysis. We introduce a new procedure based on the Linear Approximation of the explained variance of the payoff using the law of total variance. In addition, we exhibit a novel algorithm that correctly generates normal vectors stratified along non-orthogonal directions. Finally, we illustrate the efficiency of these algorithms in computing the price of different path-dependent options with and without barriers in the Black-Scholes and in the Cox-Ingersoll-Ross markets.
Comment: 21 pages, 11 tables.
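The basic ingredient, generating Gaussian vectors stratified along a chosen direction, can be sketched for a single unit direction u: the projection onto u is drawn by inverse-CDF sampling within equiprobable strata, and the orthogonal complement is left i.i.d. normal. The sketch then prices a discrete arithmetic Asian call in Black-Scholes. The direction u = (1, ..., 1)/sqrt(d), the strata counts, and the market parameters are illustrative assumptions, and this covers only one orthogonal direction, not the non-orthogonal multi-direction algorithm of the paper.

```python
import numpy as np
from statistics import NormalDist

def stratified_gaussians(u, n_strata, per_stratum, rng):
    """Sample N(0, I_d) vectors stratified along the unit direction u:
    Z = xi * u + (I - u u^T) Y, with xi a stratified standard normal."""
    d = len(u)
    inv = NormalDist().inv_cdf
    samples = []
    for k in range(n_strata):
        # uniforms confined to the k-th equiprobable stratum (clipped away from 0/1)
        p = np.clip((k + rng.random(per_stratum)) / n_strata, 1e-12, 1 - 1e-12)
        xi = np.array([inv(pi) for pi in p])
        Y = rng.normal(size=(per_stratum, d))
        Y_perp = Y - np.outer(Y @ u, u)          # project out the u component
        samples.append(xi[:, None] * u + Y_perp)
    return np.vstack(samples)

rng = np.random.default_rng(1)
d, S0, K, r, sigma, T = 16, 100.0, 100.0, 0.05, 0.2, 1.0
dt = T / d
u = np.ones(d) / np.sqrt(d)   # direction loading equally on all increments

def asian_payoff(Z):
    """Discounted arithmetic Asian call payoff from Gaussian increments Z."""
    logS = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt
                                  + sigma * np.sqrt(dt) * Z, axis=1)
    A = np.exp(logS).mean(axis=1)
    return np.exp(-r * T) * np.maximum(A - K, 0.0)

Z = stratified_gaussians(u, n_strata=50, per_stratum=20, rng=rng)
price = asian_payoff(Z).mean()
```

The equal-weight direction is a natural first choice here because the arithmetic average loads roughly equally on all Brownian increments, so stratifying along it removes a large share of the payoff variance at essentially no extra cost per sample.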