56,995 research outputs found
A Robust Adaptive Stochastic Gradient Method for Deep Learning
Stochastic gradient algorithms are the main focus of large-scale optimization
problems and led to important successes in the recent advancement of the deep
learning algorithms. The convergence of SGD depends on the careful choice of
learning rate and the amount of the noise in stochastic estimates of the
gradients. In this paper, we propose an adaptive learning rate algorithm, which
utilizes stochastic curvature information of the loss function for
automatically tuning the learning rates. The information about the element-wise
curvature of the loss function is estimated from the local statistics of the
stochastic first order gradients. We further propose a new variance reduction
technique to speed up the convergence. In our experiments with deep neural
networks, we obtained better performance compared to the popular stochastic
gradient algorithms.Comment: IJCNN 2017 Accepted Paper, An extension of our paper, "ADASECANT:
Robust Adaptive Secant Method for Stochastic Gradient
Recommended from our members
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is a stochastic dot-product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, the efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to mixed-signal implementation, while the efficient stochastic operation is achieved by utilizing circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10-input and 8-hidden neurons
Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep
autoencoders and recurrent networks. HF uses the conjugate gradient algorithm
to construct update directions through curvature-vector products that can be
computed on the same order of time as gradients. In this paper we exploit this
property and study stochastic HF with gradient and curvature mini-batches
independent of the dataset size. We modify Martens' HF for these settings and
integrate dropout, a method for preventing co-adaptation of feature detectors,
to guard against overfitting. Stochastic Hessian-free optimization gives an
intermediary between SGD and HF that achieves competitive performance on both
classification and deep autoencoder experiments.Comment: 11 pages, ICLR 201
DeepOBS: A Deep Learning Optimizer Benchmark Suite
Because the choice and tuning of the optimizer affects the speed, and
ultimately the performance of deep learning, there is significant past and
recent research in this area. Yet, perhaps surprisingly, there is no generally
agreed-upon protocol for the quantitative and reproducible evaluation of
optimization strategies for deep learning. We suggest routines and benchmarks
for stochastic optimization, with special focus on the unique aspects of deep
learning, such as stochasticity, tunability and generalization. As the primary
contribution, we present DeepOBS, a Python package of deep learning
optimization benchmarks. The package addresses key challenges in the
quantitative assessment of stochastic optimizers, and automates most steps of
benchmarking. The library includes a wide and extensible set of ready-to-use
realistic optimization problems, such as training Residual Networks for image
classification on ImageNet or character-level language prediction models, as
well as popular classics like MNIST and CIFAR-10. The package also provides
realistic baseline results for the most popular optimizers on these test
problems, ensuring a fair comparison to the competition when benchmarking new
optimizers, and without having to run costly experiments. It comes with output
back-ends that directly produce LaTeX code for inclusion in academic
publications. It supports TensorFlow and is available open source.Comment: Accepted at ICLR 2019. 9 pages, 3 figures, 2 table
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
achieve as good performance as previous optimizers and thus alleviate the
memory requirement.Comment: in 2015 Annual Allerton Conference on Communication, Control and
Computin
- …