A Robust Adaptive Stochastic Gradient Method for Deep Learning

Authors: Bengio, Yoshua; Gulcehre, Caglar; Moczulski, Marcin; Sotelo, Jose
Publication date: 02/03/2017

Stochastic gradient algorithms are central to large-scale optimization and have led to important successes in recent advances in deep learning. The convergence of SGD depends on a careful choice of learning rate and on the amount of noise in the stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate algorithm that uses stochastic curvature information about the loss function to tune the learning rates automatically. The element-wise curvature of the loss function is estimated from local statistics of the stochastic first-order gradients. We further propose a new variance reduction technique to speed up convergence. In our experiments with deep neural networks, we obtained better performance than popular stochastic gradient algorithms.

Comment: IJCNN 2017 Accepted Paper. An extension of our paper, "ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient
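
As a rough illustration of the idea in the abstract above, the sketch below sets a per-parameter learning rate from a secant (finite-difference) curvature estimate, |Δθ_i| / |Δg_i|. It is a deterministic simplification written for this listing, not the authors' algorithm: the function name `secant_sgd`, the fallback rate `init_lr`, and the threshold `eps` are assumptions, and the paper's running gradient statistics and variance reduction step are omitted.

```python
def secant_sgd(grad_fn, theta, steps=50, init_lr=0.01, eps=1e-8):
    """Gradient descent with an element-wise secant learning rate.

    Hypothetical helper: lr_i = |delta theta_i| / |delta g_i|, i.e. the
    inverse of a finite-difference estimate of the curvature along
    coordinate i. Falls back to init_lr when the gradient barely changed.
    """
    theta = list(theta)
    prev_theta = None
    prev_grad = None
    for _ in range(steps):
        grad = grad_fn(theta)
        if prev_grad is None:
            # No history yet: use the fallback rate for every coordinate.
            lrs = [init_lr] * len(theta)
        else:
            lrs = []
            for t, pt, g, pg in zip(theta, prev_theta, grad, prev_grad):
                d_theta = abs(t - pt)
                d_grad = abs(g - pg)
                lrs.append(d_theta / d_grad if d_grad > eps else init_lr)
        prev_theta, prev_grad = theta, grad
        # Element-wise update with the per-coordinate rates.
        theta = [t - lr * g for t, lr, g in zip(theta, lrs, grad)]
    return theta


# Toy problem (an assumption for the demo): loss = 0.5 * sum(a_i * x_i**2),
# whose curvature differs by two orders of magnitude across coordinates.
CURVATURES = [1.0, 10.0, 100.0]


def quad_grad(x):
    return [a * v for a, v in zip(CURVATURES, x)]


result = secant_sgd(quad_grad, [1.0, -2.0, 3.0])
```

On this noiseless diagonal quadratic the secant ratio recovers the inverse curvature after a single step, so each coordinate lands close to its minimum on the second update. With noisy minibatch gradients the raw ratio is unreliable, which is presumably why the abstract speaks of estimating curvature from gradient statistics rather than from a single difference.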

arXiv.org e-Print Archive

Crossref

High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis

Authors: Nazarian, Shahin; Nazemi, Mahdi; Pedram, Massoud
Publication date: 06/07/2017

Independent Component Analysis (ICA) is a dimensionality reduction technique that can boost the efficiency of machine learning models that deal with probability density functions, e.g., Bayesian neural networks. Algorithms that implement adaptive ICA converge more slowly than their nonadaptive counterparts; however, they can track changes in the underlying distributions of input features. This intrinsically slow convergence of adaptive methods, combined with existing hardware implementations that operate at very low clock frequencies, necessitates fundamental improvements in both algorithm and hardware design. This paper presents an algorithm that allows efficient hardware implementation of ICA. Compared to previous work, our FPGA implementation of adaptive ICA improves clock frequency by at least one order of magnitude and throughput by at least two orders of magnitude. Our proposed algorithm is not limited to ICA and can be used in various machine learning problems that use stochastic gradient descent optimization.
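
For readers unfamiliar with EASI, the sketch below applies one step of the serial update as it is usually stated in the ICA literature: W ← W − μ (y yᵀ − I + g(y) yᵀ − y g(y)ᵀ) W, with y = W x. It is not the hardware-oriented algorithm proposed in this paper, whose details the abstract does not give. The function name `easi_step`, the step size `mu`, and the cubic nonlinearity are illustrative assumptions.

```python
def easi_step(W, x, mu=0.01, g=lambda v: v ** 3):
    """One stochastic EASI update of a square separating matrix W.

    Hypothetical helper using plain lists of floats. `x` is a single
    observed sample; `g` is the element-wise nonlinearity (cubic here,
    chosen only for illustration).
    """
    n = len(W)
    # Current source estimate y = W x.
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    gy = [g(v) for v in y]
    # H = y y^T - I (symmetric, whitening term)
    #   + g(y) y^T - y g(y)^T (skew-symmetric, independence term)
    H = [
        [
            y[i] * y[j] - (1.0 if i == j else 0.0)
            + gy[i] * y[j] - y[i] * gy[j]
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Multiplicative update W <- W - mu * H W.
    return [
        [W[i][j] - mu * sum(H[i][k] * W[k][j] for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]


# Single update from the identity matrix with a toy sample.
W1 = easi_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], mu=0.1)
```

Because the update depends on one sample at a time, it has the same shape as a stochastic gradient step, which is consistent with the abstract's remark that the approach carries over to other SGD-based problems.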

arXiv.org e-Print Archive

Crossref