Do optimization methods in deep learning applications matter?
With advances in deep learning, exponential data growth and increasing model complexity, developing efficient optimization methods is attracting much research attention. Several implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions for achieving quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications.
Recent research explores higher-order optimization functions as better approaches, but these pose very complex computational challenges for practical use. Comparing first- and higher-order optimization functions in this paper, our experiments reveal that Levenberg-Marquardt (LM) achieves significantly better convergence but suffers from very long processing times, increasing the training complexity of both classification and reinforcement learning problems. Our experiments compare off-the-shelf optimization functions (CG, SGD, LM and L-BFGS) on standard CIFAR, MNIST, CartPole and FlappyBird experiments. The paper presents arguments on which optimization functions to use and, further, which functions would benefit from parallelization efforts to improve pretraining time and learning-rate convergence.
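As background for the comparison above, here is a minimal NumPy sketch of a Levenberg-Marquardt iteration against plain gradient descent on a toy one-parameter least-squares fit; the problem, learning rate and damping value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy least-squares problem (illustrative, not from the paper):
# fit y = exp(a * x) with true a = 1.5.
x = np.linspace(0.0, 1.0, 50)
y = np.exp(1.5 * x)

def residuals(a):
    return np.exp(a * x) - y

def jacobian(a):
    # d/da exp(a*x) = x * exp(a*x), shape (50, 1)
    return (x * np.exp(a * x)).reshape(-1, 1)

def gradient_descent(a0=0.0, lr=1e-3, steps=500):
    a = a0
    for _ in range(steps):
        # Gradient of 0.5 * ||r||^2 is J^T r.
        a -= lr * (jacobian(a).ravel() @ residuals(a))
    return a

def levenberg_marquardt(a0=0.0, lam=1e-3, steps=20):
    a = a0
    for _ in range(steps):
        J, r = jacobian(a), residuals(a)
        # Damped Gauss-Newton step: solve (J^T J + lam*I) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J + lam * np.eye(1), -J.T @ r)
        a += delta.item()
    return a
```

On this toy problem LM converges in a handful of iterations while gradient descent needs hundreds of cheap steps; each LM step, however, requires forming and solving a linear system in the parameters, which is the kind of processing-time cost the abstract refers to.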
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
achieve as good performance as previous optimizers and thus alleviate the memory requirement.
Comment: in 2015 Annual Allerton Conference on Communication, Control and Computing
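For context, the classical linear CCA objective that deep CCA extends has a closed-form solution: whiten each view and take an SVD of the cross-covariance, whose singular values are the canonical correlations. The sketch below is this standard background computation in NumPy (the function name and regularization constant are illustrative), not the paper's stochastic algorithm:

```python
import numpy as np

def linear_cca(X, Y, k=1, eps=1e-8):
    """Top-k canonical correlations between views X (n, dx) and Y (n, dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within-view and cross-view covariances.
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whitening transforms from Cholesky factors: Wx Cxx Wx^T = I.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    # Singular values of the whitened cross-covariance are the correlations.
    T = Wx @ Cxy @ Wy.T
    return np.linalg.svd(T, compute_uv=False)[:k]
```

The batch dependence is visible here: the covariances are sums over all training examples, which is exactly why the objective does not decouple into per-example stochastic updates.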
DeepOBS: A Deep Learning Optimizer Benchmark Suite
Because the choice and tuning of the optimizer affect the speed, and ultimately the performance, of deep learning, there is significant past and recent research in this area. Yet, perhaps surprisingly, there is no generally
agreed-upon protocol for the quantitative and reproducible evaluation of
optimization strategies for deep learning. We suggest routines and benchmarks
for stochastic optimization, with special focus on the unique aspects of deep
learning, such as stochasticity, tunability and generalization. As the primary
contribution, we present DeepOBS, a Python package of deep learning
optimization benchmarks. The package addresses key challenges in the
quantitative assessment of stochastic optimizers, and automates most steps of
benchmarking. The library includes a wide and extensible set of ready-to-use
realistic optimization problems, such as training Residual Networks for image
classification on ImageNet or character-level language prediction models, as
well as popular classics like MNIST and CIFAR-10. The package also provides
realistic baseline results for the most popular optimizers on these test
problems, ensuring a fair comparison to the competition when benchmarking new optimizers without having to run costly experiments. It comes with output
back-ends that directly produce LaTeX code for inclusion in academic
publications. It supports TensorFlow and is available open source.
Comment: Accepted at ICLR 2019. 9 pages, 3 figures, 2 tables
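The kind of harness an optimizer benchmark suite automates can be illustrated with a deliberately tiny stand-in: run several update rules on one fixed test problem and record their loss curves. Everything below (the quadratic test problem, the update rules, all function names and hyperparameters) is a hypothetical sketch, not the DeepOBS API:

```python
import numpy as np

def quadratic_problem(dim=10, seed=0):
    """A fixed positive-definite quadratic: loss(w) = 0.5 w'Hw - b'w."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim))
    H = A.T @ A + np.eye(dim)  # positive definite by construction
    b = rng.normal(size=dim)
    loss = lambda w: 0.5 * w @ H @ w - b @ w
    grad = lambda w: H @ w - b
    return loss, grad, dim

def benchmark(step_fn, steps=200):
    """Record the loss curve of an update rule w <- step_fn(w, grad(w))."""
    loss, grad, dim = quadratic_problem()
    w = np.zeros(dim)
    history = []
    for _ in range(steps):
        w = step_fn(w, grad(w))
        history.append(loss(w))
    return history

def sgd_step(w, g, lr=0.01):
    return w - lr * g

def make_momentum_step(lr=0.01, beta=0.9):
    state = {"v": 0.0}  # velocity, carried across steps
    def step(w, g):
        state["v"] = beta * state["v"] + g
        return w - lr * state["v"]
    return step
```

Fixing the problem, the seed, and the step budget is what makes the resulting curves comparable across optimizers; a real suite like DeepOBS additionally standardizes realistic deep learning problems and reporting.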
An Online Decision-Theoretic Pipeline for Responder Dispatch
The problem of dispatching emergency responders to service traffic accidents,
fire, distress calls and crimes plagues urban areas across the globe. While such problems have been studied extensively, most approaches are offline. Such methodologies fail to capture the dynamically changing environments in which critical emergency response occurs and therefore cannot be deployed in practice. Any holistic approach towards creating a pipeline for effective
emergency response must also look at other challenges that it subsumes -
predicting when and where incidents happen and understanding the changing
environmental dynamics. We describe a system that collectively deals with all
these problems in an online manner, meaning that the models get updated with
streaming data sources. We highlight why such an approach is crucial to the
effectiveness of emergency response, and present an algorithmic framework that
can compute promising actions for a given decision-theoretic model for
responder dispatch. We argue that carefully crafted heuristic measures can
balance the trade-off between computational time and the quality of solutions
achieved and highlight why such an approach is more scalable and tractable than
traditional approaches. We also present an online mechanism for incident
prediction, as well as an approach based on recurrent neural networks for
learning and predicting environmental features that affect responder dispatch.
We compare our methodology with the prior state of the art and existing dispatch strategies in the field; the results show that our approach reduces response time with a drastic reduction in computational time.
Comment: Appeared in ICCPS 201
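As a point of reference for the dispatch strategies being compared, the simplest field baseline is greedy nearest-responder assignment. The sketch below is a hypothetical illustration of that baseline, not the paper's decision-theoretic algorithm:

```python
def greedy_dispatch(incidents, responders):
    """Assign each incident (x, y) the nearest still-free responder (x, y)."""
    free = set(range(len(responders)))
    assignment = {}
    for i, (ix, iy) in enumerate(incidents):
        if not free:
            break  # every responder is busy
        nearest = min(free, key=lambda r: (responders[r][0] - ix) ** 2
                                          + (responders[r][1] - iy) ** 2)
        free.remove(nearest)
        assignment[i] = nearest
    return assignment
```

A greedy rule like this ignores future incidents and travel-time dynamics, which is precisely the gap a decision-theoretic, online formulation aims to close.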