165 research outputs found
Distributed Deep Learning for Question Answering
This paper is an empirical study of the distributed deep learning for
question answering subtasks: answer selection and question classification.
Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP,
DOWNPOUR and EASGD/EAMSGD algorithms have been presented. Experimental results
show that the distributed framework based on the message passing interface can
accelerate the convergence speed at a sublinear scale. This paper demonstrates
the importance of distributed training. For example, with 48 workers, a 24x
speedup is achievable for the answer selection task and running time is
decreased from 138.2 hours to 5.81 hours, which will increase the productivity
significantly.Comment: This paper will appear in the Proceeding of The 25th ACM
International Conference on Information and Knowledge Management (CIKM 2016),
Indianapolis, US
Do optimization methods in deep learning applications matter?
With advances in deep learning, exponential data growth and increasing model
complexity, developing efficient optimization methods are attracting much
research attention. Several implementations favor the use of Conjugate Gradient
(CG) and Stochastic Gradient Descent (SGD) as being practical and elegant
solutions to achieve quick convergence, however, these optimization processes
also present many limitations in learning across deep learning applications.
Recent research is exploring higher-order optimization functions as better
approaches, but these present very complex computational challenges for
practical use. Comparing first and higher-order optimization functions, in this
paper, our experiments reveal that Levemberg-Marquardt (LM) significantly
supersedes optimal convergence but suffers from very large processing time
increasing the training complexity of both, classification and reinforcement
learning problems. Our experiments compare off-the-shelf optimization
functions(CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and
FlappyBird experiments.The paper presents arguments on which optimization
functions to use and further, which functions would benefit from
parallelization efforts to improve pretraining time and learning rate
convergence
- …