Hyperparameter tuning for deep learning in natural language processing
Deep neural networks have advanced rapidly over the past several years, yet using them effectively still seems like a black art to many practitioners. The reason for this difficulty is that obtaining consistent, outstanding results from a deep architecture requires optimizing many parameters known as hyperparameters. Hyperparameter tuning is an essential task in deep learning and can change network performance dramatically. This paper distills over 3,000 GPU hours spent optimizing a network for a text classification task over a wide array of hyperparameters. We provide a list of hyperparameters to tune, along with the impact each has on network performance. The hope is that such a listing will give interested researchers a means to prioritize their efforts and to modify their deep architectures to get the best performance with the least effort.
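The abstract describes a search over many hyperparameters rather than a specific algorithm; the workflow it implies (sample a configuration, train, compare validation scores) can be sketched as a plain random search. Everything below — the search space, parameter names, and the `train_and_evaluate` stand-in — is an illustrative assumption, not the authors' actual setup:

```python
import random

# Hypothetical search space; the paper's actual hyperparameter list and
# value ranges are not reproduced here.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.1, 0.3, 0.5],
    "hidden_size": [128, 256, 512],
    "batch_size": [32, 64, 128],
}

def sample_config(space):
    """Draw one random configuration from the search space."""
    return {name: random.choice(values) for name, values in space.items()}

def train_and_evaluate(config):
    """Stand-in for training the text classifier with `config` and
    returning validation accuracy; replace with a real training run."""
    return random.random()  # synthetic score, for illustration only

def random_search(space, budget):
    """Evaluate `budget` random configurations and keep the best one."""
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(space)
        score = train_and_evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(SEARCH_SPACE, budget=20)
print(best_config, best_score)
```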
DeepOBS: A Deep Learning Optimizer Benchmark Suite
Because the choice and tuning of the optimizer affect the speed, and ultimately the performance, of deep learning, there is significant past and recent research in this area. Yet, perhaps surprisingly, there is no generally agreed-upon protocol for the quantitative and reproducible evaluation of optimization strategies for deep learning. We suggest routines and benchmarks for stochastic optimization, with a special focus on the unique aspects of deep learning, such as stochasticity, tunability, and generalization. As the primary contribution, we present DeepOBS, a Python package of deep learning optimization benchmarks. The package addresses key challenges in the quantitative assessment of stochastic optimizers and automates most steps of benchmarking. The library includes a wide and extensible set of ready-to-use, realistic optimization problems, such as training Residual Networks for image classification on ImageNet or character-level language prediction models, as well as popular classics like MNIST and CIFAR-10. The package also provides realistic baseline results for the most popular optimizers on these test problems, ensuring a fair comparison when benchmarking new optimizers without having to run costly experiments. It comes with output back-ends that directly produce LaTeX code for inclusion in academic publications. It supports TensorFlow and is available open source.
Comment: Accepted at ICLR 2019. 9 pages, 3 figures, 2 tables.
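As a rough illustration of the workflow the abstract describes, here is how benchmarking a new optimizer with DeepOBS might look. The sketch follows the pattern in the project's documentation as I recall it, assuming the TensorFlow 1.x API and DeepOBS 1.x; the module paths, `StandardRunner` signature, and test-problem name should be checked against the released package:

```python
import tensorflow as tf
from deepobs import tensorflow as tfobs

# The optimizer under test (here: plain momentum SGD) together with a
# description of its tunable hyperparameters.
optimizer_class = tf.train.MomentumOptimizer
hyperparams = [
    {"name": "momentum", "type": float, "default": 0.9},
    {"name": "use_nesterov", "type": bool, "default": False},
]

runner = tfobs.runners.StandardRunner(optimizer_class, hyperparams)

# Train on one of the packaged test problems; the runner logs the metrics
# needed to compare against the baseline results shipped with DeepOBS.
runner.run(testproblem="cifar10_3c3d", batch_size=128, num_epochs=100)
```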
A Comparative Study on Regularization Strategies for Embedding-based Neural Networks
This paper aims to compare different regularization strategies for addressing a common phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neural models and tasks as our testbed. We tried several frequently applied or newly proposed regularization strategies, including penalizing weights (embeddings excluded), penalizing embeddings, re-embedding words, and dropout. We also emphasized incremental hyperparameter tuning and combining different regularizations. The results provide a picture of how to tune hyperparameters for neural NLP models.
Comment: EMNLP '15.
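Two of the strategies the abstract lists, penalizing weights while excluding embeddings and dropout, are easy to sketch. The snippet below is a minimal PyTorch illustration under assumed dimensions and coefficients; it is not the paper's models, and the "re-embedding words" strategy is not shown:

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """Toy embedding-based classifier; all dimensions are illustrative."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden=128, classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.dropout = nn.Dropout(p=0.5)  # dropout regularization
        self.fc1 = nn.Linear(embed_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).mean(dim=1)  # average-pool embeddings
        x = self.dropout(torch.relu(self.fc1(x)))
        return self.fc2(x)

model = TextClassifier()

# Penalize weights but exclude embeddings: give the embedding table its own
# parameter group with weight_decay=0. Setting it above zero instead would
# implement the paper's "penalizing embeddings" strategy.
embedding_params = list(model.embedding.parameters())
other_params = [p for name, p in model.named_parameters()
                if not name.startswith("embedding")]
optimizer = torch.optim.SGD(
    [
        {"params": other_params, "weight_decay": 1e-4},   # L2 penalty
        {"params": embedding_params, "weight_decay": 0.0},  # excluded
    ],
    lr=0.1,
)
```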