1,170 research outputs found
A Practitioners' Guide to Transfer Learning for Text Classification using Convolutional Neural Networks
Transfer Learning (TL) plays a crucial role when a given dataset has
insufficient labeled examples to train an accurate model. In such scenarios,
the knowledge accumulated within a model pre-trained on a source dataset can be
transferred to a target dataset, resulting in the improvement of the target
model. Though TL is found to be successful in the realm of image-based
applications, its impact and practical use in Natural Language Processing (NLP)
applications is still a subject of research. Due to their hierarchical
architecture, Deep Neural Networks (DNN) provide flexibility and customization
in adjusting their parameters and depth of layers, thereby forming an apt area
for exploiting the use of TL. In this paper, we report the results and
conclusions obtained from extensive empirical experiments using a Convolutional
Neural Network (CNN) and try to uncover thumb rules to ensure a successful
positive transfer. In addition, we also highlight the flawed means that could
lead to a negative transfer. We explore the transferability of various layers
and describe the effect of varying hyper-parameters on the transfer
performance. Also, we present a comparison of accuracy value and model size
against state-of-the-art methods. Finally, we derive inferences from the
empirical results and provide best practices to achieve a successful positive
transfer.Comment: 9 pages, 2 figures, accepted in SDM 201
Convolutional Neural Networks for Sentence Classification
We report on a series of experiments with convolutional neural networks (CNN)
trained on top of pre-trained word vectors for sentence-level classification
tasks. We show that a simple CNN with little hyperparameter tuning and static
vectors achieves excellent results on multiple benchmarks. Learning
task-specific vectors through fine-tuning offers further gains in performance.
We additionally propose a simple modification to the architecture to allow for
the use of both task-specific and static vectors. The CNN models discussed
herein improve upon the state of the art on 4 out of 7 tasks, which include
sentiment analysis and question classification.Comment: To appear in EMNLP 201
- …