Search CORE

1,170 research outputs found

A Practitioners' Guide to Transfer Learning for Text Classification using Convolutional Neural Networks

Author: Mathur Gaurav
Nair Shivashankar B.
Semwal Tushar
Yenigalla Promod
Publication venue
Publication date: 19/01/2018
Field of study

Transfer Learning (TL) plays a crucial role when a given dataset has insufficient labeled examples to train an accurate model. In such scenarios, the knowledge accumulated within a model pre-trained on a source dataset can be transferred to a target dataset, resulting in the improvement of the target model. Though TL is found to be successful in the realm of image-based applications, its impact and practical use in Natural Language Processing (NLP) applications is still a subject of research. Due to their hierarchical architecture, Deep Neural Networks (DNN) provide flexibility and customization in adjusting their parameters and depth of layers, thereby forming an apt area for exploiting the use of TL. In this paper, we report the results and conclusions obtained from extensive empirical experiments using a Convolutional Neural Network (CNN) and try to uncover thumb rules to ensure a successful positive transfer. In addition, we also highlight the flawed means that could lead to a negative transfer. We explore the transferability of various layers and describe the effect of varying hyper-parameters on the transfer performance. Also, we present a comparison of accuracy value and model size against state-of-the-art methods. Finally, we derive inferences from the empirical results and provide best practices to achieve a successful positive transfer.Comment: 9 pages, 2 figures, accepted in SDM 201

arXiv.org e-Print Archive

Crossref

Convolutional Neural Networks for Sentence Classification

Author: Kim Yoon
Publication venue
Publication date: 01/01/2014
Field of study

We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.Comment: To appear in EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

Crossref