419 research outputs found
Character-Aware Neural Language Models
We describe a simple neural language model that relies only on
character-level inputs. Predictions are still made at the word level. Our model
employs a convolutional neural network (CNN) and a highway network over
characters, whose output is given to a long short-term memory (LSTM) recurrent
neural network language model (RNN-LM). On the English Penn Treebank the model
is on par with the existing state-of-the-art despite having 60% fewer
parameters. On languages with rich morphology (Arabic, Czech, French, German,
Spanish, Russian), the model outperforms word-level/morpheme-level LSTM
baselines, again with fewer parameters. The results suggest that on many
languages, character inputs are sufficient for language modeling. Analysis of
word representations obtained from the character composition part of the model
reveals that the model is able to encode, from characters only, both semantic
and orthographic information.
Comment: AAAI 2016
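For intuition, here is a minimal PyTorch sketch of the pipeline this abstract describes (character CNN → highway layer → word-level LSTM). All names, dimensions, and the single-layer configuration are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (not the paper's exact configuration): character embeddings
# pass through a CNN with max-over-time pooling and a highway layer, and the
# resulting word features feed a word-level LSTM language model.
import torch
import torch.nn as nn

class CharAwareLM(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=15, n_filters=100,
                 kernel_size=3, hidden_dim=300):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size)
        self.transform = nn.Linear(n_filters, n_filters)  # highway transform
        self.gate = nn.Linear(n_filters, n_filters)       # highway gate
        self.lstm = nn.LSTM(n_filters, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_words)  # predictions stay word-level

    def forward(self, chars):
        # chars: (batch, seq_len, max_word_len) character ids for each word
        b, t, w = chars.shape
        x = self.char_emb(chars.view(b * t, w)).transpose(1, 2)
        x = torch.relu(self.conv(x)).max(dim=2).values     # max-over-time pool
        g = torch.sigmoid(self.gate(x))
        x = g * torch.relu(self.transform(x)) + (1 - g) * x  # highway layer
        h, _ = self.lstm(x.view(b, t, -1))
        return self.out(h)  # (batch, seq_len, n_words) next-word logits
```

For example, `CharAwareLM(n_chars=60, n_words=10000)(torch.randint(1, 60, (2, 5, 8)))` returns a `(2, 5, 10000)` tensor of next-word logits.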
Empower Sequence Labeling with Task-Aware Neural Language Model
Linguistic sequence labeling is a general modeling approach that encompasses
a variety of problems, such as part-of-speech tagging and named entity
recognition. Recent advances in neural networks (NNs) make it possible to build
reliable models without handcrafted features. However, in many cases, it is
hard to obtain sufficient annotations to train these models. In this study, we
develop a novel neural framework to extract abundant knowledge hidden in raw
texts to empower the sequence labeling task. Besides word-level knowledge
contained in pre-trained word embeddings, character-aware neural language
models are incorporated to extract character-level knowledge. Transfer learning
techniques are further adopted to mediate different components and guide the
language model towards the key knowledge. Compared to previous methods, this
task-specific knowledge allows us to adopt a more concise model and conduct
more efficient training. Unlike most transfer learning methods, the proposed
framework does not rely on any additional supervision; it extracts knowledge
from the order information inherent in the training sequences themselves.
Extensive experiments on benchmark datasets demonstrate the effectiveness of
leveraging character-level knowledge and the efficiency of co-training. For
example, on the CoNLL03 NER task, model training completes in about 6 hours on
a single GPU, reaching an F1 score of 91.71±0.10 without using any extra
annotation.
Comment: AAAI 2018
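A rough sketch of the co-training idea, assuming a shared character-level LSTM that serves both a next-character language-model head and a word-level tagger; the class and argument names are hypothetical, and a plain linear tag head stands in for the CRF-style output layer commonly used for sequence labeling.

```python
# Rough sketch (hypothetical names, simplified heads): a shared character-level
# LSTM is trained on two objectives at once -- predicting the next character
# (no annotation needed) and, concatenated with word embeddings, labeling
# each word.
import torch
import torch.nn as nn

class CoTrainedTagger(nn.Module):
    def __init__(self, n_chars, n_words, n_tags, dim=50, hidden=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, dim)
        self.char_lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, n_chars)       # char-LM objective
        self.word_emb = nn.Embedding(n_words, dim)
        self.tagger = nn.LSTM(dim + hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.tag_head = nn.Linear(2 * hidden, n_tags)   # labeling objective

    def forward(self, chars, words, last_char_idx):
        # chars: (batch, n_chars_in_sent); words: (batch, n_words_in_sent)
        # last_char_idx: (batch, n_words_in_sent) long tensor giving the
        # position of each word's final character within `chars`
        h, _ = self.char_lstm(self.char_emb(chars))
        lm_logits = self.lm_head(h)                     # next-char predictions
        # Character-LSTM states at word boundaries become word features.
        idx = last_char_idx.unsqueeze(-1).expand(-1, -1, h.size(-1))
        word_feat = h.gather(1, idx)
        t, _ = self.tagger(torch.cat([self.word_emb(words), word_feat], -1))
        return lm_logits, self.tag_head(t)              # two losses, one model
```

Training would minimize the sum of a cross-entropy over `lm_logits` (against the shifted character sequence) and a cross-entropy over the tag logits, so the language-model signal comes for free from the training text itself.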
Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones
Syllabification does not seem to improve word-level RNN language modeling
quality when compared to character-based segmentation. However, our best
syllable-aware language model, achieving performance comparable to the
competitive character-aware model, has 18%-33% fewer parameters and is trained
1.2-2.2 times faster.
Comment: EMNLP 2017
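For a sense of where the speed savings come from, compare the sequence lengths a composition network must process per word; the syllable split below is hand-made and purely illustrative.

```python
# Toy illustration (hand-made syllable split): a syllable-segmented word is a
# much shorter sequence than a character-segmented one, so the network that
# composes word representations does fewer steps of work per word.
word = "unbelievable"
chars = list(word)                             # 12 units
syllables = ["un", "be", "liev", "a", "ble"]   # 5 units, illustrative split
print(len(chars), len(syllables))              # -> 12 5
```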
Strawman: an Ensemble of Deep Bag-of-Ngrams for Sentiment Analysis
This paper describes a builder entry, named "strawman", to the sentence-level
sentiment analysis task of the "Build It, Break It" shared task of the First
Workshop on Building Linguistically Generalizable NLP Systems. The goal of a
builder is to provide an automated sentiment analyzer that would serve as a
target for breakers whose goal is to find pairs of minimally-differing
sentences that break the analyzer.
Comment: A builder entry to the sentence-level sentiment analysis task of the
"Build It, Break It" shared task of the First Workshop on Building
Linguistically Generalizable NLP Systems
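A minimal sketch of what an ensemble of deep bag-of-ngrams classifiers can look like, assuming hashed character n-grams, mean-pooled embeddings, and logit averaging across differently initialized models; these details are assumptions, not the entry's exact recipe.

```python
# Minimal sketch (assumed recipe): a "deep bag-of-ngrams" model hashes
# character n-grams into an embedding table, mean-pools them, and classifies;
# the ensemble averages logits across several differently initialized models.
import torch
import torch.nn as nn

class BagOfNgrams(nn.Module):
    def __init__(self, n_buckets=2**18, dim=64, n_classes=2, n=3):
        super().__init__()
        self.n, self.n_buckets = n, n_buckets
        self.emb = nn.EmbeddingBag(n_buckets, dim, mode="mean")
        self.clf = nn.Linear(dim, n_classes)

    def forward(self, sentence):
        # Python's hash() is run-randomized; a fixed hash function would be
        # used in practice so train and test time agree across processes.
        grams = [sentence[i:i + self.n]
                 for i in range(len(sentence) - self.n + 1)]
        ids = torch.tensor([[hash(g) % self.n_buckets for g in grams]])
        return self.clf(self.emb(ids))          # (1, n_classes) logits

models = [BagOfNgrams() for _ in range(5)]      # the ensemble
logits = torch.stack([m("a pleasant surprise") for m in models]).mean(0)
```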
OhioState at SemEval-2018 Task 7: Exploiting Data Augmentation for Relation Classification in Scientific Papers using Piecewise Convolutional Neural Networks
We describe our system for the SemEval-2018 shared task on Semantic Relation
Extraction and Classification in Scientific Papers, where we focus on the
classification task. Our simple piecewise convolutional neural encoder performs
decently in an end-to-end manner. A simple inter-task data augmentation
significantly boosts the performance of the model. Our best-performing
systems stood 8th out of 20 teams on the classification task on noisy data and
12th out of 28 teams on the classification task on clean data.
Comment: To appear in Proceedings of the International Workshop on Semantic
Evaluation (SemEval-2018)
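A minimal sketch of piecewise max pooling as it is commonly done for relation classification, assuming the sentence is split at the two entity mentions; the team's exact encoder may differ from this.

```python
# Minimal sketch of piecewise max pooling (a common relation-classification
# device; the team's exact encoder may differ): convolve over token
# embeddings, then max-pool separately over the three segments delimited by
# the two entity mentions and classify the concatenated result.
import torch
import torch.nn as nn

class PiecewiseCNN(nn.Module):
    def __init__(self, vocab, dim=50, n_filters=100, k=3, n_relations=6):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, n_filters, k, padding=k // 2)
        self.clf = nn.Linear(3 * n_filters, n_relations)

    def forward(self, tokens, e1_pos, e2_pos):
        # tokens: (seq_len,) token ids; e1_pos < e2_pos mark the entities,
        # with at least one token after e2_pos so every piece is non-empty
        x = self.emb(tokens).t().unsqueeze(0)        # (1, dim, seq_len)
        c = torch.relu(self.conv(x)).squeeze(0)      # (n_filters, seq_len)
        pieces = [c[:, :e1_pos + 1], c[:, e1_pos + 1:e2_pos + 1],
                  c[:, e2_pos + 1:]]
        pooled = torch.cat([p.max(dim=1).values for p in pieces])
        return self.clf(pooled)                      # (n_relations,) logits
```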
- …