Hierarchically-Refined Label Attention Network for Sequence Labeling
CRF has been used as a powerful model for statistical sequence labeling. For
neural sequence labeling, however, BiLSTM-CRF does not always lead to better
results compared with BiLSTM-softmax local classification. This can be because
the simple Markov label transition model of CRF does not give much information
gain over strong neural encoding. For better representing label sequences, we
investigate a hierarchically-refined label attention network, which explicitly
leverages label embeddings and captures potential long-term label dependency by
giving each word incrementally refined label distributions with hierarchical
attention. Results on POS tagging, NER and CCG supertagging show that the
proposed model not only improves the overall tagging accuracy with a similar
number of parameters, but also significantly speeds up training and testing
compared to BiLSTM-CRF.
Comment: EMNLP 201
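The label attention idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes word vectors and label embeddings share one dimension, uses plain dot-product attention, and stacks layers with a residual-style update in place of whatever encoder the paper places between layers. The names `label_attention` and `refine` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def label_attention(word_vecs, label_embs):
    """One attention layer: each word attends over all label embeddings."""
    dists, refined = [], []
    for w in word_vecs:
        # The attention weights over labels double as a label distribution.
        dist = softmax([dot(w, e) for e in label_embs])
        # Label-aware summary: attention-weighted mix of label embeddings.
        summary = [sum(p * e[k] for p, e in zip(dist, label_embs))
                   for k in range(len(w))]
        dists.append(dist)
        # Residual-style update (an assumption) that feeds the next layer.
        refined.append([x + s for x, s in zip(w, summary)])
    return dists, refined

def refine(word_vecs, label_embs, num_layers=2):
    """Stack layers so each word's label distribution is refined step by step."""
    dists = []
    for _ in range(num_layers):
        dists, word_vecs = label_attention(word_vecs, label_embs)
    return dists

# Toy usage: two labels, two words, each word aligned with one label.
label_embs = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0]]
dists = refine(words, label_embs)
```

Because every word is labeled from its own distribution, decoding in this sketch is a per-word argmax with no Viterbi pass, which is consistent with the speed advantage over BiLSTM-CRF claimed in the abstract.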
Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Joint extraction of entities and relations is an important task in
information extraction. To tackle this problem, we first propose a novel
tagging scheme that can convert the joint extraction task to a tagging problem.
Then, based on our tagging scheme, we study different end-to-end models to
extract entities and their relations directly, without identifying entities and
relations separately. We conduct experiments on a public dataset produced by
a distant supervision method, and the experimental results show that the
tagging-based methods are better than most of the existing pipelined and joint
learning methods. Moreover, the end-to-end model proposed in this paper achieves
the best results on the public dataset.
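To illustrate how joint extraction can be cast as tagging, here is a small sketch. The abstract does not spell out the tag inventory, so the format used here is an assumption made for illustration: each tag is either `O` or `POSITION-RELATION-ROLE`, where the position is `B` or `I`, the relation is a hypothetical label such as `FOUNDEDBY`, and the role is `1` or `2` for the first or second argument. The `decode` function and its pairing rule (the i-th role-1 span with the i-th role-2 span of the same relation) are likewise hypothetical.

```python
def decode(tokens, tags):
    """Turn a tagged sentence into (entity1, relation, entity2) triples."""
    spans = []      # each span is [relation, role, [words]]
    current = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        position, relation, role = tag.split("-")
        starts_new = (position == "B" or current is None
                      or current[:2] != [relation, role])
        if starts_new:
            current = [relation, role, [token]]
            spans.append(current)
        else:
            current[2].append(token)

    # Group entity spans by relation, then pair role-1 with role-2 spans.
    by_relation = {}
    for relation, role, words in spans:
        roles = by_relation.setdefault(relation, {"1": [], "2": []})
        roles[role].append(" ".join(words))

    triples = []
    for relation, roles in by_relation.items():
        for first, second in zip(roles["1"], roles["2"]):
            triples.append((first, relation, second))
    return triples

tokens = ["Apple", "was", "founded", "by", "Steve", "Jobs"]
tags = ["B-FOUNDEDBY-1", "O", "O", "O", "B-FOUNDEDBY-2", "I-FOUNDEDBY-2"]
triples = decode(tokens, tags)
```

With tags of this kind, any sequence tagger can emit entities and their relation in a single pass, which is the sense in which no separate entity and relation steps are needed.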
Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
Selecting optimal parameters for a neural network architecture can often make
the difference between mediocre and state-of-the-art performance. However,
little has been published on which parameters and design choices should be
evaluated or selected, often making correct hyperparameter optimization a "black
art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate
the importance of different network design choices and hyperparameters for five
common linguistic sequence tagging tasks (POS, Chunking, NER, Entity
Recognition, and Event Detection). We evaluated over 50,000 different setups
and found that some parameters, like the pre-trained word embeddings or the
last layer of the network, have a large impact on the performance, while other
parameters, for example the number of LSTM layers or the number of recurrent
units, are of minor importance. We give a recommendation on a configuration
that performs well across different tasks.
Comment: 34 pages. 9-page version of this paper published at EMNLP 201
Structured Learning with Inexact Search: Advances in Shift-Reduce CCG Parsing
Statistical shift-reduce parsing involves the interplay of representation learning, structured learning, and inexact search. This dissertation considers approaches that tightly integrate these three elements and explores three novel models for shift-reduce CCG parsing. First, I develop a dependency model, in which the selection of shift-reduce action sequences producing a dependency structure is treated as a hidden variable; the key components of the model are a dependency oracle and a learning algorithm that integrates the dependency oracle, the structured perceptron, and beam search. Second, I present expected F-measure training and show how to derive a globally normalized RNN model, in which beam search is naturally incorporated and used in conjunction with the
objective to learn shift-reduce action sequences optimized for the final evaluation metric. Finally, I describe an LSTM model that is able to construct parser state representations incrementally by following the shift-reduce syntactic derivation process; I show that expected F-measure training, which is agnostic to the underlying neural network, can be applied in this setting to obtain globally normalized greedy and beam-search LSTM shift-reduce parsers.
The Carnegie Trust for the Universities of Scotland; The Cambridge Trus
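The expected F-measure objective mentioned in the abstract above can be sketched roughly as follows. This is a hedged illustration rather than the dissertation's exact formulation: it assumes the beam holds complete action sequences, that each one carries a model score and an F1 value measured against the gold parse, and that probabilities are obtained by normalizing scores over the beam. The names `expected_f1` and `loss` are hypothetical.

```python
import math

def expected_f1(beam):
    """beam: list of (score, f1) pairs, one per action sequence in the beam."""
    # Softmax over the beam: normalization is over whole sequences,
    # not over individual parser actions.
    m = max(score for score, _ in beam)
    weights = [math.exp(score - m) for score, _ in beam]
    z = sum(weights)
    return sum(w / z * f1 for w, (_, f1) in zip(weights, beam))

def loss(beam):
    # Minimizing the negative expectation pushes probability mass
    # toward sequences with a higher F-measure.
    return -expected_f1(beam)

# Two candidates with equal scores: the expectation is the mean of their F1.
even = expected_f1([(0.0, 1.0), (0.0, 0.5)])
# Raising the score of the better candidate raises the expectation.
skewed = expected_f1([(2.0, 1.0), (0.0, 0.5)])
```

Since the objective depends only on sequence scores and their F1 values, it places no constraint on how the scores are produced, which matches the abstract's remark that the training is agnostic to the underlying neural network.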