29 research outputs found

Hierarchically-Refined Label Attention Network for Sequence Labeling

Authors: Cui Leyang, Zhang Yue
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

CRF has been used as a powerful model for statistical sequence labeling. For neural sequence labeling, however, BiLSTM-CRF does not always lead to better results compared with BiLSTM-softmax local classification. This may be because the simple Markov label transition model of CRF does not give much information gain over strong neural encoding. To better represent label sequences, we investigate a hierarchically-refined label attention network, which explicitly leverages label embeddings and captures potential long-term label dependency by giving each word incrementally refined label distributions with hierarchical attention. Results on POS tagging, NER and CCG supertagging show that the proposed model not only improves the overall tagging accuracy with a similar number of parameters, but also significantly speeds up training and testing compared to BiLSTM-CRF.
Comment: EMNLP 201
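
The idea of attending over label embeddings and refining label distributions layer by layer can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaled dot-product scoring, and the simple additive update between layers are all assumptions made for illustration (the abstract does not specify how layers are connected).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_attention(word_vecs, label_embs):
    """Attend from each word vector to every label embedding.

    Returns (dists, mixed): dists[i] is a distribution over labels for
    word i, mixed[i] is the attention-weighted sum of label embeddings.
    """
    dim = len(label_embs[0])
    scale = math.sqrt(dim)
    dists, mixed = [], []
    for w in word_vecs:
        # Scaled dot product between the word state and each label embedding.
        scores = [sum(a * b for a, b in zip(w, lab)) / scale for lab in label_embs]
        dist = softmax(scores)
        dists.append(dist)
        mixed.append([sum(p * lab[k] for p, lab in zip(dist, label_embs))
                      for k in range(dim)])
    return dists, mixed

def refine(word_vecs, label_embs, num_layers=2):
    """Stack label-attention layers; each layer sees the previous label summary.

    Assumption: the word state is updated by simple addition. A real model
    would re-encode the sequence (for example with a BiLSTM) between layers.
    """
    states, dists = word_vecs, []
    for _ in range(num_layers):
        dists, mixed = label_attention(states, label_embs)
        states = [[s + m for s, m in zip(sv, mv)] for sv, mv in zip(states, mixed)]
    return dists

# Toy usage: two words, two labels, 2-dimensional vectors (made-up numbers).
toy_dists = refine([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

The distribution returned by the last layer plays the role of the final tagging output; earlier layers only supply intermediate label summaries.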

arXiv.org e-Print Archive

Crossref

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Authors: Bao Hongyun, Hao Yuexing, Wang Feng, Xu Bo, Zheng Suncong, Zhou Peng
Publication date: 01/01/2017

Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we first propose a novel tagging scheme that converts the joint extraction task into a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by a distant supervision method, and the experimental results show that the tagging-based methods outperform most existing pipelined and joint learning methods. Moreover, the end-to-end model proposed in this paper achieves the best results on the public dataset.
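
To make concrete how joint extraction can be cast as tagging, here is a minimal sketch of decoding per-token tags back into relation triples. The abstract does not spell out the scheme, so the tag layout used here (`position-relation-role`, with positions B/I/E/S, role `1` or `2`, and `O` for other tokens), the nearest-match pairing rule, and the helper name `decode_triples` are assumptions for illustration only.

```python
def decode_triples(tokens, tags):
    """Convert tags such as 'B-LOC-1' into (entity1, relation, entity2) triples.

    Assumed tag format: '<B|I|E|S>-<relation>-<1|2>' or 'O'.
    Relation names must not contain '-' in this sketch.
    """
    entities = []   # (relation, role, text) in order of appearance
    current = []    # tokens of the entity being built
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            current = []
            continue
        pos, rel, role = tag.split("-")
        if pos in ("B", "S"):
            current = [tok]          # start a new entity
        else:
            current.append(tok)      # continue the current entity
        if pos in ("E", "S"):
            entities.append((rel, role, " ".join(current)))
            current = []

    triples, used = [], set()
    for i, (rel, role, text) in enumerate(entities):
        if role != "1":
            continue
        # Pair with the nearest unused role-2 entity of the same relation
        # (distance measured in entity order, not token positions).
        best = None
        for j, (rel2, role2, _) in enumerate(entities):
            if j in used or rel2 != rel or role2 != "2":
                continue
            if best is None or abs(j - i) < abs(best - i):
                best = j
        if best is not None:
            used.add(best)
            triples.append((text, rel, entities[best][2]))
    return triples

# Toy usage with a hypothetical relation label "LOC" (located-in).
example = decode_triples(
    ["New", "York", "is", "in", "America"],
    ["B-LOC-1", "E-LOC-1", "O", "O", "S-LOC-2"],
)
```

With this layout a single sequence tagger predicts entity boundaries, relation type, and argument role at once, which is what lets an end-to-end model avoid a separate relation classification step.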

arXiv.org e-Print Archive

Crossref

Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks

Authors: Gurevych Iryna, Reimers Nils
Publication date: 01/07/2017

Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance. However, little has been published on which parameters and design choices should be evaluated or selected, making correct hyperparameter optimization often a "black art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks (POS, Chunking, NER, Entity Recognition, and Event Detection). We evaluated over 50,000 different setups and found that some parameters, like the pre-trained word embeddings or the last layer of the network, have a large impact on the performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance. We give a recommendation on a configuration that performs well across different tasks.
Comment: 34 pages. 9 page version of this paper published at EMNLP 201

arXiv.org e-Print Archive

TUbiblio

Structured Learning with Inexact Search: Advances in Shift-Reduce CCG Parsing

Author: Xu Wenduan
Publication venue: University of Cambridge
Publication date: 07/12/2017

Statistical shift-reduce parsing involves the interplay of representation learning, structured learning, and inexact search. This dissertation considers approaches that tightly integrate these three elements and explores three novel models for shift-reduce CCG parsing. First, I develop a dependency model, in which the selection of shift-reduce action sequences producing a dependency structure is treated as a hidden variable; the key components of the model are a dependency oracle and a learning algorithm that integrates the dependency oracle, the structured perceptron, and beam search. Second, I present expected F-measure training and show how to derive a globally normalized RNN model, in which beam search is naturally incorporated and used in conjunction with the objective to learn shift-reduce action sequences optimized for the final evaluation metric. Finally, I describe an LSTM model that is able to construct parser state representations incrementally by following the shift-reduce syntactic derivation process; I show expected F-measure training, which is agnostic to the underlying neural network, can be applied in this setting to obtain globally normalized greedy and beam-search LSTM shift-reduce parsers.
The Carnegie Trust for the Universities of Scotland; The Cambridge Trus

Apollo (Cambridge)