Combining Discrete and Neural Features for Sequence Labeling
Neural network models have recently received considerable research attention in the
natural language processing community. Compared with traditional models with
discrete features, neural models have two main advantages. First, they take
low-dimensional, real-valued embedding vectors as inputs, which can be trained
over large raw data, thereby addressing the issue of feature sparsity in
discrete models. Second, deep neural networks can be used to automatically
combine input features, including non-local features that capture semantic
patterns that cannot be expressed using discrete indicator features. As a
result, neural network models have achieved competitive accuracies compared
with the best discrete models for a range of NLP tasks.
On the other hand, manual feature templates have been carefully investigated
for most NLP tasks over decades and typically cover the most useful indicator
patterns for solving the problems. Such information can be complementary to the
features automatically induced from neural networks, and therefore combining
discrete and neural features can potentially lead to better accuracy compared
with models that leverage discrete or neural features only.
In this paper, we systematically investigate the effect of discrete and
neural feature combination for a range of fundamental NLP tasks based on
sequence labeling, including word segmentation, POS tagging and named entity
recognition, for both Chinese and English. Our results on standard
benchmarks show that state-of-the-art neural models can give accuracies
comparable to the best discrete models in the literature for most tasks and
combining discrete and neural features consistently yields better results.
Comment: Accepted by the International Conference on Computational Linguistics and
Intelligent Text Processing (CICLing) 2016, April
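
As a rough illustration of the combination described above, the sketch below scores each tag position from a neural (BiLSTM) representation and from sparse discrete indicator features, then sums the two score vectors. This is a minimal PyTorch sketch under my own assumptions, not the authors' model; all names and dimensions are illustrative.

```python
# Hypothetical sketch: summing tag scores from neural and discrete features.
import torch
import torch.nn as nn

class HybridTagger(nn.Module):
    def __init__(self, vocab_size, n_discrete_feats, n_tags,
                 emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        # One scorer for the neural state, one for the 0/1 indicator features.
        self.neural_out = nn.Linear(hidden_dim, n_tags)
        self.discrete_out = nn.Linear(n_discrete_feats, n_tags, bias=False)

    def forward(self, token_ids, discrete_feats):
        # token_ids: (batch, seq_len); discrete_feats: (batch, seq_len, n_feats)
        h, _ = self.lstm(self.embed(token_ids))
        return self.neural_out(h) + self.discrete_out(discrete_feats)

# Example: 2 sentences of 5 tokens with 30 binary indicator features each.
model = HybridTagger(vocab_size=1000, n_discrete_feats=30, n_tags=7)
scores = model(torch.randint(0, 1000, (2, 5)), torch.zeros(2, 5, 30))
```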
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
Named entity recognition (NER) in Chinese is essential but difficult because
of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS)
is usually considered the first step for Chinese NER. However, models based
on word-level embeddings and lexicon features often suffer from segmentation
errors and out-of-vocabulary (OOV) words. In this paper, we investigate a
Convolutional Attention Network, called CAN, for Chinese NER, which consists of a
character-based convolutional neural network (CNN) with a local-attention layer
and a gated recurrent unit (GRU) with a global self-attention layer to capture
information from adjacent characters and sentence contexts. Moreover, unlike
other models, ours does not depend on external resources such as lexicons and
uses small character embeddings, which makes it more practical.
Extensive experimental results show that our approach outperforms
state-of-the-art methods without relying on word embeddings or external lexicon
resources, on datasets from different domains, including the Weibo, MSRA, and
Chinese Resume NER datasets.
Comment: This paper was accepted by NAACL-HLT 2019. The code is available at
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NE
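
The sketch below shows one plausible reading of the architecture the abstract outlines: a character CNN for adjacent-character context followed by a BiGRU whose states attend globally over the sentence. It is not the released code (see the URL above); the local-attention layer is approximated here by a plain convolution, and all names and sizes are my assumptions.

```python
# Illustrative sketch only: char CNN + BiGRU + global self-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CANSketch(nn.Module):
    def __init__(self, n_chars, n_tags, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        # Window-3 convolution stands in for the local-attention CNN layer.
        self.conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1)
        self.gru = nn.GRU(hidden_dim, hidden_dim // 2,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_tags)

    def forward(self, char_ids):
        x = self.embed(char_ids)                  # (B, T, E)
        x = F.relu(self.conv(x.transpose(1, 2)))  # (B, H, T)
        h, _ = self.gru(x.transpose(1, 2))        # (B, T, H)
        # Global self-attention: each position attends to the whole sentence.
        attn = torch.softmax(h @ h.transpose(1, 2) / h.size(-1) ** 0.5, dim=-1)
        return self.out(attn @ h)                 # (B, T, n_tags)
```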
Simplify the Usage of Lexicon in Chinese NER
Recently, many works have tried to utilize word lexicons to improve the
performance of Chinese named entity recognition (NER). As a representative work
in this line, Lattice-LSTM (Zhang and Yang, 2018) has achieved new
state-of-the-art performance on several benchmark Chinese NER datasets.
However, Lattice-LSTM suffers from a complicated model architecture, resulting
in low computational efficiency. This heavily limits its application in
many industrial areas that require real-time NER responses. In this work, we
ask whether we can simplify the usage of the lexicon and, at the same time,
achieve performance comparable to Lattice-LSTM for Chinese NER.
Starting from this question and motivated by the idea of Lattice-LSTM, we
propose a concise but effective method to incorporate the lexicon information
into the vector representations of characters. This way, our method can avoid
introducing a complicated sequence modeling architecture to model the lexicon
information. Instead, it only needs to subtly adjust the character
representation layer of the neural sequence model. Experiments on four
benchmark Chinese NER datasets show that our method achieves a much faster
inference speed and comparable or better performance than Lattice-LSTM and its
follow-ups. They also show that our method can be easily transferred across
different neural architectures.
Comment: Use the lexicon for Chinese NER as simply as possible
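
A minimal sketch of the core idea as I read it from the abstract: attach pooled embeddings of matched lexicon words directly to each character's representation, so any off-the-shelf sequence tagger can consume the result unchanged. The Begin/Middle/End/Single grouping and mean-pooling below are my assumptions, not necessarily the paper's exact formulation.

```python
# Hedged sketch: lexicon-augmented character representations.
import torch

def augment_chars(char_emb, word_emb, matches):
    """char_emb: (seq_len, d_char); word_emb: (n_words, d_word) lookup table;
    matches[i] = {'B': [...], 'M': [...], 'E': [...], 'S': [...]}: ids of
    lexicon words matching character i at the Begin/Middle/End/Single role."""
    d_word = word_emb.size(1)
    rows = []
    for i, ch in enumerate(char_emb):
        parts = [ch]
        for role in ('B', 'M', 'E', 'S'):
            ids = matches[i][role]
            if ids:
                # Mean-pool the embeddings of all matched words in this role.
                parts.append(word_emb[torch.tensor(ids)].mean(dim=0))
            else:
                parts.append(torch.zeros(d_word))
        rows.append(torch.cat(parts))
    # (seq_len, d_char + 4 * d_word): feed to any sequence tagger unchanged.
    return torch.stack(rows)
```

Because the augmentation happens purely at the input layer, the downstream tagger needs no lattice-style architecture changes, which is where the inference speedup over Lattice-LSTM would come from.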
Chinese Named Entity Recognition Augmented with Lexicon Memory
Inspired by the concept of content-addressable retrieval from cognitive
science, we propose a novel fragment-based model augmented with a lexicon-based
memory for Chinese NER, in which both the character-level and word-level
features are combined to generate better feature representations for possible
name candidates. It is observed that locating the boundaries of entity
names helps classify them into pre-defined categories.
Position-dependent features, including prefixes and suffixes, are introduced for
NER in the form of distributed representations. The lexicon-based memory is used to
help generate such position-dependent features and deal with the problem of
out-of-vocabulary words. Experimental results show that the proposed model,
called LEMON, achieves state-of-the-art performance on four datasets.
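
To make the memory idea concrete, here is a hedged sketch of a content-addressable lookup in the spirit of the abstract: a candidate name fragment queries a memory of lexicon-entry embeddings and reads back a pooled vector usable as a position-dependent (e.g. prefix or suffix) feature. The attention form and all names are my assumptions, not the paper's specification.

```python
# Hedged sketch: content-addressable lexicon memory for a name fragment.
import torch
import torch.nn as nn

class LexiconMemory(nn.Module):
    def __init__(self, n_entries, dim):
        super().__init__()
        self.keys = nn.Embedding(n_entries, dim)    # lexicon entry keys
        self.values = nn.Embedding(n_entries, dim)  # lexicon entry values

    def forward(self, fragment_repr, entry_ids):
        # fragment_repr: (dim,) encoding of one candidate name fragment
        # entry_ids: (k,) ids of lexicon entries sharing a prefix/suffix
        k = self.keys(entry_ids)                         # (k, dim)
        v = self.values(entry_ids)                       # (k, dim)
        attn = torch.softmax(k @ fragment_repr, dim=0)   # (k,) match weights
        return attn @ v  # (dim,) readout appended to the fragment features
```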