Chinese NER Using Lattice LSTM
We investigate a lattice-structured LSTM model for Chinese NER, which encodes
a sequence of input characters as well as all potential words that match a
lexicon. Compared with character-based methods, our model explicitly leverages
word and word sequence information. Compared with word-based methods, lattice
LSTM does not suffer from segmentation errors. Gated recurrent cells allow our
model to choose the most relevant characters and words from a sentence for
better NER results. Experiments on various datasets show that lattice LSTM
outperforms both word-based and character-based LSTM baselines, achieving the
best results.
Comment: Accepted at ACL 2018 as Long pape
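The lexicon-matching step described in this abstract can be illustrated with a minimal sketch. It only enumerates the word spans that a lattice would add on top of the character sequence; it is not the authors' implementation, and the function name `match_lexicon`, the `max_len` cutoff, and the toy lexicon are all assumptions made for illustration.

```python
def match_lexicon(sentence, lexicon, max_len=4):
    """Return (start, end, word) for every multi-character substring in the lexicon.

    `end` is the inclusive index of the word's last character.
    `max_len` is a hypothetical cap on word length, not a value from the paper.
    """
    matches = []
    for start in range(len(sentence)):
        # Only spans of two or more characters count as lexicon words here.
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end - 1, word))
    return matches


# Toy example: overlapping and nested words all become lattice edges.
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
edges = match_lexicon(sentence, lexicon)
for start, end, word in edges:
    print(start, end, word)
```

Each returned span would correspond to one word cell linking its start and end characters, so no single segmentation has to be chosen in advance.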
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
We investigate a lattice LSTM network for Chinese word segmentation (CWS) to
utilize words or subwords. It integrates the character sequence features with
information from all subsequences matched in a lexicon. The matched subsequences
serve as information shortcut tunnels which link their start and end characters
directly. Gated units are used to control the contribution of multiple input
links. Through formula derivation and comparison, we show that the lattice LSTM
is an extension of the standard LSTM with the ability to take multiple inputs.
A previous lattice LSTM model takes word embeddings as the lexicon input; we
show that subword encoding gives comparable performance and has the
benefit of not relying on any external segmentor. The contribution of lattice
LSTM comes from both lexicon and pretrained embedding information, and
controlled experiments show that the lexicon information contributes more than
the pretrained embeddings. Our experiments show that the
lattice structure with subword encoding gives results competitive with or
better than previous state-of-the-art methods on four segmentation benchmarks.
Detailed analyses are conducted to compare the performance of word encoding and
subword encoding in lattice LSTM. We also investigate the performance of
the lattice LSTM structure under different circumstances and when this model
works or fails.
Comment: 8 page
Glyce: Glyph-vectors for Chinese Character Representations
It is intuitive that NLP tasks for logographic languages like Chinese should
benefit from the use of the glyph information in those languages. However, due
to the lack of rich pictographic evidence in glyphs and the weak generalization
ability of standard computer vision models on character data, an effective way
to utilize the glyph information remains to be found. In this paper, we address
this gap by presenting Glyce, the glyph-vectors for Chinese character
representations. We make three major innovations: (1) We use historical Chinese
scripts (e.g., bronzeware script, seal script, traditional Chinese, etc.) to
enrich the pictographic evidence in characters; (2) We design CNN structures
(called tianzege-CNN) tailored to Chinese character image processing; and (3)
We use image-classification as an auxiliary task in a multi-task learning setup
to increase the model's ability to generalize. We show that glyph-based models
are able to consistently outperform word/char ID-based models in a wide range
of Chinese NLP tasks. We are able to set new state-of-the-art results for a
variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence pair
classification, single sentence classification tasks, dependency parsing, and
semantic role labeling. For example, the proposed model achieves an F1 score of
80.6 on the OntoNotes dataset of NER, +1.5 over BERT; it achieves an almost
perfect accuracy of 99.8% on the Fudan corpus for text classification. Code
found at https://github.com/ShannonAI/glyce.
Comment: Accepted by NeurIPS 201
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
Named entity recognition (NER) in Chinese is essential but difficult because
of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS)
is usually considered as the first step for Chinese NER. However, models based
on word-level embeddings and lexicon features often suffer from segmentation
errors and out-of-vocabulary (OOV) words. In this paper, we investigate a
Convolutional Attention Network called CAN for Chinese NER, which consists of a
character-based convolutional neural network (CNN) with a local-attention layer
and a gated recurrent unit (GRU) with a global self-attention layer to capture
the information from adjacent characters and sentence contexts. Compared
to other models, ours is also more practical because it does not depend on any
external resources such as lexicons and uses small character embeddings.
Extensive experimental results show that our approach outperforms
state-of-the-art methods without word embedding and external lexicon resources
on datasets from different domains, including the Weibo, MSRA and Chinese
Resume NER datasets.
Comment: This paper is accepted by NAACL-HLT 2019. The code is available at
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NE
Neural Entity Reasoner for Global Consistency in NER
We propose Neural Entity Reasoner (NE-Reasoner), a framework to introduce
global consistency of recognized entities into Neural Reasoner for the named
entity recognition (NER) task. Given an input sentence, the NE-Reasoner layer
can infer over multiple entities to increase the global consistency of output
labels, which are then transferred into entities as the input of the next layer.
NE-Reasoner inherits and develops some features from Neural Reasoner: 1) a
symbolic memory, allowing it to exchange entities between layers; 2) the
specific interaction-pooling mechanism, allowing it to connect each local word
to multiple global entities; and 3) the deep architecture, allowing it to
bootstrap the recognized entity set from coarse to fine. Like human beings,
NE-Reasoner is able to accommodate ambiguous words and named entities that it
has rarely or never met before. Despite the symbolic information the model
introduces, NE-Reasoner can still be trained effectively in an end-to-end
manner via a parameter-sharing strategy. NE-Reasoner can outperform conventional
NER models in most cases on both English and Chinese NER datasets. For example,
it achieves state-of-the-art results on the CoNLL-2003 English NER dataset.
Comment: 8 pages, 3 figures, submitted to AAAI201
Simplify the Usage of Lexicon in Chinese NER
Recently, many works have tried to utilize word lexicons to augment the
performance of Chinese named entity recognition (NER). As a representative work
in this line, Lattice-LSTM \cite{zhang2018chinese} has achieved new
state-of-the-art performance on several benchmark Chinese NER datasets.
However, Lattice-LSTM suffers from a complicated model architecture, resulting
in low computational efficiency. This will heavily limit its application in
many industrial areas, which require real-time NER response. In this work, we
ask whether we can simplify the usage of the lexicon and, at the same
time, achieve performance comparable to Lattice-LSTM for Chinese NER.
Starting from this question and motivated by the idea of Lattice-LSTM, we
propose a concise but effective method to incorporate the lexicon information
into the vector representations of characters. This way, our method can avoid
introducing a complicated sequence modeling architecture to model the lexicon
information. Instead, it only needs to subtly adjust the character
representation layer of the neural sequence model. An experimental study on four
benchmark Chinese NER datasets shows that our method achieves much faster
inference speed and comparable or better performance than Lattice-LSTM and its
followers. It also shows that our method can be easily transferred across
different neural architectures.
Comment: Use Lexicon for Chinese NER as simply as possibl
Porous Lattice-based Transformer Encoder for Chinese NER
Incorporating lattices into character-level Chinese named entity recognition
is an effective method to exploit explicit word information. Recent works
extend recurrent and convolutional neural networks to model lattice inputs.
However, due to the DAG structure or the variable-sized potential word set for
lattice inputs, these models prevent the convenient use of batched computation,
resulting in serious inefficiency. In this paper, we propose a porous
lattice-based transformer encoder for Chinese named entity recognition, which
can better exploit GPU parallelism and batch the computation
owing to the mask mechanism in the transformer. We first investigate the
lattice-aware self-attention coupled with relative position representations to
explore effective word information in the lattice structure. Besides, to
strengthen the local dependencies among neighboring tokens, we propose a novel
porous structure during self-attentional computation processing, in which every
two non-neighboring tokens are connected through a shared pivot node.
Experimental results on four datasets show that our model performs up to 9.47
times faster than state-of-the-art models, while performing roughly on a par
with them. The source code of this paper can be obtained from
https://github.com/xxx/xxx.
Comment: 9 pages, 4 figure
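The porous structure described in this abstract can be pictured as an attention mask. The sketch below is one possible reading, not the paper's code: it assumes a single pivot node appended after the tokens and a neighborhood of one token on each side, and the name `porous_mask` is hypothetical.

```python
def porous_mask(n_tokens):
    """Build a boolean attention mask for n_tokens tokens plus one pivot node.

    mask[i][j] is True when position i may attend to position j.
    The pivot sits at index n_tokens (an assumption of this sketch).
    """
    size = n_tokens + 1
    pivot = n_tokens
    mask = [[False] * size for _ in range(size)]
    for i in range(n_tokens):
        # Direct links only to the token itself and its immediate neighbors.
        for j in (i - 1, i, i + 1):
            if 0 <= j < n_tokens:
                mask[i][j] = True
        # Non-neighboring tokens reach each other only through the pivot.
        mask[i][pivot] = True
        mask[pivot][i] = True
    mask[pivot][pivot] = True
    return mask


mask = porous_mask(5)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because the mask is a fixed-size matrix per sentence length, it can be applied inside ordinary batched self-attention, which is the efficiency argument the abstract makes.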
FGN: Fusion Glyph Network for Chinese Named Entity Recognition
Chinese NER is a challenging task. As pictographs, Chinese characters contain
latent glyph information, which is often overlooked. In this paper, we propose
the FGN, Fusion Glyph Network for Chinese NER. Besides adding glyph
information, this method may also add extra interactive information with the
fusion mechanism. The major innovations of FGN include: (1) a novel CNN
structure called CGS-CNN is proposed to capture both glyph information and
interactive information between glyphs from neighboring characters. (2) we
provide a method with sliding window and Slice-Attention to fuse the BERT
representation and glyph representation for a character, which may capture
potential interactive knowledge between context and glyph. Experiments are
conducted on four NER datasets, showing that FGN with LSTM-CRF as tagger
achieves new state-of-the-art performance for Chinese NER. Further
experiments are conducted to investigate the influences of various components
and settings in FGN.
Learning Task-specific Representation for Novel Words in Sequence Labeling
Word representation is a key component in neural-network-based sequence
labeling systems. However, representations of unseen or rare words trained on
the end task are usually too poor to yield appreciable performance. This is commonly
referred to as the out-of-vocabulary (OOV) problem. In this work, we address
the OOV problem in sequence labeling using only training data of the task. To
this end, we propose a novel method to predict representations for OOV words
from their surface-forms (e.g., character sequence) and contexts. The method is
specifically designed to avoid the error propagation problem suffered by
existing approaches in the same paradigm. To evaluate its effectiveness, we
performed extensive empirical studies on four part-of-speech tagging (POS)
tasks and four named entity recognition (NER) tasks. Experimental results show
that the proposed method can achieve better or competitive performance on the
OOV problem compared with existing state-of-the-art methods.
Comment: This work has been accepted by IJCAI 201
SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER
Although character-based models using lexicons have achieved promising results
for the Chinese named entity recognition (NER) task, some lexical words
introduce erroneous information because they are wrongly matched. Existing
research has proposed many strategies to integrate lexicon knowledge. However,
these strategies either rely on simple first-order lexicon knowledge, which
provides insufficient word information and still faces the challenge of matched
word boundary conflicts, or explore the lexicon knowledge with a graph, where
higher-order information may introduce negative words that disturb the
identification. To alleviate these limitations, we present new insight into
second-order lexicon knowledge (SLK) of each character in the sentence to
provide more lexical word information, including semantic and word boundary
features. Based on this, we propose an SLK-based model with a novel strategy to
integrate the above lexicon knowledge. The proposed model can exploit more
discernible lexical word information with the help of global context.
Experimental results on three public datasets demonstrate the validity of SLK.
The proposed model achieves better performance than the
state-of-the-art comparison methods.
Comment: 5 pages, The work is accepted by SEKE202