201 research outputs found

    Chinese NER Using Lattice LSTM

    Full text link
    We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.Comment: Accepted at ACL 2018 as Long pape

    Subword Encoding in Lattice LSTM for Chinese Word Segmentation

    Full text link
    We investigate a lattice LSTM network for Chinese word segmentation (CWS) to utilize words or subwords. It integrates the character sequence features with all subsequences information matched from a lexicon. The matched subsequences serve as information shortcut tunnels which link their start and end characters directly. Gated units are used to control the contribution of multiple input links. Through formula derivation and comparison, we show that the lattice LSTM is an extension of the standard LSTM with the ability to take multiple inputs. Previous lattice LSTM model takes word embeddings as the lexicon input, we prove that subword encoding can give the comparable performance and has the benefit of not relying on any external segmentor. The contribution of lattice LSTM comes from both lexicon and pretrained embeddings information, we find that the lexicon information contributes more than the pretrained embeddings information through controlled experiments. Our experiments show that the lattice structure with subword encoding gives competitive or better results with previous state-of-the-art methods on four segmentation benchmarks. Detailed analyses are conducted to compare the performance of word encoding and subword encoding in lattice LSTM. We also investigate the performance of lattice LSTM structure under different circumstances and when this model works or fails.Comment: 8 page

    Glyce: Glyph-vectors for Chinese Character Representations

    Full text link
    It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, the glyph-vectors for Chinese character representations. We make three major innovations: (1) We use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese, etc) to enrich the pictographic evidence in characters; (2) We design CNN structures (called tianzege-CNN) tailored to Chinese character image processing; and (3) We use image-classification as an auxiliary task in a multi-task learning setup to increase the model's ability to generalize. We show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. We are able to set new state-of-the-art results for a variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence pair classification, single sentence classification tasks, dependency parsing, and semantic role labeling. For example, the proposed model achieves an F1 score of 80.6 on the OntoNotes dataset of NER, +1.5 over BERT; it achieves an almost perfect accuracy of 99.8\% on the Fudan corpus for text classification. Code found at https://github.com/ShannonAI/glyce.Comment: Accepted by NeurIPS 201

    CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition

    Full text link
    Named entity recognition (NER) in Chinese is essential but difficult because of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS) is usually considered as the first step for Chinese NER. However, models based on word-level embeddings and lexicon features often suffer from segmentation errors and out-of-vocabulary (OOV) words. In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with local-attention layer and a gated recurrent unit (GRU) with global self-attention layer to capture the information from adjacent characters and sentence contexts. Also, compared to other models, not depending on any external resources like lexicons and employing small size of char embeddings make our model more practical. Extensive experimental results show that our approach outperforms state-of-the-art methods without word embedding and external lexicon resources on different domain datasets including Weibo, MSRA and Chinese Resume NER dataset.Comment: This paper is accepted by NAACL-HLT 2019. The code is available at https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NE

    Neural Entity Reasoner for Global Consistency in NER

    Full text link
    We propose Neural Entity Reasoner (NE-Reasoner), a framework to introduce global consistency of recognized entities into Neural Reasoner over Named Entity Recognition (NER) task. Given an input sentence, the NE-Reasoner layer can infer over multiple entities to increase the global consistency of output labels, which then be transfered into entities for the input of next layer. NE-Reasoner inherits and develops some features from Neural Reasoner 1) a symbolic memory, allowing it to exchange entities between layers. 2) the specific interaction-pooling mechanism, allowing it to connect each local word to multiple global entities, and 3) the deep architecture, allowing it to bootstrap the recognized entity set from coarse to fine. Like human beings, NE-Reasoner is able to accommodate ambiguous words and Name Entities that rarely or never met before. Despite the symbolic information the model introduced, NE-Reasoner can still be trained effectively in an end-to-end manner via parameter sharing strategy. NE-Reasoner can outperform conventional NER models in most cases on both English and Chinese NER datasets. For example, it achieves state-of-art on CoNLL-2003 English NER dataset.Comment: 8 pages, 3 figures, submitted to AAAI201

    Simplify the Usage of Lexicon in Chinese NER

    Full text link
    Recently, many works have tried to utilizing word lexicon to augment the performance of Chinese named entity recognition (NER). As a representative work in this line, Lattice-LSTM \cite{zhang2018chinese} has achieved new state-of-the-art performance on several benchmark Chinese NER datasets. However, Lattice-LSTM suffers from a complicated model architecture, resulting in low computational efficiency. This will heavily limit its application in many industrial areas, which require real-time NER response. In this work, we ask the question: if we can simplify the usage of lexicon and, at the same time, achieve comparative performance with Lattice-LSTM for Chinese NER? Started with this question and motivated by the idea of Lattice-LSTM, we propose a concise but effective method to incorporate the lexicon information into the vector representations of characters. This way, our method can avoid introducing a complicated sequence modeling architecture to model the lexicon information. Instead, it only needs to subtly adjust the character representation layer of the neural sequence model. Experimental study on four benchmark Chinese NER datasets shows that our method can achieve much faster inference speed, comparative or better performance over Lattice-LSTM and its follwees. It also shows that our method can be easily transferred across difference neural architectures.Comment: Use Lexicon for Chinese NER as simply as possibl

    Porous Lattice-based Transformer Encoder for Chinese NER

    Full text link
    Incorporating lattices into character-level Chinese named entity recognition is an effective method to exploit explicit word information. Recent works extend recurrent and convolutional neural networks to model lattice inputs. However, due to the DAG structure or the variable-sized potential word set for lattice inputs, these models prevent the convenient use of batched computation, resulting in serious inefficient. In this paper, we propose a porous lattice-based transformer encoder for Chinese named entity recognition, which is capable to better exploit the GPU parallelism and batch the computation owing to the mask mechanism in transformer. We first investigate the lattice-aware self-attention coupled with relative position representations to explore effective word information in the lattice structure. Besides, to strengthen the local dependencies among neighboring tokens, we propose a novel porous structure during self-attentional computation processing, in which every two non-neighboring tokens are connected through a shared pivot node. Experimental results on four datasets show that our model performs up to 9.47 times faster than state-of-the-art models, while is roughly on a par with its performance. The source code of this paper can be obtained from https://github.com/xxx/xxx.Comment: 9 pages, 4 figure

    FGN: Fusion Glyph Network for Chinese Named Entity Recognition

    Full text link
    Chinese NER is a challenging task. As pictographs, Chinese characters contain latent glyph information, which is often overlooked. In this paper, we propose the FGN, Fusion Glyph Network for Chinese NER. Except for adding glyph information, this method may also add extra interactive information with the fusion mechanism. The major innovations of FGN include: (1) a novel CNN structure called CGS-CNN is proposed to capture both glyph information and interactive information between glyphs from neighboring characters. (2) we provide a method with sliding window and Slice-Attention to fuse the BERT representation and glyph representation for a character, which may capture potential interactive knowledge between context and glyph. Experiments are conducted on four NER datasets, showing that FGN with LSTM-CRF as tagger achieves new state-of-the-arts performance for Chinese NER. Further, more experiments are conducted to investigate the influences of various components and settings in FGN

    Learning Task-specific Representation for Novel Words in Sequence Labeling

    Full text link
    Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually poor for appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface-forms (e.g., character sequence) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.Comment: This work has been accepted by IJCAI 201

    SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER

    Full text link
    Although character-based models using lexicon have achieved promising results for Chinese named entity recognition (NER) task, some lexical words would introduce erroneous information due to wrongly matched words. Existing researches proposed many strategies to integrate lexicon knowledge. However, they performed with simple first-order lexicon knowledge, which provided insufficient word information and still faced the challenge of matched word boundary conflicts; or explored the lexicon knowledge with graph where higher-order information introducing negative words may disturb the identification. To alleviate the above limitations, we present new insight into second-order lexicon knowledge (SLK) of each character in the sentence to provide more lexical word information including semantic and word boundary features. Based on these, we propose a SLK-based model with a novel strategy to integrate the above lexicon knowledge. The proposed model can exploit more discernible lexical words information with the help of global context. Experimental results on three public datasets demonstrate the validity of SLK. The proposed model achieves more excellent performance than the state-of-the-art comparison methods.Comment: 5 pages, The work is accepted by SEKE202
    • …
    corecore