Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation
The character-based sequence labeling framework is flexible and efficient for
Chinese word segmentation (CWS). Recently, many character-based neural models
have been applied to CWS. While they obtain good performance, they have two
obvious weaknesses. The first is that they heavily rely on manually designed
bigram features, i.e., they are not good at capturing n-gram features
automatically. The second is that they make no use of full word information.
For the first weakness, we propose a convolutional neural model, which is able
to capture rich n-gram features without any feature engineering. For the second
one, we propose an effective approach to integrate the proposed model with word
embeddings. We evaluate the model on two benchmark datasets: PKU and MSR.
Without any feature engineering, the model obtains competitive performance --
95.7% on PKU and 97.3% on MSR. Armed with word embeddings, the model achieves
state-of-the-art performance on both datasets -- 96.5% on PKU and 98.0% on MSR,
without using any external labeled resource.
Comment: will be published by IJCNLP 2017
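To make the described approach concrete, the following is a minimal PyTorch sketch of a character-level convolutional tagger of this kind; it illustrates how stacked 1-D convolutions capture n-gram features without hand-crafted bigrams. It is not the authors' implementation, and the class name, layer sizes, and tag set are assumptions.

```python
# Hypothetical sketch (not the paper's code): character-level CNN tagger for CWS.
import torch
import torch.nn as nn

class ConvCWSTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=200, layers=3, tags=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        convs, in_dim = [], emb_dim
        for _ in range(layers):
            # kernel size 3 with padding 1 keeps sequence length unchanged,
            # so each layer widens the receptive field by one character per side
            convs += [nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1), nn.ReLU()]
            in_dim = hidden
        self.convs = nn.Sequential(*convs)
        self.out = nn.Linear(hidden, tags)  # B/M/E/S segmentation tags

    def forward(self, char_ids):                    # (batch, seq_len)
        x = self.emb(char_ids).transpose(1, 2)      # (batch, emb, seq_len)
        h = self.convs(x).transpose(1, 2)           # (batch, seq_len, hidden)
        return self.out(h)                          # per-character tag scores

scores = ConvCWSTagger(vocab_size=5000)(torch.randint(0, 5000, (2, 10)))
print(scores.shape)  # torch.Size([2, 10, 4])
```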
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
Named entity recognition (NER) in Chinese is essential but difficult because
of the lack of natural delimiters. Therefore, Chinese Word Segmentation (CWS)
is usually considered as the first step for Chinese NER. However, models based
on word-level embeddings and lexicon features often suffer from segmentation
errors and out-of-vocabulary (OOV) words. In this paper, we investigate a
Convolutional Attention Network called CAN for Chinese NER, which consists of a
character-based convolutional neural network (CNN) with local-attention layer
and a gated recurrent unit (GRU) with global self-attention layer to capture
the information from adjacent characters and sentence contexts. Moreover, our
model is more practical than others because it does not depend on any external
resources such as lexicons and uses only small character embeddings.
Extensive experimental results show that our approach, without word embeddings
or external lexicon resources, outperforms state-of-the-art methods on datasets
from different domains, including the Weibo, MSRA and Chinese Resume NER
datasets.
Comment: This paper is accepted by NAACL-HLT 2019. The code is available at
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NE
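As a rough illustration of the pipeline described above (character CNN for local context, BiGRU for sentence context, global self-attention on top), here is a hedged PyTorch sketch. It is not the released CAN-NER code; the class name, dimensions, and tag count are assumptions, and it requires PyTorch 1.9+ for batch-first attention.

```python
# Assumed sketch: char embeddings -> local CNN -> BiGRU -> global self-attention -> NER tags.
import torch
import torch.nn as nn

class CANSketch(nn.Module):
    def __init__(self, vocab, emb=64, hidden=128, tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.local = nn.Conv1d(emb, hidden, kernel_size=5, padding=2)  # adjacent-character context
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, tags)

    def forward(self, chars):                          # (batch, seq_len)
        x = self.emb(chars).transpose(1, 2)
        x = torch.relu(self.local(x)).transpose(1, 2)  # local character features
        h, _ = self.gru(x)                              # sentence-level context
        g, _ = self.attn(h, h, h)                       # global self-attention
        return self.out(g)

print(CANSketch(vocab=4000)(torch.randint(0, 4000, (2, 12))).shape)  # (2, 12, 9)
```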
Structure Regularized Bidirectional Recurrent Convolutional Neural Network for Relation Classification
Relation classification is an important semantic processing task in the field
of natural language processing (NLP). In this paper, we present a novel model,
the Structure Regularized Bidirectional Recurrent Convolutional Neural Network
(SR-BRCNN), to classify the relation between two entities in a sentence, along
with a new Chinese Sanwen (prose) dataset for named entity recognition and
relation classification. Some state-of-the-art systems concentrate on modeling the
shortest dependency path (SDP) between two entities leveraging convolutional or
recurrent neural networks. We further explore how to make full use of the
dependency relations information in the SDP and how to improve the model by the
method of structure regularization. We propose a structure regularized model to
learn relation representations along the SDP extracted from the forest formed
by the structure regularized dependency tree, which helps reduce the
complexity of the whole model and improves the score by 10.3.
Experimental results show that our method outperforms the state-of-the-art
approaches on the Chinese Sanwen task and performs as well on the SemEval-2010
Task 8 dataset. (The Chinese Sanwen corpus developed and used in this paper
will be released in the future.)
Comment: arXiv admin note: text overlap with arXiv:1411.6243 by other authors
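A minimal sketch of the general recipe of learning relation representations along the SDP with recurrent and convolutional layers might look as follows; this is an illustration only, the structure regularization itself is not shown, and all names, sizes, and the relation count are assumptions.

```python
# Assumed sketch: BiLSTM over SDP tokens, convolution + max-pool, relation classifier.
import torch
import torch.nn as nn

class SDPRelationClassifier(nn.Module):
    def __init__(self, vocab, emb=100, hidden=128, relations=10):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        self.out = nn.Linear(hidden, relations)

    def forward(self, sdp_tokens):                  # token ids along the SDP
        h, _ = self.lstm(self.emb(sdp_tokens))      # (batch, path_len, 2*hidden)
        c = torch.relu(self.conv(h.transpose(1, 2)))
        pooled = c.max(dim=2).values                # max-pool over the path
        return self.out(pooled)                     # one relation score vector per path

print(SDPRelationClassifier(vocab=3000)(torch.randint(0, 3000, (2, 7))).shape)  # (2, 10)
```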
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
We investigate a lattice LSTM network for Chinese word segmentation (CWS) to
utilize words or subwords. It integrates the character sequence features with
the information of all subsequences matched from a lexicon. The matched subsequences
serve as information shortcut tunnels which link their start and end characters
directly. Gated units are used to control the contribution of multiple input
links. Through formula derivation and comparison, we show that the lattice LSTM
is an extension of the standard LSTM with the ability to take multiple inputs.
The previous lattice LSTM model takes word embeddings as the lexicon input; we
show that subword encoding can give comparable performance and has the
benefit of not relying on any external segmentor. The contribution of the
lattice LSTM comes from both the lexicon and the pretrained embeddings; through
controlled experiments, we find that the lexicon information contributes more
than the pretrained embeddings. Our experiments show that the
lattice structure with subword encoding gives results competitive with or
better than previous state-of-the-art methods on four segmentation benchmarks.
Detailed analyses are conducted to compare the performance of word encoding and
subword encoding in lattice LSTM. We also investigate the performance of
lattice LSTM structure under different circumstances and when this model works
or fails.
Comment: 8 pages
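The gated control of multiple input links can be illustrated with a small sketch; this is an assumed simplification of the lattice mechanism at a single character position, not the paper's model, and the module name is hypothetical.

```python
# Assumed sketch: merge the character state with lexicon-matched subword states
# arriving at this position, with one sigmoid gate per source.
import torch
import torch.nn as nn

class GatedLinkMerge(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, char_state, link_states):
        # char_state: (batch, dim); link_states: list of (batch, dim) tensors from
        # matched subsequences whose end character is this position
        candidates = [char_state] + link_states
        gates = [torch.sigmoid(self.gate(torch.cat([char_state, c], dim=-1)))
                 for c in candidates]
        norm = torch.stack(gates).sum(dim=0)            # normalise the gates
        merged = sum(g / norm * c for g, c in zip(gates, candidates))
        return merged                                   # merged state for this position

m = GatedLinkMerge(dim=8)
out = m(torch.randn(2, 8), [torch.randn(2, 8), torch.randn(2, 8)])
print(out.shape)  # torch.Size([2, 8])
```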
Combining Discrete and Neural Features for Sequence Labeling
Neural network models have recently received heated research attention in the
natural language processing community. Compared with traditional models with
discrete features, neural models have two main advantages. First, they take
low-dimensional, real-valued embedding vectors as inputs, which can be trained
over large raw data, thereby addressing the issue of feature sparsity in
discrete models. Second, deep neural networks can be used to automatically
combine input features, including non-local features that capture semantic
patterns that cannot be expressed using discrete indicator features. As a
result, neural network models have achieved competitive accuracies compared
with the best discrete models for a range of NLP tasks.
On the other hand, manual feature templates have been carefully investigated
for most NLP tasks over decades and typically cover the most useful indicator
patterns for solving the problems. Such information can be complementary to the
features automatically induced from neural networks, and therefore combining
discrete and neural features can potentially lead to better accuracy compared
with models that leverage discrete or neural features only.
In this paper, we systematically investigate the effect of discrete and
neural feature combination for a range of fundamental NLP tasks based on
sequence labeling, including word segmentation, POS tagging and named entity
recognition for Chinese and English, respectively. Our results on standard
benchmarks show that state-of-the-art neural models can give accuracies
comparable to the best discrete models in the literature for most tasks and
combining discrete and neural features consistently yields better results.
Comment: Accepted by International Conference on Computational Linguistics and
Intelligent Text Processing (CICLing) 2016, April
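A minimal sketch of the combination strategy, under the assumption that the discrete feature templates are pre-extracted into a binary vector per token; the paper's actual templates and models differ in detail, and the class name is hypothetical.

```python
# Assumed sketch: concatenate induced neural features with discrete indicator
# features before the output layer of a sequence labeler.
import torch
import torch.nn as nn

class DiscretePlusNeuralTagger(nn.Module):
    def __init__(self, vocab, n_discrete, emb=100, hidden=150, tags=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        # the output layer sees both induced neural features and the
        # manually designed indicator features
        self.out = nn.Linear(2 * hidden + n_discrete, tags)

    def forward(self, token_ids, discrete_feats):
        h, _ = self.lstm(self.emb(token_ids))           # neural features
        joint = torch.cat([h, discrete_feats], dim=-1)  # feature combination
        return self.out(joint)

tagger = DiscretePlusNeuralTagger(vocab=1000, n_discrete=20)
scores = tagger(torch.randint(0, 1000, (2, 6)),
                torch.randint(0, 2, (2, 6, 20)).float())
print(scores.shape)  # torch.Size([2, 6, 4])
```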
Effective Representation for Easy-First Dependency Parsing
Easy-first parsing relies on subtree re-ranking to build the complete parse
tree. Since the intermediate state of the parsing process is represented by
various subtrees, whose internal structural information is the key clue for
later parsing decisions, we explore a better representation for such
subtrees. In detail, this work introduces a bottom-up subtree encoding method
based on the child-sum tree-LSTM. Starting from an easy-first dependency parser
without other handcrafted features, we show that the effective subtree encoder
indeed promotes the parsing process and can make a greedy-search easy-first
parser achieve promising results on benchmark treebanks compared to
state-of-the-art baselines. Furthermore, with the help of a current
pre-trained language model, we further improve the state-of-the-art results of
the easy-first approach.
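The child-sum tree-LSTM that the subtree encoder builds on can be written compactly as follows (the standard Tai et al. 2015 formulation, assumed here as the relevant cell; variable names and sizes are ours, not the paper's).

```python
# Child-sum tree-LSTM cell: one forget gate per child, children summed for the
# input/output/update gates; the returned (h, c) encodes the whole subtree.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + hid_dim, 3 * hid_dim)  # input/output/update gates
        self.f_x = nn.Linear(in_dim, hid_dim)
        self.f_h = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) head word; child_h/child_c: (n_children, hid_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))   # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        return o * torch.tanh(c), c                           # subtree (h, c)

cell = ChildSumTreeLSTMCell(in_dim=50, hid_dim=64)
h, c = cell(torch.randn(50), torch.randn(3, 64), torch.randn(3, 64))
print(h.shape)  # torch.Size([64])
```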
Attention Is All You Need for Chinese Word Segmentation
Taking the greedy decoding algorithm as given, this work focuses on further
strengthening the model itself for Chinese word segmentation (CWS), which
results in an even faster and more accurate CWS model. Our model
consists of an attention-only stacked encoder and a sufficiently light decoder
for greedy segmentation, plus two highway connections for smoother training, in
which the encoder is composed of a newly proposed Transformer variant,
Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer.
With the effective encoder design, our model only needs to take unigram
features for scoring. Our model is evaluated on SIGHAN Bakeoff benchmark
datasets. The experimental results show that with the highest segmentation
speed, the proposed model achieves new state-of-the-art or comparable
performance against strong baselines under the strict closed test setting.
Comment: 11 pages, to appear in EMNLP 2020 as a long paper
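One way to read the Gaussian-masked attention is as a distance-dependent bias on the attention logits, so that nearby characters dominate; the sketch below follows that reading and is not the paper's exact formulation (the single-head, single-sentence setup and sigma value are assumptions).

```python
# Assumed sketch: scaled dot-product attention with a Gaussian penalty on
# relative character distance.
import math
import torch

def gaussian_masked_attention(q, k, v, sigma=1.0):
    # q, k, v: (seq_len, dim) for a single sentence
    seq_len, dim = q.shape
    logits = q @ k.T / math.sqrt(dim)                     # standard scaled dot-product
    pos = torch.arange(seq_len, dtype=torch.float)
    dist = pos[:, None] - pos[None, :]                    # relative distance matrix
    logits = logits - dist.pow(2) / (2 * sigma ** 2)      # Gaussian distance penalty
    return torch.softmax(logits, dim=-1) @ v

out = gaussian_masked_attention(torch.randn(6, 16), torch.randn(6, 16), torch.randn(6, 16))
print(out.shape)  # torch.Size([6, 16])
```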
Glyph-aware Embedding of Chinese Characters
Given the advantage and recent success of English character-level and
subword-unit models in several NLP tasks, we consider the equivalent modeling
problem for Chinese. Chinese script is logographic and many Chinese logograms
are composed of common substructures that provide semantic, phonetic and
syntactic hints. In this work, we propose to explicitly incorporate the visual
appearance of a character's glyph in its representation, resulting in a novel
glyph-aware embedding of Chinese characters. Being inspired by the success of
convolutional neural networks in computer vision, we use them to incorporate
the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In
the context of two basic Chinese NLP tasks of language modeling and word
segmentation, the model learns to represent each character's task-relevant
semantic and syntactic information in the character-level embedding.
Comment: Workshop on Subword and Character level models in NLP at EMNLP 2017.
Source code available
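A glyph-aware character embedding of this kind can be sketched as a small CNN over rendered bitmaps; the sizes below (32x32 grayscale glyphs, two conv/pool stages) are assumptions for illustration, not the paper's configuration.

```python
# Assumed sketch: map a rendered character glyph to an embedding via a small CNN.
import torch
import torch.nn as nn

class GlyphEmbedder(nn.Module):
    def __init__(self, emb_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(64 * 8 * 8, emb_dim)

    def forward(self, glyphs):                  # (batch, 1, 32, 32) pixel bitmaps
        feats = self.cnn(glyphs).flatten(1)     # spatio-structural glyph features
        return self.proj(feats)                 # (batch, emb_dim) character embeddings

print(GlyphEmbedder()(torch.rand(4, 1, 32, 32)).shape)  # torch.Size([4, 128])
```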
Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning
The Chinese pronunciation system offers two characteristics that distinguish
it from other languages: deep phonemic orthography and intonation variations.
We are the first to argue that these two important properties can play a major
role in Chinese sentiment analysis. Particularly, we propose two effective
features to encode phonetic information. Next, we develop a Disambiguate
Intonation for Sentiment Analysis (DISA) network based on reinforcement
learning. It disambiguates the intonation of each Chinese character (pinyin),
so that a precise phonetic representation of Chinese is learned. Furthermore, we
also fuse phonetic features with textual and visual features in order to mimic
the way humans read and understand Chinese text. Experimental results on five
different Chinese sentiment analysis datasets show that the inclusion of
phonetic features significantly and consistently improves the performance of
textual and visual representations and outshines the state-of-the-art Chinese
character-level representations.
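A simplified view of the phonetic enrichment is to give each character a pinyin-syllable embedding and a tone embedding alongside its textual embedding; the sketch below assumes that view, omits the DISA intonation selector and the visual features, and uses hypothetical names and sizes.

```python
# Assumed sketch: fuse character, pinyin-syllable, and tone embeddings by concatenation.
import torch
import torch.nn as nn

class PhoneticEnrichedEmbedding(nn.Module):
    def __init__(self, n_chars, n_pinyin, n_tones=5, dim=64):
        super().__init__()
        self.char = nn.Embedding(n_chars, dim)
        self.pinyin = nn.Embedding(n_pinyin, dim)
        self.tone = nn.Embedding(n_tones, dim)      # tones 1-4 plus neutral

    def forward(self, char_ids, pinyin_ids, tone_ids):
        # each input: (batch, seq_len); output fuses textual and phonetic signals
        return torch.cat([self.char(char_ids),
                          self.pinyin(pinyin_ids),
                          self.tone(tone_ids)], dim=-1)

emb = PhoneticEnrichedEmbedding(n_chars=6000, n_pinyin=400)
out = emb(torch.randint(0, 6000, (2, 5)),
          torch.randint(0, 400, (2, 5)),
          torch.randint(0, 5, (2, 5)))
print(out.shape)  # torch.Size([2, 5, 192])
```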
Learning Chinese Word Representations From Glyphs Of Characters
In this paper, we propose new methods to learn Chinese word representations.
Chinese characters are composed of graphical components, which carry rich
semantics. It is common for a Chinese learner to comprehend the meaning of a
word from these graphical components. As a result, we propose models that
enhance word representations by character glyphs. The character glyph features
are directly learned from the bitmaps of characters by convolutional
auto-encoder (convAE), and the glyph features improve Chinese word
representations which are already enhanced by character embeddings. Another
contribution in this paper is that we created several evaluation datasets in
traditional Chinese and made them public.
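The convAE step can be sketched as a small convolutional auto-encoder whose bottleneck vector serves as the glyph feature; the bitmap size, channel counts, and class name below are assumptions, not the paper's exact architecture.

```python
# Assumed sketch: convolutional auto-encoder over 32x32 character bitmaps; the
# bottleneck is the glyph feature that enriches word representations.
import torch
import torch.nn as nn

class GlyphConvAE(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
            nn.Flatten(), nn.Linear(32 * 8 * 8, feat_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, bitmap):                 # (batch, 1, 32, 32)
        glyph_feat = self.encoder(bitmap)      # compact glyph feature
        return self.decoder(glyph_feat), glyph_feat

recon, feat = GlyphConvAE()(torch.rand(2, 1, 32, 32))
print(recon.shape, feat.shape)  # torch.Size([2, 1, 32, 32]) torch.Size([2, 64])
```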