Attention-based Neural Text Segmentation
Text segmentation plays an important role in various Natural Language
Processing (NLP) tasks like summarization, context understanding, document
indexing and document noise removal. Previous methods for this task require
manual feature engineering and suffer from large memory requirements and long execution times.
To the best of our knowledge, this paper is the first to present a
supervised neural approach for text segmentation. Specifically, we propose an
attention-based bidirectional LSTM model where sentence embeddings are learned
using CNNs and the segments are predicted based on contextual information. This
model can automatically handle variable sized context information. Compared to
the existing competitive baselines, the proposed model shows a performance
improvement of ~7% in WinDiff score on three benchmark datasets.
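A minimal sketch of the kind of architecture this abstract describes, assuming PyTorch; the layer sizes, kernel widths and pooling scheme are illustrative assumptions, not the authors' exact configuration:

    import torch
    import torch.nn as nn

    class CNNSentenceEncoder(nn.Module):
        """Encodes a sentence (word embeddings) into a fixed vector via 1-D convolutions."""
        def __init__(self, emb_dim=100, n_filters=64, kernel_sizes=(3, 4, 5)):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes
            )

        def forward(self, words):                 # words: (batch, seq_len, emb_dim)
            x = words.transpose(1, 2)             # (batch, emb_dim, seq_len)
            pooled = [conv(x).max(dim=2).values for conv in self.convs]
            return torch.cat(pooled, dim=1)       # (batch, n_filters * len(kernel_sizes))

    class AttnBiLSTMSegmenter(nn.Module):
        """BiLSTM over sentence vectors; attention pools variable-sized context;
        a classifier predicts whether each sentence starts a new segment."""
        def __init__(self, sent_dim=192, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(sent_dim, hidden, bidirectional=True, batch_first=True)
            self.attn = nn.Linear(2 * hidden, 1)
            self.out = nn.Linear(4 * hidden, 2)   # [state; attended context] -> boundary / no boundary

        def forward(self, sents):                 # sents: (batch, n_sents, sent_dim)
            h, _ = self.lstm(sents)               # (batch, n_sents, 2*hidden)
            weights = torch.softmax(self.attn(h), dim=1)      # attention over sentences
            context = (weights * h).sum(dim=1, keepdim=True)  # (batch, 1, 2*hidden)
            context = context.expand(-1, h.size(1), -1)
            return self.out(torch.cat([h, context], dim=-1))  # per-sentence boundary logits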
An Evaluation of DNN Architectures for Page Segmentation of Historical Newspapers
One important and particularly challenging step in the optical character
recognition (OCR) of historical documents with complex layouts, such as
newspapers, is the separation of text from non-text content (e.g. page borders
or illustrations). This step is commonly referred to as page segmentation.
While various rule-based algorithms have been proposed, the applicability of
Deep Neural Networks (DNNs) for this task recently has gained a lot of
attention. In this paper, we perform a systematic evaluation of 11 different
published DNN backbone architectures and 9 different tiling and scaling
configurations for separating text, tables or table column lines. We also show
the influence of the number of labels and the number of training pages on the
segmentation quality, which we measure using the Matthews Correlation
Coefficient. Our results show that (depending on the task) Inception-ResNet-v2
and EfficientNet backbones work best, vertical tiling is generally preferable
to other tiling approaches, and training data that comprises 30 to 40 pages
will be sufficient most of the time.
Comment: Evaluation of deep neural networks for the segmentation of pages of
historical newspapers; 21 pages total (incl. references and appendix), 7
figures, 5 tables.
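For reference, the Matthews Correlation Coefficient used as the quality measure above can be computed for a binary text/non-text pixel mask as follows (a minimal NumPy sketch; the example masks are illustrative):

    import numpy as np

    def matthews_corrcoef(pred, target):
        """MCC for binary masks: +1 perfect, 0 random, -1 inverted."""
        pred, target = pred.astype(bool), target.astype(bool)
        tp = np.sum(pred & target)
        tn = np.sum(~pred & ~target)
        fp = np.sum(pred & ~target)
        fn = np.sum(~pred & target)
        denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
        return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

    # Example: compare a predicted text/non-text mask against ground truth.
    pred = np.array([[1, 0], [1, 1]])
    gt = np.array([[1, 0], [0, 1]])
    print(matthews_corrcoef(pred, gt))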
Attention-based Natural Language Person Retrieval
Following the recent progress in image classification and captioning using
deep learning, we develop a novel natural language person retrieval system
based on an attention mechanism. More specifically, given the description of a
person, the goal is to localize the person in an image. To this end, we first
construct a benchmark dataset for natural language person retrieval. To do so,
we generate bounding boxes for persons in a public image dataset from the
segmentation masks, which are then annotated with descriptions and attributes
using Amazon Mechanical Turk. We then adopt the region proposal network from
Faster R-CNN as a candidate region generator. The cropped images based on the
region proposals as well as the whole images with attention weights are fed
into Convolutional Neural Networks for visual feature extraction, while the
natural language expression and attributes are input to Bidirectional Long
Short-Term Memory (BLSTM) models for text feature extraction. The visual and
text features are integrated to score region proposals, and the one with the
highest score is retrieved as the output of our system. The experimental
results show significant improvement over the state-of-the-art method for
generic object retrieval, and this line of research promises to benefit search
in surveillance video footage.
Comment: CVPR 2017 Workshop (vision meets cognition).
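A minimal sketch of the proposal-scoring step, assuming PyTorch; the concatenate-and-score fusion and all dimensions are illustrative assumptions rather than the authors' exact design:

    import torch
    import torch.nn as nn

    class ProposalScorer(nn.Module):
        def __init__(self, vis_dim=2048, txt_dim=512, hidden=512):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Linear(vis_dim + txt_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, region_feats, text_feat):
            # region_feats: (n_proposals, vis_dim) CNN features of cropped regions
            # text_feat:    (txt_dim,) BLSTM encoding of the description + attributes
            text = text_feat.unsqueeze(0).expand(region_feats.size(0), -1)
            return self.fuse(torch.cat([region_feats, text], dim=1)).squeeze(1)

    scores = ProposalScorer()(torch.randn(100, 2048), torch.randn(512))
    best_region = scores.argmax()   # the highest-scoring proposal is the system output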
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Offline handwriting recognition systems require cropped text line images for
both training and recognition. On the one hand, the annotation of position and
transcript at line level is costly to obtain. On the other hand, automatic line
segmentation algorithms are prone to errors, compromising the subsequent
recognition. In this paper, we propose a modification of the popular and
efficient multi-dimensional long short-term memory recurrent neural networks
(MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More
specifically, we replace the collapse layer, which transforms the two-dimensional
representation into a sequence of predictions, with a recurrent version that can
recognize one line at a time. In the proposed model, a neural network performs
a kind of implicit line segmentation by computing attention weights on the
image representation. The experiments on paragraphs of the Rimes and IAM databases
yield results that are competitive with those of networks trained at line
level, and constitute a significant step towards end-to-end transcription of
full documents.
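A minimal sketch of the idea of a recurrent, attention-based collapse layer, assuming PyTorch; the MDLSTM encoder is abstracted away, and the row-summary scoring and all sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class AttentionCollapse(nn.Module):
        """Collapses a 2-D feature map to one line-level sequence per decoding step:
        at step t, attention weights over image rows softly select the current line."""
        def __init__(self, feat_dim=128, state_dim=128):
            super().__init__()
            self.score = nn.Linear(feat_dim + state_dim, 1)
            self.rnn = nn.LSTMCell(feat_dim, state_dim)

        def forward(self, fmap, n_lines):
            # fmap: (H, W, feat_dim) features from the (MD)LSTM encoder
            H, W, D = fmap.shape
            h = fmap.new_zeros(self.rnn.hidden_size)
            c = fmap.new_zeros(self.rnn.hidden_size)
            lines = []
            for _ in range(n_lines):
                cols = fmap.mean(dim=1)                        # (H, D) summary per row
                s = self.score(torch.cat([cols, h.expand(H, -1)], dim=1))
                alpha = torch.softmax(s, dim=0)                # soft line selection
                line_feats = (alpha.unsqueeze(-1) * fmap).sum(dim=0)  # (W, D)
                h, c = self.rnn(line_feats.mean(dim=0).unsqueeze(0),
                                (h.unsqueeze(0), c.unsqueeze(0)))
                h, c = h.squeeze(0), c.squeeze(0)
                lines.append(line_feats)                       # would feed a per-line transcriber
            return torch.stack(lines)                          # (n_lines, W, D)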
Combining Discrete and Neural Features for Sequence Labeling
Neural network models have recently received considerable research attention in the
natural language processing community. Compared with traditional models with
discrete features, neural models have two main advantages. First, they take
low-dimensional, real-valued embedding vectors as inputs, which can be trained
over large raw data, thereby addressing the issue of feature sparsity in
discrete models. Second, deep neural networks can automatically combine input
features, including non-local features that capture semantic patterns that
cannot be expressed using discrete indicator features. As a
result, neural network models have achieved competitive accuracies compared
with the best discrete models for a range of NLP tasks.
On the other hand, manual feature templates have been carefully investigated
for most NLP tasks over decades and typically cover the most useful indicator
patterns for solving the problems. Such information can be complementary to the
features automatically induced from neural networks, and therefore combining
discrete and neural features can potentially lead to better accuracy compared
with models that leverage discrete or neural features only.
In this paper, we systematically investigate the effect of discrete and
neural feature combination for a range of fundamental NLP tasks based on
sequence labeling, including word segmentation, POS tagging and named entity
recognition for Chinese and English, respectively. Our results on standard
benchmarks show that state-of-the-art neural models can give accuracies
comparable to the best discrete models in the literature for most tasks and
combining discrete and neural features consistently yields better results.
Comment: Accepted by the International Conference on Computational Linguistics
and Intelligent Text Processing (CICLing) 2016, April.
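A minimal sketch of one plausible way to combine discrete indicator features with neural features in a sequence labeler, assuming PyTorch; summing the two per-tag score components is an illustrative choice, not necessarily the paper's exact integration:

    import torch
    import torch.nn as nn

    class HybridTagger(nn.Module):
        def __init__(self, vocab=10000, emb=100, hidden=200, n_discrete=5000, n_tags=20):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
            # discrete indicator features (template matches) get their own linear scorer
            self.discrete = nn.Linear(n_discrete, n_tags, bias=False)
            self.neural = nn.Linear(2 * hidden, n_tags)

        def forward(self, word_ids, indicator_feats):
            # word_ids:        (batch, seq)             token indices
            # indicator_feats: (batch, seq, n_discrete) 0/1 template-feature fires
            h, _ = self.lstm(self.emb(word_ids))
            # combine by summing the neural and discrete scoring components per tag
            return self.neural(h) + self.discrete(indicator_feats)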
Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence
The neural attention model has achieved great success in data-to-text
generation tasks. Though usually excelling at producing fluent text, it suffers
from missing information, repetition and "hallucination". Due to
the black-box nature of the neural attention architecture, avoiding these
problems in a systematic way is non-trivial. To address this concern, we
propose to explicitly segment target text into fragment units and align them
with their data correspondences. The segmentation and correspondence are
jointly learned as latent variables without any human annotations. We further
impose a soft statistical constraint to regularize the segmental granularity.
The resulting architecture maintains the same expressive power as neural
attention models, while being able to generate fully interpretable outputs with
several times less computational cost. On both E2E and WebNLG benchmarks, we
show the proposed model consistently outperforms its neural attention
counterparts.
Comment: Accepted at ACL 2020.
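A minimal sketch of what a soft statistical constraint on segmental granularity could look like, assuming PyTorch; this regularizer is illustrative and merely in the spirit of the abstract, not the paper's exact formulation:

    import torch

    def granularity_penalty(boundary_probs, target_rate=0.2):
        # boundary_probs: (batch, seq) probability that a segment boundary follows each token
        expected_rate = boundary_probs.mean(dim=1)           # expected boundary rate per sequence
        return ((expected_rate - target_rate) ** 2).mean()   # soft squared penalty

    probs = torch.sigmoid(torch.randn(4, 30))                # stand-in boundary probabilities
    print(granularity_penalty(probs))                        # added, weighted, to the training loss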
Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention
We present an attention-based model for end-to-end handwriting recognition.
Our system does not require any segmentation of the input paragraph. The model
is inspired by the differentiable attention models recently presented for
speech recognition, image captioning and translation. The main difference is the
combination of covert and overt attention, implemented as a multi-dimensional LSTM network.
Our principal contribution towards handwriting recognition lies in the
automatic transcription without a prior segmentation into lines, which was
crucial in previous approaches. To the best of our knowledge, this is the first
successful attempt at end-to-end multi-line handwriting recognition. We carried
out experiments on the well-known IAM Database. The results are encouraging and
suggest that full-paragraph transcription will be feasible in the near future.
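A minimal sketch of segmentation-free decoding with attention over the whole 2-D feature map, assuming PyTorch; the encoder is abstracted away, and the GRU decoder, scoring form and sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class AttendAndRead(nn.Module):
        def __init__(self, feat_dim=128, hidden=128, n_chars=80):
            super().__init__()
            self.cell = nn.GRUCell(feat_dim, hidden)
            self.score = nn.Linear(feat_dim + hidden, 1)
            self.out = nn.Linear(hidden, n_chars)

        def forward(self, fmap, max_steps=50):
            # fmap: (H*W, feat_dim) flattened paragraph-level feature map
            h = fmap.new_zeros(1, self.cell.hidden_size)
            logits = []
            for _ in range(max_steps):
                s = self.score(torch.cat([fmap, h.expand(fmap.size(0), -1)], dim=1))
                alpha = torch.softmax(s, dim=0)              # where to "look" next
                glimpse = (alpha * fmap).sum(dim=0, keepdim=True)
                h = self.cell(glimpse, h)
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)                # (1, max_steps, n_chars)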
Is Word Segmentation Necessary for Deep Learning of Chinese Representations?
Segmenting a chunk of text into words is usually the first step of processing
Chinese text, but its necessity has rarely been explored. In this paper, we ask
the fundamental question of whether Chinese word segmentation (CWS) is
necessary for deep learning-based Chinese Natural Language Processing. We
benchmark neural word-based models which rely on word segmentation against
neural char-based models which do not involve word segmentation in four
end-to-end NLP benchmark tasks: language modeling, machine translation,
sentence matching/paraphrase and text classification. Through direct
comparisons between these two types of models, we find that char-based models
consistently outperform word-based models. Based on these observations, we
conduct comprehensive experiments to study why word-based models underperform
char-based models in these deep learning-based NLP tasks. We show that it is
because word-based models are more vulnerable to data sparsity and the presence
of out-of-vocabulary (OOV) words, and thus more prone to overfitting. We hope
this paper could encourage researchers in the community to rethink the
necessity of word segmentation in deep learning-based Chinese Natural Language
Processing. (Yuxian Meng and Xiaoya Li contributed equally to this paper.)
Comment: To appear at ACL 2019.
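A minimal sketch of the two input representations being compared, in plain Python; the tokenizations are deliberately naive stand-ins (a real word-based system would use a CWS tool):

    text = "我爱自然语言处理"

    char_tokens = list(text)                     # char-based: no segmentation needed
    word_tokens = ["我", "爱", "自然语言处理"]     # word-based: requires a CWS step

    # Char vocabularies are small and dense; word vocabularies are large and sparse,
    # which is the OOV / data-sparsity issue the paper identifies.
    print(len(set(char_tokens)), len(set(word_tokens)))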
Scene Text Recognition via Transformer
Scene text recognition with arbitrary shape is very challenging due to large
variations in text shapes, fonts, colors, backgrounds, etc. Most
state-of-the-art algorithms rectify the input image into a normalized image and
then treat recognition as a sequence prediction task. The bottleneck of such
methods is the rectification step, which introduces errors due to perspective
distortion. In this paper, we find that rectification is completely unnecessary;
all we need is spatial attention. We therefore propose a
simple but extremely effective scene text recognition method based on
transformer [50]. Unlike previous transformer-based models [56,34],
which only use the transformer decoder to decode the convolutional
attention, the proposed method uses convolutional feature maps as word
embeddings input to the transformer. In this way, our method is able to make
full use of the powerful attention mechanism of the transformer. Extensive
experimental results show that the proposed method significantly outperforms
state-of-the-art methods by a very large margin on both regular and irregular
text datasets. On CUTE, one of the most challenging datasets, where the previous
state-of-the-art accuracy is 89.6%, our method achieves 99.3%, a surprising
result. We will release our source code and believe that our method will set a
new benchmark for scene text recognition with arbitrary shapes.
Comment: We found errors in the experiment code and are correcting the results,
so we temporarily withdraw this paper.
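A minimal sketch of feeding convolutional feature-map columns to a transformer encoder as "word embeddings", assuming PyTorch; the backbone, sizes and the omission of positional encodings are illustrative simplifications:

    import torch
    import torch.nn as nn

    class ConvTransformerOCR(nn.Module):
        def __init__(self, d_model=256, n_chars=97):
            super().__init__()
            self.backbone = nn.Sequential(             # stand-in CNN backbone
                nn.Conv2d(3, d_model, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 32)),         # collapse height, keep 32 columns
            )
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            self.out = nn.Linear(d_model, n_chars)

        def forward(self, img):                        # img: (batch, 3, H, W)
            f = self.backbone(img)                     # (batch, d_model, 1, 32)
            tokens = f.squeeze(2).transpose(1, 2)      # (batch, 32, d_model): one "word" per column
            # positional encodings omitted for brevity
            return self.out(self.encoder(tokens))      # per-column character logits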
Toward Fast and Accurate Neural Discourse Segmentation
Discourse segmentation, which segments texts into Elementary Discourse Units,
is a fundamental step in discourse analysis. Previous discourse segmenters rely
on complicated hand-crafted features and are not practical in actual use. In
this paper, we propose an end-to-end neural segmenter based on BiLSTM-CRF
framework. To improve its accuracy, we address the problem of data
insufficiency by transferring a word representation model that is trained on a
large corpus. We also propose a restricted self-attention mechanism in order to
capture useful information within a neighborhood. Experiments on the RST-DT
corpus show that our model is significantly faster than previous methods, while
achieving new state-of-the-art performance.
Comment: 6 pages, camera-ready version for EMNLP 2018.
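A minimal sketch of a restricted self-attention that only attends within a fixed neighborhood, assuming PyTorch; the window size and the additive-mask implementation are illustrative assumptions:

    import torch
    import torch.nn as nn

    def neighborhood_mask(seq_len, window):
        idx = torch.arange(seq_len)
        allowed = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
        # additive attention mask: 0 inside the window, -inf outside
        return torch.zeros(seq_len, seq_len).masked_fill(~allowed, float("-inf"))

    class RestrictedSelfAttention(nn.Module):
        def __init__(self, dim=256, heads=4, window=5):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.window = window

        def forward(self, x):                          # x: (batch, seq, dim)
            mask = neighborhood_mask(x.size(1), self.window).to(x.device)
            out, _ = self.attn(x, x, x, attn_mask=mask)
            return out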