A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting
recognition. Most multilingual systems rest on specialized models, each
trained on a single language, with one of them selected at test time. While
some recognition systems are based on a unified optical model, dealing with a
unified language model remains a major issue, as traditional language models
are generally trained on corpora composed of large word lexicons per language.
Here, we bring a solution by considering language models based on sub-lexical
units, called multigrams. Dealing with multigrams strongly reduces the lexicon
size and thus decreases the language model complexity. This makes possible the
design of an end-to-end unified multilingual recognition system where both a
single optical model and a single language model are trained on all the
languages. We discuss the impact of the language unification on each model and
show that our system reaches the performance of state-of-the-art methods with
a strong reduction in complexity.
Comment: preprint
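As a rough, hypothetical illustration of the sub-lexical idea (not the authors' learned multigram models, whose inventory is estimated from data), the Python sketch below greedily segments words using a hand-picked unit inventory; a learned inventory shared across languages shrinks the effective lexicon in the same way.

    # Hypothetical sketch: segmenting words into multigram sub-lexical units.
    # The toy inventory and greedy longest-match rule are illustrative only.
    def segment(word, units, max_len=4):
        """Greedy longest-match split of a word into multigram units."""
        out, i = [], 0
        while i < len(word):
            for l in range(min(max_len, len(word) - i), 0, -1):
                piece = word[i:i + l]
                if l == 1 or piece in units:   # fall back to single characters
                    out.append(piece)
                    i += l
                    break
        return out

    units = {"ing", "tion", "re", "un", "er", "sch", "ung"}  # toy inventory
    print(segment("recognition", units))  # ['re', 'c', 'o', 'g', 'n', 'i', 'tion']

A small shared unit inventory can replace large per-language word lexicons, which is what makes a single language model over all languages tractable.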
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Online handwritten Chinese text recognition (OHCTR) is a challenging problem
as it involves a large-scale character set, ambiguous segmentation, and
variable-length input sequences. In this paper, we exploit the outstanding
capability of path signature to translate online pen-tip trajectories into
informative signature feature maps using a sliding window-based method,
successfully capturing the analytic and geometric properties of pen strokes
with strong local invariance and robustness. A multi-spatial-context fully
convolutional recurrent network (MCFCRN) is proposed to exploit the multiple
spatial contexts from the signature feature maps and generate a prediction
sequence while completely avoiding the difficult segmentation problem.
Furthermore, an implicit language model is developed to make predictions based
on semantic context within a predicting feature sequence, providing a new
perspective for incorporating lexicon constraints and prior knowledge about a
certain language in the recognition procedure. Experiments on two standard
benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with
correct rates of 97.10% and 97.15%, respectively, which are significantly
better than the best result reported thus far in the literature.
Comment: 14 pages, 9 figures
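For intuition only, here is a minimal numpy sketch of a truncated (depth-2) path signature computed over sliding windows of a pen-tip trajectory; the window size, stride, and truncation depth are illustrative assumptions rather than the paper's settings.

    import numpy as np

    # Hypothetical sketch: depth-2 path signatures over sliding windows.
    def signature_depth2(path):
        """Exact depth-2 signature of a piecewise-linear path of shape (n, d)."""
        inc = np.diff(path, axis=0)              # per-segment increments
        s1 = inc.sum(axis=0)                     # depth 1: total displacement
        start = path[:-1] - path[0]              # position at each segment start
        s2 = start.T @ inc + 0.5 * inc.T @ inc   # depth 2: iterated integrals
        return np.concatenate([s1, s2.ravel()])

    def sliding_signatures(traj, window=8, stride=2):
        """Stack signature features over sliding windows of a trajectory."""
        return np.stack([signature_depth2(traj[i:i + window])
                         for i in range(0, len(traj) - window + 1, stride)])

    traj = np.cumsum(np.random.randn(64, 2), axis=0)  # toy pen-tip trajectory
    print(sliding_signatures(traj).shape)             # (29, 6): 2 + 2*2 features

Each window thus yields a fixed-length descriptor that is invariant to time reparameterization of the pen trajectory, which is the local invariance the abstract refers to.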
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Connectionist temporal classification (CTC) is commonly adopted for sequence
modeling tasks like speech recognition, where it is necessary to preserve order
between the input and target sequences. However, CTC has so far been applied
only to deterministic sequence models, whose latent space is discontinuous and
sparse, which in turn makes them less capable of handling data variability
than variational models. In this paper, we integrate CTC with a
variational model and derive loss functions that can be used to train more
generalizable sequence models that preserve order. Specifically, we derive two
versions of the novel variational CTC based on two reasonable assumptions, the
first being that the variational latent variables at each time step are
conditionally independent; and the second being that these latent variables are
Markovian. We show that both loss functions allow direct optimization of the
variational lower bound for the model log-likelihood, and present
computationally tractable forms for implementing them.
Comment: 5 pages, 3 figures, conference
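To make the objective concrete, here is one plausible way to write the lower bound under the first (conditional independence) assumption; the notation is ours, and the paper's exact derivation may differ. With per-timestep latents z_{1:T}, priors p(z_t), and the CTC collapsing map B that removes blanks and repeated labels:

    \log p(y \mid x) \;\ge\;
      \mathbb{E}_{q(z_{1:T} \mid x)}\!\Big[\log \sum_{\pi \in \mathcal{B}^{-1}(y)}
      \prod_{t=1}^{T} p(\pi_t \mid z_t)\Big]
      \;-\; \sum_{t=1}^{T} \mathrm{KL}\big(q(z_t \mid x)\,\|\,p(z_t)\big)

The first term is the usual CTC likelihood, computable with the forward-backward recursion, with the per-frame label posteriors now conditioned on sampled latents; under the second (Markov) assumption the prior factorizes as p(z_t | z_{t-1}) and the KL term is evaluated inside the expectation over the preceding latent state.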
Unsupervised Neural Hidden Markov Models
In this work, we present the first results for neuralizing an Unsupervised
Hidden Markov Model. We evaluate our approach on tag induction. Our approach
outperforms existing generative models and is competitive with the state of
the art, while using a simpler model that is easily extended to include
additional context.
Comment: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP.
Oral presentation
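As a minimal numpy sketch of what "neuralizing" changes (the random logit matrices below stand in for the transition and emission networks; the paper's architectures and training loop are not shown), the forward algorithm that computes the marginal likelihood stays the same:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    K, V, T = 5, 50, 12                             # tag states, vocab size, length
    rng = np.random.default_rng(0)
    trans = softmax(rng.standard_normal((K, K)))    # p(z_t | z_{t-1}), "transition net"
    emit = softmax(rng.standard_normal((K, V)))     # p(x_t | z_t), "emission net"
    prior = np.full(K, 1.0 / K)
    obs = rng.integers(V, size=T)                   # one observed token sequence

    # Forward algorithm with per-step rescaling for numerical stability.
    alpha = prior * emit[:, obs[0]]
    loglik = np.log(alpha.sum()); alpha /= alpha.sum()
    for t in range(1, T):
        alpha = (alpha @ trans) * emit[:, obs[t]]
        c = alpha.sum(); loglik += np.log(c); alpha /= c
    print(loglik)   # marginal log-likelihood, the quantity maximized in training

Because the recursion is differentiable in the transition and emission probabilities, gradients flow back into whatever networks produce them, which is what lets richer context be added without changing the inference procedure.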
Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition without using any
paired supervisory data. We formulate the text recognition task as one of
aligning the conditional distribution of strings predicted from given text
images with lexically valid strings sampled from target corpora. This enables
fully automated, unsupervised learning from just line-level text images and
unpaired text-string samples, obviating the need for large aligned datasets.
We present a detailed analysis of various aspects of the proposed method,
namely: (1) the impact of the length of training sequences on convergence,
(2) relation between character frequencies and the order in which they are
learnt, (3) generalisation ability of our recognition network to inputs of
arbitrary lengths, and (4) impact of varying the text corpus on recognition
accuracy. Finally, we demonstrate excellent text recognition accuracy on both
synthetically generated text images, and scanned images of real printed books,
using no labelled training examples.
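As a toy PyTorch sketch of the alignment idea (the architectures, hinge loss, and shapes are illustrative assumptions, not the paper's design): a recognizer emits per-position character distributions for each text-line image, and a discriminator learns to tell these soft strings from one-hot strings sampled from a corpus, while the recognizer is trained to fool it.

    import torch
    import torch.nn as nn

    V, T = 27, 16                                  # charset size, string length

    recognizer = nn.Sequential(                    # toy image -> character logits
        nn.Flatten(), nn.Linear(32 * 256, T * V))
    discriminator = nn.Sequential(                 # soft string -> real/fake score
        nn.Flatten(), nn.Linear(T * V, 64), nn.ReLU(), nn.Linear(64, 1))

    imgs = torch.randn(8, 1, 32, 256)              # toy line-level text images
    real = torch.eye(V)[torch.randint(V, (8, T))]  # one-hot corpus strings (unpaired)

    fake = torch.softmax(recognizer(imgs).view(-1, T, V), dim=-1)
    d_loss = (torch.relu(1 - discriminator(real)).mean()              # hinge on real
              + torch.relu(1 + discriminator(fake.detach())).mean())  # hinge on fake
    g_loss = -discriminator(fake).mean()           # recognizer tries to fool it

No image ever comes with its transcription; the only supervision is that the recognizer's outputs must look like valid corpus strings.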