Increasing robustness of handwriting recognition using character N-Gram decoding on large lexica

Franz, Matthias O.; Schall, Martin; Schambach, Marc-Peter

Authors: Matthias O. Franz, Martin Schall, Marc-Peter Schambach
Publication date: 1 January 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Offline handwriting recognition systems often include a decoding step, that is, retrieving the most likely character sequence from the output of the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused, for example, by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long Short-Term Memory (LSTM) cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification (CTC), can be decoded with several different algorithms in order to retrieve the most likely literal string that they represent. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used in this work is an n-gram index; trigrams are used for the experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
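The lexicon lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`char_ngrams`, `build_index`, `decode`) are hypothetical, the backtracking step that extracts n-grams from the network output is not shown (the scored n-grams are simply given as input), and summing the mean probabilities of matched n-grams per lexicon entry is an assumed simplification, since the abstract does not state how the total probability is computed.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the set of character n-grams contained in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_index(lexicon, n=3):
    """Map every n-gram to the set of lexicon entries that contain it."""
    index = defaultdict(set)
    for entry in lexicon:
        for gram in char_ngrams(entry, n):
            index[gram].add(entry)
    return index


def decode(scored_ngrams, index):
    """Rank lexicon entries by the n-grams they share with the input.

    `scored_ngrams` maps each extracted n-gram to its mean probability.
    ASSUMPTION: an entry's score is the sum of the mean probabilities of
    the n-grams found in its hit lists; the paper may combine them differently.
    """
    scores = defaultdict(float)
    for gram, probability in scored_ngrams.items():
        # index.get avoids creating empty hit lists for unknown n-grams
        for entry in index.get(gram, ()):
            scores[entry] += probability
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example data: a tiny lexicon and three scored trigrams.
lexicon = ["konstanz", "konstant", "kontrast"]
index = build_index(lexicon)
scored = {"kon": 0.9, "ons": 0.8, "anz": 0.7}
ranking = decode(scored, index)
print(ranking[0][0])
```

Because entries are ranked by accumulated score rather than required to match every n-gram, an entry can still be found when some n-grams are missing, which is the kind of robustness to weakly predicted ranges that the abstract motivates.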

Available Versions

Hochschulschriftenserver der HTWG Konstanz

oai:elib.uni-konstanz.de-htwg:...

Last time updated on 26/11/2018