
    Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition

    Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles and the scarcity of labelled data. To alleviate these problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions on real words. In this paper, we propose an unsupervised writer adaptation approach that automatically adjusts a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We experimentally validate our proposal on five different datasets covering several challenges: (i) the document source: modern and historic samples, which may involve paper degradation; (ii) the handwriting style: single- and multiple-writer collections; and (iii) the language, which involves different character combinations. Across these challenging collections, we show that our system maintains its performance, and thus provides a practical and generic approach to new document collections without requiring any expensive and tedious manual annotation step.
    Comment: Accepted to WACV 2020
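    The abstract does not spell out the adaptation mechanism, but a common way to realize unsupervised adaptation of this kind is domain-adversarial training with a gradient reversal layer: a discriminator tries to tell synthetic from real-writer features, and the encoder is trained to fool it, requiring no transcriptions for the real data. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; all module names, dimensions, and the two-head layout are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class WriterAdaptiveRecognizer(nn.Module):
    """Hypothetical recognizer: a shared encoder, a transcription head trained on
    labelled synthetic words, and a synthetic-vs-real discriminator trained
    adversarially through gradient reversal (no labels needed for real images)."""
    def __init__(self, feat_dim=256, vocab_size=80):
        super().__init__()
        self.encoder = nn.Sequential(           # stands in for a CNN/RNN feature extractor
            nn.Linear(1024, feat_dim), nn.ReLU())
        self.transcriber = nn.Linear(feat_dim, vocab_size)  # character logits
        self.discriminator = nn.Linear(feat_dim, 2)         # synthetic vs. real domain

    def forward(self, x, lambd=1.0):
        feats = self.encoder(x)
        char_logits = self.transcriber(feats)               # supervised on synthetic data
        domain_logits = self.discriminator(GradReverse.apply(feats, lambd))
        return char_logits, domain_logits
```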

    Robust Outdoor Vehicle Visual Tracking Based on k-Sparse Stacked Denoising Auto-Encoder

    Robust visual tracking of outdoor vehicles is still a challenging problem due to large variations in object appearance caused by illumination changes, occlusion, and fast motion. In this chapter, a k-sparse constraint is added to the encoder part of a stacked auto-encoder network to learn more invariant features of object appearance, and a robust outdoor vehicle tracking method based on a k-sparse stacked denoising auto-encoder under particle filter inference is proposed to handle appearance variation during tracking. First, a stacked denoising auto-encoder is pre-trained to learn a generic feature representation. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and its encoder is connected to a classification layer to form a classification neural network. Finally, the confidence of each particle is computed by the classification neural network and used for online tracking under the particle filter framework. Comprehensive tracking experiments are conducted on a challenging single-object tracking benchmark. Experimental results show that our tracker outperforms most state-of-the-art trackers.
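    A minimal PyTorch sketch of the k-sparse constraint described above: after the encoder's nonlinearity, only the k largest activations per sample are kept and the rest are zeroed, and the resulting encoder is topped with a classification layer that scores particle confidence. All dimensions and layer choices here are illustrative assumptions, not the chapter's exact network.

```python
import torch
import torch.nn as nn

class KSparse(nn.Module):
    """Keeps the k largest activations per sample and zeroes the rest
    (ties at the threshold may keep a few extra units)."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, h):
        # h: (batch, hidden). The k-th largest value per row is the cutoff.
        topk_vals, _ = torch.topk(h, self.k, dim=1)
        threshold = topk_vals[:, -1:].expand_as(h)
        return h * (h >= threshold).float()

# Encoder of a denoising auto-encoder with the k-sparse constraint,
# followed by a (hypothetical) confidence/classification layer.
encoder = nn.Sequential(nn.Linear(1024, 256), nn.Sigmoid(), KSparse(k=32))
classifier = nn.Sequential(encoder, nn.Linear(256, 1), nn.Sigmoid())

patches = torch.rand(8, 1024)        # 8 candidate particle patches, flattened
confidence = classifier(patches)     # per-particle confidence in [0, 1]
```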

    Improved training of end-to-end attention models for speech recognition

    Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeech. We introduce a new pretraining scheme that starts with a high time reduction factor and lowers it during training, which is crucial for both convergence and final performance. In some experiments, we also use an auxiliary CTC loss function to aid convergence. In addition, we train long short-term memory (LSTM) language models on subword units. With shallow fusion, we report up to 27% relative improvement in WER over the attention baseline without a language model.
    Comment: Submitted to Interspeech 2018
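    Shallow fusion itself is a one-line decoding rule: at every beam-search step the attention decoder's log-probabilities and the language model's log-probabilities are interpolated, score = log p_att(y|x) + lambda * log p_lm(y). The PyTorch sketch below illustrates this; the lambda value, tensor shapes, and beam size are illustrative assumptions, not the paper's settings.

```python
import torch

def shallow_fusion_step(att_log_probs, lm_log_probs, lm_weight=0.3):
    """One beam-expansion step with shallow fusion.

    att_log_probs: (beam, vocab) log-probs from the attention decoder
    lm_log_probs:  (beam, vocab) log-probs from the subword LSTM LM
    lm_weight:     interpolation weight lambda, tuned on dev data
    """
    return att_log_probs + lm_weight * lm_log_probs

# Usage: score the next subwords for each hypothesis in a beam of 4.
beam, vocab = 4, 1000
scores = shallow_fusion_step(torch.randn(beam, vocab).log_softmax(-1),
                             torch.randn(beam, vocab).log_softmax(-1))
best_scores, best_tokens = scores.topk(k=4, dim=-1)  # candidates per hypothesis
```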