Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character-based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language-based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
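
As a rough illustration of the bag-of-N-grams encoding mentioned in this abstract, the sketch below represents a word by which character N-grams from a fixed vocabulary it contains. The function names, the toy vocabulary, and the maximum N-gram length of 4 are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def word_ngrams(word, max_n=4):
    """Return the set of all character N-grams of length 1..max_n in `word`.

    max_n=4 is an assumed value, not taken from the abstract.
    """
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    }


def encode_bag(word, vocabulary, max_n=4):
    """Binary vector with a 1 wherever a vocabulary N-gram occurs in `word`."""
    present = word_ngrams(word, max_n)
    return [1 if gram in present else 0 for gram in vocabulary]


# Hypothetical toy vocabulary; a real one would be much larger.
vocab = ["s", "p", "sp", "ir", "es", "spi", "xyz"]
print(encode_bag("spires", vocab))
```

A recognition model of this kind would predict such a vector from the image, in contrast to picking one of 90k dictionary words or emitting characters one position at a time.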
Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions on real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system is
able to maintain its performance; it thus provides a practical and generic
approach to deal with new document collections without requiring any expensive
and tedious manual annotation step.
Comment: Accepted to WACV 202
Learning with Weak Annotations for Text in the Wild Detection and Recognition
In this work, we present a method for exploiting weakly annotated images to
improve text extraction pipelines. The weak annotation of an image is a list of
texts that are likely to appear in the image, without any information about
their location. An arbitrary existing end-to-end text recognition system is
used to obtain text region proposals and their possibly erroneous
transcriptions. A process that includes imprecise transcription-to-annotation
matching and edit-distance-guided neighbourhood search produces nearly
error-free, localised instances of scene text, which we treat as "pseudo ground
truth" used for training. We apply the method to two weakly annotated datasets
and use the obtained pseudo ground truth to re-train the end-to-end system.
The process consistently improves the accuracy of a state-of-the-art
recognition model across different benchmark datasets (image domains) and
provides a significant performance boost on the same dataset, improving further
when applied iteratively.
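
The matching of imprecise transcriptions to weak annotations described in this abstract can be sketched with a plain Levenshtein (edit) distance. This is a minimal sketch under stated assumptions: the function names, the case-insensitive comparison, and the acceptance threshold `max_ratio` are hypothetical, and the neighbourhood search over image regions is not shown.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def match_to_annotations(transcription, annotations, max_ratio=0.34):
    """Return (annotation, distance) for the closest weak annotation.

    Returns None when the closest one is still too far away.
    max_ratio is an assumed threshold: the allowed edit distance as a
    fraction of the annotation length.
    """
    if not annotations:
        return None
    text = transcription.lower()
    best = min(annotations, key=lambda t: levenshtein(text, t.lower()))
    dist = levenshtein(text, best.lower())
    return (best, dist) if dist <= max_ratio * len(best) else None


# Hypothetical weak annotation list for one image.
print(match_to_annotations("STARBUCKZ", ["coffee", "Starbucks", "open"]))
```

An accepted match would replace the noisy transcription with the annotation text, yielding a pseudo ground truth label for the detected region.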