Learning with Weak Annotations for Text in the Wild Detection and Recognition
In this work, we present a method for exploiting weakly annotated images to improve text extraction pipelines. The weak annotation of an image is a list of texts that are likely to appear in the image, without any information about their location. An arbitrary existing end-to-end text recognition system is used to obtain text region proposals and their, possibly erroneous, transcriptions. A process that includes imprecise transcription-to-annotation matching and edit-distance-guided neighbourhood search produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" used for training. We apply the method to two weakly annotated datasets and use the obtained pseudo ground truth to re-train the end-to-end system.
The process consistently improves the accuracy of a state-of-the-art recognition model across different benchmark datasets (image domains), and provides a significant performance boost on the same dataset, improving further when applied iteratively.
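The matching step described above can be sketched in a few lines. The following is a hypothetical illustration (function names, the relative-distance threshold, and the data shapes are assumptions, not the authors' implementation): proposals from an existing spotter are paired with the weak annotation list by edit distance, and only close matches are kept as pseudo ground truth.

```python
# Hypothetical sketch of the pseudo-label mining step: match possibly
# erroneous transcriptions from an existing text spotter against the
# weak annotation (a list of texts known to appear somewhere in the
# image), keeping only pairs whose edit distance is small enough.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit (Levenshtein) distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def mine_pseudo_labels(proposals, weak_texts, max_rel_dist=0.3):
    """proposals: list of (region, transcription) from the spotter;
    weak_texts: the weak annotation list for the image.
    Returns (region, annotated_text) pairs used as pseudo ground truth.
    The 0.3 relative-distance threshold is an assumed example value."""
    pseudo = []
    for region, transcription in proposals:
        best = min(weak_texts, key=lambda t: levenshtein(transcription, t))
        dist = levenshtein(transcription, best)
        if dist / max(len(best), 1) <= max_rel_dist:
            pseudo.append((region, best))
    return pseudo
```

A proposal whose transcription is close to some annotated text (e.g. "helo" vs. "hello") is accepted and relabelled with the clean annotation; proposals far from every annotated text are discarded.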
SPTS: Single-Point Text Spotting
Existing scene text spotting (i.e., end-to-end text detection and
recognition) methods rely on costly bounding box annotations (e.g., text-line,
word-level, or character-level bounding boxes). For the first time, we
demonstrate that scene text spotting models can be trained with an
extremely low-cost single-point annotation for each instance. We propose
an end-to-end scene text spotting method that tackles scene text spotting as a
sequence prediction task. Given an image as input, we formulate the desired
detection and recognition results as a sequence of discrete tokens and use an
auto-regressive Transformer to predict the sequence. The proposed method is
simple yet effective, achieving state-of-the-art results on widely used
benchmarks. Most significantly, we show that the performance is not very
sensitive to the position of the point annotation, meaning it is much
easier to annotate, or even to generate automatically, than a bounding box
that requires precise positions. We believe this pioneering attempt
indicates a significant opportunity for scene text spotting applications
at a much larger scale than previously possible. The code will be made
publicly available.
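The sequence formulation above can be illustrated with a small sketch. This is not the SPTS code: the bin count, character set, and token layout below are assumed for illustration only. Each instance becomes two quantised coordinate tokens for its single point, followed by character tokens, and the whole image is one flat sequence for an auto-regressive decoder to predict.

```python
# Hypothetical sketch of serialising single-point text instances into a
# discrete token sequence, in the spirit of the SPTS formulation.
# All constants below are assumed example values, not the paper's.
NUM_BINS = 1000                       # coordinate quantisation bins
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_BASE = NUM_BINS                  # character tokens follow coordinate tokens
EOS = CHAR_BASE + len(CHARSET)        # end-of-sequence token

def encode_instance(x, y, text, img_w, img_h):
    """Quantise the annotation point into bin indices, then append one
    token per character of the transcription."""
    tokens = [
        min(int(x / img_w * NUM_BINS), NUM_BINS - 1),
        min(int(y / img_h * NUM_BINS), NUM_BINS - 1),
    ]
    tokens += [CHAR_BASE + CHARSET.index(c)
               for c in text.lower() if c in CHARSET]
    return tokens

def encode_image(instances, img_w, img_h):
    """Concatenate all instances of an image into one target sequence,
    terminated by EOS - the sequence a Transformer decoder would predict."""
    seq = []
    for x, y, text in instances:
        seq += encode_instance(x, y, text, img_w, img_h)
    return seq + [EOS]
```

Because detection and recognition are both expressed as next-token prediction over this shared vocabulary, no box-regression head or region-of-interest pooling is needed.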