Text Localization in Video Using Multiscale Weber's Local Descriptor
In this paper, we propose a novel approach for detecting the text present in
videos and scene images based on the Multiscale Weber's Local Descriptor
(MWLD). Given an input video, the shots are identified and the key frames are
extracted based on their spatio-temporal relationship. From each key frame, we
detect the local region information using WLD with different radii and
neighborhood relationships of pixel values, thereby obtaining intensity-enhanced
key frames at multiple scales. These multiscale WLD key frames are merged
together and then the horizontal gradients are computed using morphological
operations. The obtained results are then binarized and the false positives are
eliminated based on geometrical properties. Finally, we employ connected
component analysis and a morphological dilation operation to determine the text
regions, which aids text localization. The experimental results obtained on the
publicly available standard Hua, Horizontal-1 and Horizontal-2 video datasets
illustrate that the proposed method can accurately detect and localize texts of
various sizes, fonts and colors in videos.
Comment: IEEE SPICES, 201
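The multiscale WLD step described in the abstract above can be illustrated with a minimal sketch. It computes only the differential excitation component of Weber's Local Descriptor, arctan of the summed neighbor-to-center differences divided by the center intensity, at several radii. The 8-neighbor sampling at offset `radius`, the function names, and the merge by simple averaging are assumptions made for illustration; the abstract does not specify how the scales are sampled or merged.

```python
import numpy as np


def differential_excitation(gray, radius=1, eps=1e-6):
    """WLD differential excitation with 8 neighbors sampled at offset `radius`.

    Assumption: neighbors are taken at the corners and edge midpoints of the
    (2*radius + 1) square around each pixel; borders use edge replication.
    """
    img = gray.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    total = np.zeros_like(img)
    for dy in (-radius, 0, radius):
        for dx in (-radius, 0, radius):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = padded[radius + dy:radius + dy + h,
                              radius + dx:radius + dx + w]
            total += neighbor - img
    # Weber's law: intensity change relative to the center intensity.
    return np.arctan(total / (img + eps))


def multiscale_wld(gray, radii=(1, 2, 3)):
    """Merge excitation maps from several radii (averaging is an assumption)."""
    maps = [differential_excitation(gray, r) for r in radii]
    return np.mean(maps, axis=0)
```

On a flat region the excitation is zero, while pixels next to a brighter stroke get a positive response, which is what makes the merged map useful as an intensity-enhanced input to later gradient and binarization steps.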
WordFences: Text localization and recognition
In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)
In recent years, text recognition has achieved remarkable success in recognizing scanned
document text. However, word recognition in natural images is still an open problem,
which generally requires time-consuming post-processing steps. We present a novel architecture
for individual word detection in scene images based on semantic segmentation.
Our contributions are twofold: the concept of WordFence, which detects border areas
surrounding each individual word, and a unique pixelwise weighted softmax loss function
which penalizes background and emphasizes small text regions. WordFence ensures that
each word is detected individually, and the new loss function provides a strong training
signal to both text and word border localization. The proposed technique avoids intensive
post-processing by combining semantic word segmentation with a voting scheme
for merging segmentations of multiple scales, producing an end-to-end word detection
system. We achieve superior localization recall on common benchmark datasets: 92%
recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, end-to-end
word recognition achieves a state-of-the-art 86% F-Score on ICDAR13.
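The idea of a pixelwise weighted softmax loss can be sketched as a cross-entropy in which each pixel carries its own weight. This is a generic illustration, not the WordFences formulation: the function name, the normalization by the weight sum, and the way the weight map would be built (for example, lower weights on background and higher weights on small text regions) are assumptions, since the abstract does not give the exact definition.

```python
import numpy as np


def weighted_softmax_loss(logits, labels, weights):
    """Per-pixel weighted softmax cross-entropy (illustrative sketch).

    logits:  float array of shape (H, W, C), raw class scores per pixel
    labels:  int array of shape (H, W), ground-truth class per pixel
    weights: float array of shape (H, W), hypothetical per-pixel weights
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability assigned to the true class at each pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # Weighted average of the per-pixel negative log-likelihoods.
    return float(-(weights * picked).sum() / weights.sum())
```

Raising the weight of a misclassified pixel raises the loss, so a weight map that favors small text regions and word borders pushes training toward those pixels.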