Learning Dilation Factors for Semantic Segmentation of Street Scenes
Contextual information is crucial for semantic segmentation. However, finding
the optimal trade-off between preserving fine details and providing
sufficiently large receptive fields is non-trivial. This is even more so
when the objects or classes in an image vary significantly in size.
Dilated convolutions have proven valuable for semantic segmentation because
they increase the size of the receptive field without sacrificing
image resolution. However, in current state-of-the-art methods, dilation
parameters are hand-tuned and fixed. In this paper, we present an approach for
learning dilation parameters adaptively per channel, consistently improving
semantic segmentation results on street-scene datasets such as Cityscapes and
CamVid.
Comment: GCPR201
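To make the receptive-field argument above concrete, here is a minimal sketch in standard-library Python of a stride-1 1D dilated convolution and of the receptive-field arithmetic for a stack of such layers. The function names are illustrative assumptions, and the sketch uses fixed, hand-chosen dilations; it does not reproduce the paper's per-channel learning of dilation factors.

```python
from typing import List, Sequence


def effective_kernel_size(k: int, d: int) -> int:
    """Span (in pixels) covered by a k-tap kernel whose taps are d pixels apart."""
    return k + (k - 1) * (d - 1)


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 (dilated) convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        # Each layer widens the field by its effective span minus one.
        rf += effective_kernel_size(k, d) - 1
    return rf


def dilated_conv1d(x: List[float], w: List[float], d: int) -> List[float]:
    """Stride-1 1D dilated cross-correlation (as CNN libraries compute it)
    with zero 'same' padding, so the output length equals the input length."""
    k = len(w)
    assert k % 2 == 1, "an odd kernel keeps the output centred"
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - k // 2) * d  # taps are spaced d pixels apart
            if 0 <= idx < len(x):       # positions outside the signal read as zero
                acc += wj * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
    print(receptive_field([3, 3, 3], [1, 1, 1]))  # 7
    impulse = [0.0] * 9
    impulse[4] = 1.0
    # The 3-tap kernel with dilation 2 spreads the impulse over 5 pixels.
    print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 2))
```

With three 3-tap layers, dilations 1, 2 and 4 give a receptive field of 15 pixels versus 7 without dilation, while zero padding keeps the output at full resolution. The dilations here are hand-picked, which is the fixed, hand-tuned setting the abstract proposes to replace with learned values.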
WordFences: Text localization and recognition
In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV).
In recent years, text recognition has achieved remarkable success in recognizing scanned
document text. However, word recognition in natural images is still an open problem,
which generally requires time-consuming post-processing steps. We present a novel architecture
for individual word detection in scene images based on semantic segmentation.
Our contributions are twofold: the concept of WordFence, which detects border areas
surrounding each individual word and a unique pixelwise weighted softmax loss function
which penalizes background and emphasizes small text regions. WordFence ensures that
each word is detected individually, and the new loss function provides a strong training
signal to both text and word border localization. The proposed technique avoids intensive
post-processing by combining semantic word segmentation with a voting scheme
for merging segmentations of multiple scales, producing an end-to-end word detection
system. We achieve superior localization recall on common benchmark datasets: 92%
recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, end-to-end
word recognition achieves a state-of-the-art 86% F-score on ICDAR13
- …
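The two ingredients the WordFences abstract names, a pixelwise weighted softmax loss and a vote that merges segmentations from several scales, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label convention (0 = background, 1 = text, 2 = word border), the per-pixel weight values, and the use of a simple per-pixel majority vote are all hypothetical choices, since the abstract does not specify them.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical label convention for this sketch: 0 = background, 1 = text, 2 = word border.


def log_softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one pixel's class scores."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def weighted_softmax_loss(logits: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          weights: Sequence[float]) -> float:
    """Pixelwise weighted cross-entropy over a flattened image:
    sum_i w_i * (-log p_i[y_i]) / sum_i w_i."""
    num, den = 0.0, 0.0
    for z, y, w in zip(logits, labels, weights):
        num += w * -log_softmax(z)[y]
        den += w
    return num / den if den > 0 else 0.0


def majority_vote(label_maps: Sequence[Sequence[int]]) -> List[int]:
    """Per-pixel majority vote over label maps predicted at several scales.
    Assumes the maps were already resized to a common resolution and flattened;
    ties go to the lowest label index."""
    merged = []
    for votes in zip(*label_maps):
        counts = Counter(votes)
        best = max(counts.values())
        merged.append(min(lbl for lbl, c in counts.items() if c == best))
    return merged


if __name__ == "__main__":
    # Two pixels: a correctly predicted background pixel and a text pixel
    # that the network also calls background (a miss).
    logits = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
    labels = [0, 1]
    print(weighted_softmax_loss(logits, labels, [1.0, 1.0]))  # uniform weights
    print(weighted_softmax_loss(logits, labels, [0.1, 1.0]))  # background down-weighted
    print(majority_vote([[0, 1, 1, 2], [0, 1, 2, 2], [1, 1, 2, 0]]))  # [0, 1, 2, 2]
```

Giving background pixels a small weight (0.1 in the hypothetical example) shifts the normalized loss toward text and border pixels, so missing a small word costs more than under uniform weighting. Dividing by the sum of weights keeps the loss on a comparable scale across images with different amounts of text.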