Object detection via a multi-region & semantic segmentation-aware CNN model
We propose an object detection system that relies on a multi-region deep
convolutional neural network (CNN) that also encodes semantic
segmentation-aware features. The resulting CNN-based representation aims at
capturing a diverse set of discriminative appearance factors and exhibits
localization sensitivity that is essential for accurate object localization. We
exploit the above properties of our recognition module by integrating it into an
iterative localization mechanism that alternates between scoring a box proposal
and refining its location with a deep CNN regression model. Thanks to the
efficient use of our modules, we detect objects with very high localization
accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we
achieve mAP of 78.2% and 73.9%, respectively, surpassing any other published
work by a significant margin.
Comment: Extended technical report -- short version to appear at ICCV 201
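The alternation between scoring a box proposal and refining its location, as described in the abstract above, can be illustrated with a minimal sketch. The names `iterative_localization`, `score_fn`, and `refine_fn` are hypothetical stand-ins for the recognition module and the CNN regression model; the abstract does not specify their interfaces, and any post-processing of the scored boxes is omitted here.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iterative_localization(
    boxes: List[Box],
    score_fn: Callable[[Box], float],   # hypothetical: recognition score of a box
    refine_fn: Callable[[Box], Box],    # hypothetical: regression step for a box
    num_iters: int = 2,
) -> Tuple[Box, float]:
    """Alternate between scoring boxes and refining them; return the best pair."""
    candidates: List[Tuple[Box, float]] = []
    current = list(boxes)
    for _ in range(num_iters):
        # Score the current boxes, then move each one with the regressor.
        candidates.extend((b, score_fn(b)) for b in current)
        current = [refine_fn(b) for b in current]
    # Score the boxes produced by the last refinement step as well.
    candidates.extend((b, score_fn(b)) for b in current)
    return max(candidates, key=lambda pair: pair[1])
```

In this sketch every intermediate box is kept as a candidate, so a refinement step that makes a box worse cannot lower the final result; that is a design choice of the example, not a claim about the paper.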
WordFences: Text localization and recognition
In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV)
In recent years, text recognition has achieved remarkable success in recognizing scanned
document text. However, word recognition in natural images is still an open problem,
which generally requires time-consuming post-processing steps. We present a novel architecture
for individual word detection in scene images based on semantic segmentation.
Our contributions are twofold: the concept of WordFence, which detects border areas
surrounding each individual word, and a unique pixelwise weighted softmax loss function
which penalizes background and emphasizes small text regions. WordFence ensures that
each word is detected individually, and the new loss function provides a strong training
signal to both text and word border localization. The proposed technique avoids intensive
post-processing by combining semantic word segmentation with a voting scheme
for merging segmentations of multiple scales, producing an end-to-end word detection
system. We achieve superior localization recall on common benchmark datasets: 92%
recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, end-to-end
word recognition achieves state-of-the-art 86% F-Score on ICDAR13.
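A pixelwise weighted softmax loss of the kind mentioned in the abstract above can be sketched as a weighted average of per-pixel cross-entropy terms. This is a generic formulation under stated assumptions: the function name `weighted_softmax_loss`, the normalization by the sum of weights, and the way weights are supplied are all illustrative, since the abstract does not give the exact weighting scheme.

```python
import math
from typing import List, Sequence


def weighted_softmax_loss(
    logits: Sequence[Sequence[float]],  # one list of class scores per pixel
    labels: Sequence[int],              # ground-truth class index per pixel
    weights: Sequence[float],           # assumed per-pixel weight, e.g. larger for small text
) -> float:
    """Weighted mean of per-pixel softmax cross-entropy."""
    total = 0.0
    weight_sum = 0.0
    for z, y, w in zip(logits, labels, weights):
        # Numerically stable log-sum-exp over the class scores of this pixel.
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += w * (log_norm - z[y])  # w * (-log softmax(z)[y])
        weight_sum += w
    return total / weight_sum
```

Raising the weight of a pixel increases its share of the loss, which is how such a function could down-weight background and emphasize small text regions.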
BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks
We present a simple and effective framework for simultaneous semantic
segmentation and instance segmentation with Fully Convolutional Networks
(FCNs). The method, called BiSeg, predicts instance segmentation as a posterior
in Bayesian inference, where semantic segmentation is used as a prior. We
extend the idea of position-sensitive score maps used in recent methods to a
fusion of multiple score maps at different scales and partition modes, and
adopt it as a robust likelihood for instance segmentation inference. As both
Bayesian inference and map fusion are performed per pixel, BiSeg is a fully
convolutional end-to-end solution that inherits all the advantages of FCNs. We
demonstrate state-of-the-art instance segmentation accuracy on PASCAL VOC.
Comment: BMVC201
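The per-pixel Bayesian step described in the abstract above (a prior combined with a likelihood to give a posterior) can be illustrated with plain Bayes' rule. This is only a generic sketch: the function name `pixel_posterior` is hypothetical, and the abstract does not state how BiSeg defines the prior and likelihood terms or over which variables it normalizes.

```python
from typing import List, Sequence


def pixel_posterior(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Posterior for one pixel: elementwise prior * likelihood, renormalized."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [j / evidence for j in joint]
```

Because the computation uses only the values at a single pixel, it can be applied independently at every location of a score map, which is consistent with the fully convolutional design the abstract describes.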