Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
Object detection via a multi-region & semantic segmentation-aware CNN model
We propose an object detection system that relies on a multi-region deep
convolutional neural network (CNN) that also encodes semantic
segmentation-aware features. The resulting CNN-based representation aims at
capturing a diverse set of discriminative appearance factors and exhibits
localization sensitivity that is essential for accurate object localization. We
exploit the above properties of our recognition module by integrating it into an
iterative localization mechanism that alternates between scoring a box proposal
and refining its location with a deep CNN regression model. Thanks to the
efficient use of our modules, we detect objects with very high localization
accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we
achieve mAP of 78.2% and 73.9% respectively, surpassing any other published
work by a significant margin.
Comment: Extended technical report -- short version to appear at ICCV 201
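The alternation between scoring a box proposal and refining its location can be illustrated with a minimal Python sketch. This is not the authors' implementation: `iterative_localization`, `score_fn`, `refine_fn`, and `num_iters` are hypothetical names, and the toy scorer (IoU against a known target box) and toy refiner (moving halfway toward that target) merely stand in for the CNN recognition and regression models described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iterative_localization(
    proposals: Sequence[Box],
    score_fn: Callable[[Box], float],   # assumed stand-in for the recognition model
    refine_fn: Callable[[Box], Box],    # assumed stand-in for the regression model
    num_iters: int = 2,                 # hypothetical iteration count
) -> List[Tuple[float, Box]]:
    """Alternate scoring and refinement, keeping the candidates of every round."""
    boxes = list(proposals)
    candidates: List[Tuple[float, Box]] = []
    for _ in range(num_iters):
        # Scoring step: rate each current box.
        candidates.extend((score_fn(b), b) for b in boxes)
        # Refinement step: predict a better location for each box.
        boxes = [refine_fn(b) for b in boxes]
    # Score the boxes produced by the final refinement as well.
    candidates.extend((score_fn(b), b) for b in boxes)
    return candidates


# Toy stand-ins, assumptions for illustration only.
TARGET: Box = (10.0, 10.0, 50.0, 50.0)


def toy_score(box: Box) -> float:
    return iou(box, TARGET)


def toy_refine(box: Box) -> Box:
    # Move each coordinate halfway toward the target box.
    x1, y1, x2, y2 = (c + 0.5 * (t - c) for c, t in zip(box, TARGET))
    return (x1, y1, x2, y2)


candidates = iterative_localization([(0.0, 0.0, 40.0, 40.0)], toy_score, toy_refine)
best_score, best_box = max(candidates, key=lambda sb: sb[0])
```

With these toy functions each refinement round moves the single proposal closer to the target, so the highest-scoring candidate is the one from the last round. A real detector would typically follow this loop with a post-processing step such as non-maximum suppression over the collected candidates.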