4,334 research outputs found
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that incorporates in an end-to-end Visual Question Answering system and evaluated it on two visual question answering datasets for compositional language and elementary visual reasoning
Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what to the best of our knowledge is the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle.Comment: 16 pages including appendix; Published at CVPR 201
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in computer vision
community due to it plays an important role in video surveillance. Many
algorithms has been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criterion. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attributes group, part-based,
\emph{etc}. Fifthly, we shown some applications which takes pedestrian
attributes into consideration and achieve better performance. Finally, we
summarized this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
- …