Hybrid multi-layer Deep CNN/Aggregator feature for image classification
Deep Convolutional Neural Networks (DCNN) have established a remarkable
performance benchmark in the field of image classification, displacing
classical approaches based on hand-tailored aggregations of local descriptors.
Yet DCNNs impose high computational burdens both at training and at testing
time, and training them requires collecting and annotating large amounts of
training data. Supervised adaptation methods proposed in the literature
partially re-learn a transferred DCNN structure on a new target dataset. Yet
these require expensive bounding-box annotations and remain computationally
expensive to train. In this paper, we address these
shortcomings of DCNN adaptation schemes by proposing a hybrid approach that
combines conventional, unsupervised aggregators such as Bag-of-Words (BoW)
with the DCNN pipeline by treating the output of intermediate layers as densely
extracted local descriptors.
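The core idea is to treat each spatial position of an intermediate layer's activation map as one local descriptor and then pool those descriptors with a Bag-of-Words encoder. The sketch below illustrates this under stated assumptions and is not the authors' implementation. The activation map is given as nested lists of shape H × W × C. The codebook is assumed to have been learned beforehand, for example by k-means on training descriptors. Encoding uses hard nearest-codeword assignment with L1 normalization, and the paper's actual encoding and normalization choices may differ. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def feature_map_to_descriptors(fmap: List[List[List[float]]]) -> List[List[float]]:
    """Flatten an H x W x C activation map into H*W dense local descriptors of length C."""
    return [list(cell) for row in fmap for cell in row]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assign each descriptor to its nearest codeword (assumption: hard
    assignment; ties go to the lowest index) and return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


if __name__ == "__main__":
    # Toy 2 x 2 x 2 activation map and a hypothetical 2-word codebook.
    fmap = [[[0.1, 0.0], [0.9, 1.0]],
            [[0.2, 0.1], [0.3, 0.2]]]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    descriptors = feature_map_to_descriptors(fmap)
    print(bow_histogram(descriptors, codebook))  # [0.75, 0.25]
```

Three of the four spatial positions fall nearest the first codeword, so the image-level feature is `[0.75, 0.25]`. The key design point is that the feature length is fixed by the codebook size rather than by H × W. This is what lets activation maps of different spatial sizes map to a single fixed-length vector for a classifier.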
We test a variant of our approach that uses only intermediate DCNN layers on
the standard PASCAL VOC 2007 dataset and show performance significantly higher
than the standard BoW model and comparable to Fisher vector aggregation but
with a feature that is 150 times smaller. A second variant of our approach that
includes the fully connected DCNN layers significantly outperforms Fisher
vector schemes and performs comparably to DCNN approaches adapted to PASCAL VOC
2007, yet at only a small fraction of the training and testing cost.
Comment: Accepted in ICASSP 2015 conference, 5 pages including references, 4
figures and 2 tables
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and constraints
explicit makes the framework particularly useful for selecting a method for a
given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional
ones, while providing an insightful perspective on the evolution of the action
recognition task to date. That perspective is the basis for the discussion at
the end of the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables