16,138 research outputs found
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand of
methods to process these videos, possibly in real-time, is expected. Current
approaches present a particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
Cultural Event Recognition with Visual ConvNets and Temporal Models
This paper presents our contribution to the ChaLearn Challenge 2015 on
Cultural Event Classification. The challenge in this task is to automatically
classify images from 50 different cultural events. Our solution is based on the
combination of visual features extracted from convolutional neural networks
with temporal information using a hierarchical classifier scheme. We extract
visual features from the last three fully connected layers of both CaffeNet
(pretrained with ImageNet) and our fine tuned version for the ChaLearn
challenge. We propose a late fusion strategy that trains a separate low-level
SVM on each of the extracted neural codes. The class predictions of the
low-level SVMs form the input to a higher level SVM, which gives the final
event scores. We achieve our best result by adding a temporal refinement step
into our classification scheme, which is applied directly to the output of each
low-level SVM. Our approach penalizes high classification scores based on
visual features when their time stamp does not match well an event-specific
temporal distribution learned from the training and validation data. Our system
achieved the second best result in the ChaLearn Challenge 2015 on Cultural
Event Classification with a mean average precision of 0.767 on the test set.Comment: Initial version of the paper accepted at the CVPR Workshop ChaLearn
Looking at People 201
Detection-by-Localization: Maintenance-Free Change Object Detector
Recent researches demonstrate that self-localization performance is a very
useful measure of likelihood-of-change (LoC) for change detection. In this
paper, this "detection-by-localization" scheme is studied in a novel
generalized task of object-level change detection. In our framework, a given
query image is segmented into object-level subimages (termed "scene parts"),
which are then converted to subimage-level pixel-wise LoC maps via the
detection-by-localization scheme. Our approach models a self-localization
system as a ranking function, outputting a ranked list of reference images,
without requiring relevance score. Thanks to this new setting, we can
generalize our approach to a broad class of self-localization systems. Our
ranking based self-localization model allows to fuse self-localization results
from different modalities via an unsupervised rank fusion derived from a field
of multi-modal information retrieval (MMR).Comment: 7 pages, 3 figures, Technical repor
- …