When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data
Human action recognition from skeletal data is an active research topic,
important in many open-domain applications of computer vision, thanks to
recently introduced 3D sensors. In the literature, naive methods simply
transfer off-the-shelf techniques from video to the skeletal representation.
However, the current state of the art is contended between two different
paradigms: kernel-based methods and feature learning with (recurrent) neural
networks. Both approaches show strong performance, yet they exhibit heavy, but
complementary, drawbacks. Motivated by this fact, our work aims at combining
the best of the two paradigms by proposing an approach in which a shallow
network is fed with a covariance representation. Our intuition is that, as
long as the dynamics are effectively modeled, the classification network need
be neither deep nor recurrent in order to score favorably. We validate this
hypothesis in a broad experimental analysis over 6 publicly available datasets.
Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
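The covariance representation this abstract builds on can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it maps a skeletal sequence to the matrix logarithm of its joint covariance (a standard log-Euclidean flattening of SPD matrices), and the regularizer `eps` is an assumed choice.

```python
import numpy as np

def log_covariance(seq, eps=1e-6):
    """Log-covariance descriptor of a skeletal sequence.

    seq: (T, D) array -- T frames, D concatenated joint coordinates.
    Returns a (D, D) symmetric matrix that lives in a Euclidean space,
    suitable as input to a shallow classification network.
    """
    C = np.cov(seq, rowvar=False)       # (D, D) covariance over time
    C += eps * np.eye(C.shape[0])       # regularize: keep the matrix SPD
    w, V = np.linalg.eigh(C)            # eigendecomposition of an SPD matrix
    return (V * np.log(w)) @ V.T        # matrix logarithm log(C)

# toy usage: 50 frames of a 15-joint skeleton (45 coordinates per frame)
rng = np.random.default_rng(0)
desc = log_covariance(rng.standard_normal((50, 45)))
print(desc.shape)  # (45, 45)
```

The descriptor is symmetric, so in practice only its upper triangle needs to be fed to the network.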
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Curriculum Dropout
Dropout is a very effective way of regularizing neural networks.
Stochastically "dropping out" units with a certain probability discourages
over-specific co-adaptations of feature detectors, preventing overfitting and
improving network generalization. Besides, Dropout can be interpreted as an
approximate model aggregation technique, where an exponential number of smaller
networks are averaged in order to get a more powerful ensemble. In this paper,
we show that using a fixed dropout probability during training is a suboptimal
choice. We thus propose a time scheduling for the probability of retaining
neurons in the network. This induces an adaptive regularization scheme that
smoothly increases the difficulty of the optimization problem. This idea of
"starting easy" and adaptively increasing the difficulty of the learning
problem has its roots in curriculum learning and allows one to train better
models. Indeed, we prove that our optimization strategy implements a very
general curriculum scheme, by gradually adding noise to both the input and
intermediate feature representations within the network architecture.
Experiments on seven image classification datasets and different network
architectures show that our method, named Curriculum Dropout, frequently yields
better generalization and, at worst, performs just as well as the standard
Dropout method.
Comment: Accepted at ICCV (International Conference on Computer Vision) 201
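The time scheduling described above can be sketched as follows. The exponential decay from a retain probability of 1 (no noise, "easy" problem) toward the usual fixed rate is one natural realization of the idea; the parameter values `p_final` and `gamma` are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def retain_schedule(t, p_final=0.5, gamma=1e-3):
    """Retain probability p(t): starts at 1 (no dropout, an 'easy'
    optimization problem) and decays toward the usual fixed rate."""
    return (1.0 - p_final) * np.exp(-gamma * t) + p_final

def curriculum_dropout(x, t, rng, p_final=0.5, gamma=1e-3):
    """Inverted dropout whose retain probability follows the schedule."""
    p = retain_schedule(t, p_final, gamma)
    mask = rng.random(x.shape) < p      # keep each unit with probability p
    return x * mask / p                 # rescale to preserve the expectation

rng = np.random.default_rng(0)
x = np.ones((2, 5))
print(retain_schedule(0))               # 1.0: training starts noise-free
y = curriculum_dropout(x, t=50_000, rng=rng)
```

As training step `t` grows, the mask becomes sparser, smoothly increasing the difficulty of the optimization problem.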
Unsupervised Domain-Adaptive Person Re-identification Based on Attributes
Pedestrian attributes, e.g., hair length, clothes type and color, locally
describe the semantic appearance of a person. Training person re-identification
(ReID) algorithms under the supervision of such attributes has proven to be
effective in extracting local features, which are important for ReID. Unlike
person identity, attributes are consistent across different domains (or
datasets). However, most ReID datasets lack attribute annotations. On the
other hand, there are several datasets labeled with sufficient attributes for
the case of pedestrian attribute recognition. Exploiting such data for ReID
purposes can be a way to alleviate the shortage of attribute annotations in the
ReID case. In this work, an unsupervised domain-adaptive ReID feature learning
framework is proposed to make full use of attribute annotations. We propose to
transfer attribute-related features from their original domain to the ReID one:
to this end, we introduce an adversarial discriminative domain adaptation
method in order to learn domain invariant features for encoding semantic
attributes. Experiments on three large-scale datasets validate the
effectiveness of the proposed ReID framework.
Comment: 5 pages, accepted by ICIP201
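The adversarial alignment idea referenced above can be illustrated with a toy alternating update: a domain discriminator learns to tell the two feature distributions apart while the encoder is updated to fool it. Everything here is a placeholder, not the paper's architecture: the linear "encoder", logistic discriminator, synthetic data, and learning rate are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (128, 8))   # source: attribute-labeled domain (toy)
Xt = rng.normal(1.0, 1.0, (128, 8))   # target: ReID domain (toy)
y = np.concatenate([np.zeros(128), np.ones(128)])  # 0 = source, 1 = target

W = rng.normal(0.0, 0.1, (8, 8))      # shared linear "encoder" (illustrative)
w = np.zeros(8)                       # logistic domain discriminator

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for step in range(100):
    F = np.vstack([Xs, Xt]) @ W       # encode both domains
    p = sigmoid(F @ w)
    # (1) discriminator step: descend the domain-classification loss
    w -= lr * F.T @ (p - y) / len(y)
    # (2) encoder step: ascend the same loss (adversarial update),
    #     pushing source and target features to be indistinguishable
    W += lr * np.vstack([Xs, Xt]).T @ np.outer(p - y, w) / len(y)

F = np.vstack([Xs, Xt]) @ W
acc = np.mean((sigmoid(F @ w) > 0.5) == y)
```

Once the features are domain-invariant, the discriminator's accuracy should drift toward chance; features learned this way can then carry attribute semantics across datasets.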
Person Re-Identification without Identification via Event Anonymization
Wide-scale use of visual surveillance in public spaces puts individual
privacy at stake while increasing resource consumption (energy, bandwidth, and
computation). Neuromorphic vision sensors (event-cameras) have been recently
considered a valid solution to the privacy issue because they do not capture
detailed RGB visual information of the subjects in the scene. However, recent
deep learning architectures have been able to reconstruct images from event
cameras with high fidelity, reintroducing a potential threat to privacy for
event-based vision applications. In this paper, we aim to anonymize
event-streams to protect the identity of human subjects against such image
reconstruction attacks. To achieve this, we propose an end-to-end network
architecture jointly optimized for the twofold objective of preserving privacy
and performing a downstream task such as person ReId. Our network learns to
scramble events, enforcing the degradation of images recovered from the privacy
attacker. In this work, we also bring to the community the first ever
event-based person ReId dataset gathered to evaluate the performance of our
approach. We validate our approach with extensive experiments and report
results on the synthetic event data simulated from the publicly available
SoftBio dataset and our proposed Event-ReId dataset.
Comment: Accepted at International Conference on Computer Vision (ICCV), 202
Left/Right Hand Segmentation in Egocentric Videos
Wearable cameras allow people to record their daily activities from a
user-centered (First Person Vision) perspective. Due to their favorable
location, wearable cameras frequently capture the hands of the user, and may
thus represent a promising user-machine interaction tool for different
applications. Existing First Person Vision methods handle hand segmentation as
a background-foreground problem, ignoring two important facts: i) hands are not
a single "skin-like" moving element, but a pair of interacting cooperative
entities, ii) close hand interactions may lead to hand-to-hand occlusions and,
as a consequence, create a single hand-like segment. These facts complicate a
proper understanding of hand movements and interactions. Our approach extends
traditional background-foreground strategies, by including a
hand-identification step (left-right) based on a Maxwell distribution of angle
and position. Hand-to-hand occlusions are addressed by exploiting temporal
superpixels. The experimental results show that, in addition to reliable
left/right hand segmentation, our approach considerably improves on the
traditional background-foreground hand segmentation.
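The left-right identification step can be sketched as a likelihood comparison under two Maxwell distributions of angle and position. This is a hypothetical parameterization for illustration only: the scale parameters, the mirroring of the angle for the right hand, and the position model are assumptions, not the values fitted in the paper.

```python
import numpy as np

def maxwell_pdf(x, scale):
    """Maxwell-Boltzmann density: sqrt(2/pi) x^2 exp(-x^2/(2a^2)) / a^3, x >= 0."""
    x = np.asarray(x, dtype=float)
    out = np.sqrt(2 / np.pi) * x**2 * np.exp(-x**2 / (2 * scale**2)) / scale**3
    return np.where(x >= 0, out, 0.0)

def assign_hand(angle_deg, x_frac, left_scale=40.0, right_scale=40.0):
    """Label a hand segment 'left' or 'right' from its orientation and position.

    angle_deg: orientation of the segment's major axis (degrees);
    x_frac: centroid x-position in [0, 1] (0 = left image border).
    Scale values are illustrative, not the fitted parameters from the paper.
    """
    # hypothetical model: score angle/position for the left hand directly,
    # and for the right hand after mirroring both measurements
    left_score = maxwell_pdf(angle_deg, left_scale) * maxwell_pdf(x_frac * 100, 50)
    right_score = (maxwell_pdf(180 - angle_deg, right_scale)
                   * maxwell_pdf((1 - x_frac) * 100, 50))
    return "left" if left_score > right_score else "right"

print(assign_hand(30, 0.2))    # segment near the left border -> "left"
print(assign_hand(150, 0.8))   # mirrored case -> "right"
```

Each candidate hand segment is thus scored under both models and assigned to the more likely side, which is what disambiguates the single merged segment produced by hand-to-hand occlusions.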