When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data
Human action recognition from skeletal data is a hot research topic and
important in many open domain applications of computer vision, thanks to
recently introduced 3D sensors. In the literature, naive methods simply
transfer off-the-shelf techniques from video to the skeletal representation.
However, the current state of the art is contested between two different
paradigms: kernel-based methods and feature learning with (recurrent) neural
networks. Both approaches show strong performance, yet they exhibit heavy, but
complementary, drawbacks. Motivated by this fact, our work aims to combine
the best of the two paradigms, by proposing an approach where a
shallow network is fed with a covariance representation. Our intuition is that,
as long as the dynamics are effectively modeled, there is no need for the
classification network to be deep or recurrent in order to score favorably. We
validate this hypothesis in a broad experimental analysis over 6 publicly
available datasets.
Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop
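To make the idea of a covariance representation concrete, the following is a minimal sketch of how a skeletal sequence could be turned into a fixed-length log-covariance vector before being passed to a shallow classifier. The function name `log_covariance_feature`, the regularizer `eps`, and the upper-triangular vectorization are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def log_covariance_feature(sequence, eps=1e-6):
    """Map a (T, d) sequence (T frames, d joint coordinates) to a vector.

    Assumed pipeline (not taken from the paper): covariance over time,
    matrix logarithm via eigendecomposition, upper-triangular flattening.
    """
    x = np.asarray(sequence, dtype=float)
    # d x d covariance of the joint coordinates across the T frames.
    cov = np.cov(x, rowvar=False)
    # Small ridge (assumption) so that all eigenvalues are strictly positive.
    cov = cov + eps * np.eye(cov.shape[0])
    # Matrix logarithm of a symmetric positive definite matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # The matrix is symmetric, so the upper triangle carries all the information.
    rows, cols = np.triu_indices(log_cov.shape[0])
    return log_cov[rows, cols]

# Usage: a random stand-in for a 50-frame sequence with 6 coordinates.
rng = np.random.default_rng(0)
feature = log_covariance_feature(rng.normal(size=(50, 6)))
print(feature.shape)
```

The resulting vector has length d(d+1)/2 regardless of the number of frames, which is what would allow a non-recurrent network to consume sequences of varying duration.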
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Curriculum Dropout
Dropout is a very effective way of regularizing neural networks.
Stochastically "dropping out" units with a certain probability discourages
over-specific co-adaptations of feature detectors, preventing overfitting and
improving network generalization. Besides, Dropout can be interpreted as an
approximate model aggregation technique, where an exponential number of smaller
networks are averaged in order to get a more powerful ensemble. In this paper,
we show that using a fixed dropout probability during training is a suboptimal
choice. We thus propose a time scheduling for the probability of retaining
neurons in the network. This induces an adaptive regularization scheme that
smoothly increases the difficulty of the optimization problem. This idea of
"starting easy" and adaptively increasing the difficulty of the learning
problem has its roots in curriculum learning and allows one to train better
models. Indeed, we prove that our optimization strategy implements a very
general curriculum scheme, by gradually adding noise to both the input and
intermediate feature representations within the network architecture.
Experiments on seven image classification datasets and different network
architectures show that our method, named Curriculum Dropout, frequently yields
better generalization and, at worst, performs just as well as the standard
Dropout method.
Comment: Accepted at ICCV (International Conference on Computer Vision) 201
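As an illustration of what a time schedule for the retain probability might look like, here is a minimal sketch in which the probability of keeping a unit starts at 1 (no noise, the "easy" problem) and decays towards a target value. The exponential form, the names `retain_probability`, `target` and `gamma`, and their default values are assumptions for this example, not details given in the abstract.

```python
import math
import random

def retain_probability(step, target=0.5, gamma=1e-3):
    """Retain probability at a training step (assumed exponential schedule).

    Equals 1.0 at step 0 and decays monotonically towards `target`.
    """
    return (1.0 - target) * math.exp(-gamma * step) + target

def dropout(values, retain_p, rng):
    """Inverted dropout: keep each unit with probability retain_p and
    rescale the survivors so the expected activation is unchanged."""
    return [v / retain_p if rng.random() < retain_p else 0.0 for v in values]

# Usage: the same activations are perturbed more strongly as training proceeds.
rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
for step in (0, 1000, 10000):
    p = retain_probability(step)
    print(step, round(p, 3), dropout(activations, p, rng))
```

With these assumed defaults, no units are dropped at step 0, and the schedule approaches the usual fixed rate of 0.5 after a few thousand steps.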
Person Re-Identification without Identification via Event Anonymization
Wide-scale use of visual surveillance in public spaces puts individual
privacy at stake while increasing resource consumption (energy, bandwidth, and
computation). Neuromorphic vision sensors (event-cameras) have been recently
considered a valid solution to the privacy issue because they do not capture
detailed RGB visual information of the subjects in the scene. However, recent
deep learning architectures have been able to reconstruct images from event
cameras with high fidelity, reintroducing a potential threat to privacy for
event-based vision applications. In this paper, we aim to anonymize
event-streams to protect the identity of human subjects against such image
reconstruction attacks. To achieve this, we propose an end-to-end network
architecture jointly optimized for the twofold objective of preserving privacy
and performing a downstream task such as person ReId. Our network learns to
scramble events, enforcing the degradation of images recovered by the privacy
attacker. In this work, we also bring to the community the first-ever
event-based person ReId dataset gathered to evaluate the performance of our
approach. We validate our approach with extensive experiments and report
results on the synthetic event data simulated from the publicly available
SoftBio dataset and our proposed Event-ReId dataset.
Comment: Accepted at International Conference on Computer Vision (ICCV), 202
Excitation Dropout: Encouraging Plasticity in Deep Neural Networks
We propose a guided dropout regularizer for deep networks based on the
evidence of a network prediction defined as the firing of neurons in specific
paths. In this work, we utilize the evidence at each neuron to determine the
probability of dropout, rather than dropping out neurons uniformly at random as
in standard dropout. In essence, we drop out with higher probability those
neurons which contribute more to decision making at training time. This
approach penalizes high saliency neurons that are most relevant for model
prediction, i.e. those having stronger evidence. By dropping such high-saliency
neurons, the network is forced to learn alternative paths in order to maintain
loss minimization, resulting in a plasticity-like behavior, a characteristic of
human brains too. We demonstrate better generalization ability, an increased
utilization of network neurons, and a higher resilience to network compression
using several metrics over four image/video recognition benchmarks.
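The contrast with uniform dropout can be sketched as follows. Each neuron receives its own drop probability, which grows with a non-negative saliency score. The linear mapping used here, the clipping at `max_rate`, and the names `drop_probabilities` and `saliency_guided_dropout` are assumptions chosen for simplicity; they are not the formula used by the authors, and how the evidence scores are computed is left outside the sketch.

```python
import random

def drop_probabilities(saliency, base_rate=0.5, max_rate=0.95):
    """Per-neuron drop probability from non-negative saliency scores.

    Assumed mapping: a neuron with average saliency is dropped with
    probability base_rate; more salient neurons are dropped more often,
    up to max_rate.
    """
    total = sum(saliency)
    n = len(saliency)
    if total <= 0:
        # No evidence available: fall back to uniform dropout.
        return [base_rate] * n
    return [min(max_rate, base_rate * n * s / total) for s in saliency]

def saliency_guided_dropout(activations, saliency, rng, base_rate=0.5):
    """Drop each neuron with its own probability and rescale the survivors."""
    probs = drop_probabilities(saliency, base_rate)
    return [0.0 if rng.random() < p else a / (1.0 - p)
            for a, p in zip(activations, probs)]

# Usage: the last neuron carries most of the evidence, so it is dropped most often.
rng = random.Random(0)
print(drop_probabilities([1.0, 1.0, 1.0, 7.0]))
print(saliency_guided_dropout([0.3, 0.1, 0.2, 0.9], [1.0, 1.0, 1.0, 7.0], rng))
```

When all saliency scores are equal, the mapping reduces to standard dropout with rate `base_rate`.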