Detecting Hands in Egocentric Videos: Towards Action Recognition
Recently, there has been a growing interest in analyzing human daily
activities from data collected by wearable cameras. Since the hands are
involved in a vast set of daily tasks, detecting hands in egocentric images is
an important step towards the recognition of a variety of egocentric actions.
However, hand detection is not a trivial task: besides the extreme
illumination changes common in egocentric images, hand appearance is
intrinsically highly variable. We propose a hand detector that exploits skin
modeling for fast hand proposal generation and Convolutional Neural Networks
for hand recognition. We tested our method on the UNIGE-HANDS dataset and
showed that the proposed approach achieves competitive hand detection results.
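The proposal stage of such a pipeline can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a fixed Cb/Cr skin box (a common heuristic) instead of a learned skin model, and simply boxes all skin pixels; in the full approach a CNN would then classify each proposal as hand or not. Function names and thresholds are illustrative assumptions.

```python
import numpy as np

def skin_mask(rgb):
    """Mark pixels as skin via a fixed Cb/Cr box (a common heuristic,
    NOT the paper's learned skin model)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    # Standard RGB -> CbCr conversion (BT.601 coefficients).
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)

def hand_proposals(rgb):
    """Return one bounding box (x0, y0, x1, y1) around all skin pixels,
    as a cheap proposal for a downstream CNN classifier; None if no
    skin pixels are found."""
    ys, xs = np.nonzero(skin_mask(rgb))
    if len(xs) == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())
```

In practice one would extract connected components and emit one box per blob; a single global box is enough to show the proposal-then-classify structure.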
Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of wearable cameras, have drawn much research attention in recent
years. Because such videos tend to be lengthy, a large number of ego-related
applications have been developed to abstract the captured content. Since
users typically interact with target objects using their hands, which usually
appear within the visual field during the interaction, an egocentric hand
detection step is involved in tasks like gesture recognition, action
recognition and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands irregularly appear and disappear from the videos.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with the state-of-the-art methods, especially in
complicated scenarios.
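The growing step described above can be sketched as a breadth-first expansion over a superpixel adjacency graph. This is a simplified illustration under stated assumptions: the seed selection and the four-cue scoring are abstracted into inputs, and the threshold is hypothetical.

```python
from collections import deque

def grow_hand_region(seeds, neighbors, score, threshold=0.5):
    """Region growing for hand detection: start from seed superpixels
    and absorb adjacent superpixels whose score exceeds `threshold`.

    `neighbors[s]` lists the superpixels adjacent to s; `score(s)` is
    assumed to combine the four egocentric cues (contrast, location,
    position consistency, appearance continuity) into a value in [0, 1].
    """
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        s = queue.popleft()
        for n in neighbors.get(s, []):
            if n not in region and score(n) >= threshold:
                region.add(n)
                queue.append(n)
    return region
```

For example, on a chain of superpixels 0-1-2-3 with high scores on 1 and 2 and a low score on 3, growing from seed 0 recovers {0, 1, 2} and stops at the low-scoring superpixel.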
Left/Right Hand Segmentation in Egocentric Videos
Wearable cameras allow people to record their daily activities from a
user-centered (First Person Vision) perspective. Due to their favorable
location, wearable cameras frequently capture the hands of the user, and may
thus represent a promising user-machine interaction tool for different
applications. Existing First Person Vision methods handle hand segmentation as
a background-foreground problem, ignoring two important facts: i) hands are not
a single "skin-like" moving element, but a pair of interacting, cooperative
entities; ii) close hand interactions may lead to hand-to-hand occlusions and,
as a consequence, create a single hand-like segment. These facts complicate a
proper understanding of hand movements and interactions. Our approach extends
traditional background-foreground strategies, by including a
hand-identification step (left-right) based on a Maxwell distribution of angle
and position. Hand-to-hand occlusions are addressed by exploiting temporal
superpixels. The experimental results show that, in addition to a reliable
left/right hand-segmentation, our approach considerably improves the
traditional background-foreground hand-segmentation.
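The identification step can be illustrated as a likelihood comparison under two fitted Maxwell distributions. This is a reduced sketch: the paper models angle and position jointly, whereas here only a single normalized angle variable is scored, and the scale parameters are hypothetical values assumed fitted offline.

```python
import math

def maxwell_pdf(x, a):
    """Maxwell-Boltzmann density with scale parameter `a` (x >= 0)."""
    if x < 0:
        return 0.0
    return math.sqrt(2 / math.pi) * x**2 * math.exp(-x**2 / (2 * a**2)) / a**3

def classify_hand(angle, a_left, a_right):
    """Label a hand segment 'left' or 'right' according to which fitted
    Maxwell distribution assigns its (normalized) entry angle the higher
    likelihood. a_left / a_right are assumed fitted offline."""
    if maxwell_pdf(angle, a_left) > maxwell_pdf(angle, a_right):
        return "left"
    return "right"
```

A segment whose angle falls near the left distribution's mode is labeled left; the same comparison with a large angle flips the decision to the right hand.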
Unsupervised Understanding of Location and Illumination Changes in Egocentric Videos
Wearable cameras stand out as one of the most promising devices for the
upcoming years, and as a consequence, the demand for computer algorithms that
automatically understand the videos they record is growing quickly.
Automatically understanding these videos is not an easy task: their mobile
nature implies important challenges, such as changing light
conditions and the unrestricted locations recorded. This paper proposes an
unsupervised strategy based on global features and manifold learning to endow
wearable cameras with contextual information regarding the light conditions and
the location captured. Results show that non-linear manifold methods can
capture contextual patterns from global features without requiring large
computational resources. The proposed strategy is used, as an application case,
as a switching mechanism to improve the hand-detection problem in egocentric
videos.
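The switching mechanism itself can be sketched as follows. This is a heavily simplified, hypothetical illustration: the paper's manifold embedding and global features are replaced by a toy channel-mean descriptor and a nearest-centroid lookup, and the per-context detectors and centroids are assumed learned offline.

```python
import numpy as np

def global_feature(rgb):
    """Toy global descriptor: per-channel means plus overall mean.
    (The paper uses richer global features and a non-linear manifold
    embedding; this is only a stand-in.)"""
    f = rgb.reshape(-1, 3).mean(axis=0)
    return np.append(f, f.mean())

def switch_detector(frame, centroids, detectors):
    """Dispatch the frame to the hand detector tuned for the context
    (illumination/location) whose centroid is nearest to the frame's
    global feature. `centroids` maps context name -> feature vector;
    `detectors` maps context name -> callable. Both hypothetical."""
    f = global_feature(frame)
    ctx = min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
    return detectors[ctx](frame)
```

The point of the design is that a cheap global computation selects among specialized detectors, so the per-frame cost stays low while each detector only has to cope with one lighting/location regime.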
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches combine different image features and quantitative methods to
accomplish specific objectives such as object detection, activity recognition
and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.