Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and
object recognition. Detection and segmentation of hands in first-person videos,
however, have been less explored. For many applications in this domain, it is
necessary to accurately segment not only the hands of the camera wearer but also
the hands of others with whom the wearer is interacting. Here, we take an in-depth look
at the hand segmentation problem. In the quest for robust hand segmentation
methods, we evaluate the performance of state-of-the-art semantic
segmentation methods, off the shelf and fine-tuned, on existing datasets. We
fine-tune RefineNet, a leading semantic segmentation method, for hand
segmentation and find that it does much better than the best contenders.
Existing hand segmentation datasets were collected in laboratory settings.
To overcome this limitation, we contribute two new datasets: a)
EgoYouTubeHands, which includes egocentric videos containing hands in the wild, and
b) HandOverFace, for analyzing the performance of our models in the presence of
similar-appearance occlusions. We further explore whether conditional random fields can
help refine generated hand segmentations. To demonstrate the benefit of
accurate hand maps, we train a CNN for hand-based activity recognition and
achieve higher accuracy when the CNN is trained using hand maps produced by the
fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for
fine-grained action recognition and show that an accuracy of 58.6% can be
achieved by just looking at a single hand pose, which is much better than the
chance level (12.5%).
Comment: Accepted at CVPR 201
Detecting Hands in Egocentric Videos: Towards Action Recognition
Recently, there has been a growing interest in analyzing human daily
activities from data collected by wearable cameras. Since the hands are
involved in a vast set of daily tasks, detecting hands in egocentric images is
an important step towards the recognition of a variety of egocentric actions.
However, besides extreme illumination changes in egocentric images, hand
detection is not a trivial task because of the intrinsic large variability of
hand appearance. We propose a hand detector that exploits skin modeling for
fast hand proposal generation and Convolutional Neural Networks for hand
recognition. We tested our method on the UNIGE-HANDS dataset and showed that the
proposed approach achieves competitive hand detection results.
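The proposal stage described in this abstract can be illustrated with a minimal sketch. The abstract does not say which skin model the authors use, so the fixed Cb/Cr thresholds below are an assumption borrowed from common rule-based skin detectors, and the names `rgb_to_cbcr`, `is_skin`, and `propose_box` are hypothetical. The sketch returns a single box around all skin-coloured pixels; a real pipeline would likely produce one proposal per connected region and pass each crop to a CNN classifier, which is not shown here.

```python
def rgb_to_cbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Cb, Cr) using the full-range BT.601 formulas."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel, cb_range=(77, 127), cr_range=(133, 173)):
    """Rule-based skin test. The threshold ranges are illustrative assumptions,
    not values taken from the paper."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def propose_box(image, min_pixels=50):
    """Return one (top, left, bottom, right) box around all skin pixels.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    The bottom and right bounds are exclusive. Returns None when fewer
    than `min_pixels` skin pixels are found.
    """
    coords = [
        (y, x)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if is_skin(pixel)
    ]
    if len(coords) < min_pixels:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys) + 1, max(xs) + 1
```

Thresholding in a chroma space keeps the proposal step cheap, which matches the abstract's emphasis on fast proposal generation; the classifier is then responsible for rejecting skin-coloured regions that are not hands.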
Summarizing First-Person Videos from Third Persons' Points of Views
Video highlighting or summarization is an interesting topic in computer
vision that benefits a variety of applications such as viewing, searching, or
storage. However, most existing studies rely on training data of third-person
videos, and so do not easily generalize to highlighting first-person ones. With
the goal of deriving an effective model to summarize first-person videos, we
propose a novel deep neural network architecture for describing and
discriminating vital spatiotemporal information across videos with different
points of view. Our proposed model is realized in a semi-supervised setting, in
which fully annotated third-person videos, unlabeled first-person videos, and a
small number of annotated first-person ones are presented during training. In
our experiments, we report qualitative and quantitative evaluations on both
benchmarks and our collected first-person video datasets.
Comment: 16+10 pages, ECCV 201
Forecasting Hands and Objects in Future Frames
This paper presents an approach to forecast the future presence and location of
human hands and objects. Given an image frame, the goal is to predict what
objects will appear in a future frame (e.g., 5 seconds later) and where they
will be located, even when they are not visible in the current frame. The
key idea is that (1) an intermediate representation of a convolutional object
recognition model abstracts scene information in its frame and that (2) we can
predict (i.e., regress) such representations corresponding to the future frames
based on that of the current frame. We design a new two-stream convolutional
neural network (CNN) architecture for videos by extending the state-of-the-art
convolutional object detection network, and present a new fully convolutional
regression network for predicting future scene representations. Our experiments
confirm that combining the regressed future representation with our detection
network allows reliable estimation of future hands and objects in videos. We
obtain much higher accuracy than the state-of-the-art future object
presence forecasting method on a public dataset.
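The regression idea in this abstract can be written compactly. The notation is an assumption introduced here for illustration: $F_t$ is the intermediate representation of the frame at time $t$, $\Delta$ is the forecast horizon, and $R_\theta$ is the regression network with parameters $\theta$. The abstract does not state the training loss; a squared-error objective is assumed.

```latex
\hat{F}_{t+\Delta} = R_\theta(F_t),
\qquad
\theta^{\ast} = \arg\min_{\theta} \sum_{t}
  \bigl\lVert R_\theta(F_t) - F_{t+\Delta} \bigr\rVert_2^2
```

Under this reading, the remaining layers of the detection network are applied to $\hat{F}_{t+\Delta}$ in place of a representation computed from the real future frame, which is how hands and objects not yet visible can still be predicted.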
An interface to virtual environments for people who are blind using Wii technology - mental models and navigation
Accessible games, both for serious and for entertainment purposes, would allow inclusion and participation for those with disabilities. Research into the development of accessible games, and accessible virtual environments, is discussed. Research into accessible virtual environments has demonstrated great potential for allowing people who are blind to explore new spaces, reducing their reliance on guides, and aiding development of more efficient spatial maps and strategies. Importantly, Lahav and Mioduser (2005, 2008) have demonstrated that, when exploring virtual spaces, people who are blind use more and different strategies than when exploring real physical spaces, and develop relatively accurate spatial representations of them. The present paper describes the design, development and evaluation of a system in which a virtual environment may be explored by people who are blind using Nintendo Wii devices, with auditory and haptic feedback. The nature of the various types of feedback is considered, with the aim of creating an intuitive and usable system. Using Wii technology has many advantages, not least that it is mainstream, readily available and cheap. The potential of the system for exploration and navigation is demonstrated. Results strongly support the possibilities of the system for facilitating and supporting the construction of cognitive maps and spatial strategies. Intelligent support is discussed. Systems such as the present one will facilitate the development of accessible games, and thus enable Universal Design and accessible interactive technology to become more accepted and widespread.