Real time hand gesture recognition including hand segmentation and tracking
In this paper we present a system that performs automatic gesture recognition. The system consists of two main components: (i) a unified technique for segmentation and tracking of the face and hands using a skin-detection algorithm, which handles occlusion between skin objects in order to keep track of the status of the occluded parts; this is realized by combining three features, namely color, motion, and position. (ii) A static and dynamic gesture recognition system. Static gesture recognition is achieved using a robust hand-shape classification, based on PCA subspaces, that is invariant to scale along with small translation and rotation transformations. Combining hand-shape classification with position information and using discrete hidden Markov models (DHMMs) allows us to accomplish dynamic gesture recognition.
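The PCA-subspace shape classifier described above can be sketched as follows: fit one PCA subspace per gesture class and assign a new hand image to the class whose subspace reconstructs it with the smallest error. The class names, feature dimensions, and toy data below are illustrative assumptions, not the paper's.

```python
import numpy as np

def fit_pca_subspace(X, k):
    """Fit a k-dimensional PCA subspace to row vectors X (one hand image per row)."""
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions as rows of Vt.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                     # (class mean, subspace basis)

def reconstruction_error(x, mean, basis):
    """Distance from x to the class subspace; a small error means a good match."""
    coeffs = basis @ (x - mean)             # project into the subspace
    recon = mean + basis.T @ coeffs         # reconstruct from the projection
    return np.linalg.norm(x - recon)

def classify(x, subspaces):
    """Assign x to the class whose PCA subspace reconstructs it best."""
    return min(subspaces, key=lambda c: reconstruction_error(x, *subspaces[c]))

# Toy demo: two synthetic "hand shape" classes separated in feature space.
rng = np.random.default_rng(0)
open_hand = rng.normal(0.0, 0.1, (20, 16)); open_hand[:, 0] += 5.0
fist      = rng.normal(0.0, 0.1, (20, 16)); fist[:, 1] += 5.0
subspaces = {"open": fit_pca_subspace(open_hand, 3),
             "fist": fit_pca_subspace(fist, 3)}
print(classify(open_hand[0], subspaces))    # -> open
```

Scale/translation invariance in the paper would come from normalizing the segmented hand region before projection; that preprocessing step is omitted here.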
The Whole World in Your Hand: Active and Interactive Segmentation
Object segmentation is a fundamental problem in computer vision and a powerful resource for development. This paper presents three embodied approaches to the visual segmentation of objects. Each approach to segmentation is aided by the presence of a hand or arm in the proximity of the object to be segmented. The first approach is suitable for a robotic system, where the robot can use its arm to evoke object motion. The second method operates on a wearable system, viewing the world from a human's perspective, with instrumentation to help detect and segment objects that are held in the wearer's hand. The third method operates when observing a human teacher, locating periodic motion (finger/arm/object waving or tapping) and using it as a seed for segmentation. We show that object segmentation can serve as a key resource for development by demonstrating methods that exploit high-quality object segmentations to develop both low-level vision capabilities (specialized feature detectors) and high-level vision capabilities (object recognition and localization).
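The third approach's periodic-motion seed can be illustrated with a minimal frequency-domain check on a per-pixel motion signal: a strong spectral peak in a plausible waving band marks that location as a segmentation seed. The band limits, threshold, and signal are assumptions for illustration, not the paper's values.

```python
import numpy as np

def periodic_seed(signal, fps, min_hz=0.5, max_hz=3.0):
    """Return the dominant frequency (Hz) if the motion signal has a strong
    periodic component in the assumed waving band, else None."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= min_hz) & (freqs <= max_hz)
    # Require the in-band peak to clearly dominate the average spectrum.
    if not band.any() or spectrum[band].max() <= 3 * spectrum.mean():
        return None
    return float(freqs[band][np.argmax(spectrum[band])])

fps = 30
t = np.arange(90) / fps                  # 3 seconds of per-frame motion
waving = np.sin(2 * np.pi * 2.0 * t)     # synthetic 2 Hz waving signal
print(round(periodic_seed(waving, fps), 2))   # -> 2.0
```

Pixels passing this test would then seed a conventional region-growing or graph-based segmentation of the waved object.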
Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze
Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments facilitates faster and less noisy matching. However, there exists a substantial gap in machine understanding of natural temporal cuts during a continuous human activity. This work reports on a novel gaze-based approach for segmenting action segments in videos captured using an egocentric camera. Gaze is used to locate the region of interest inside a frame. By tracking two simple motion-based parameters inside successive regions of interest, we discover a finite set of temporal cuts. We present several results using combinations (of the two parameters) on a dataset, i.e., BRISGAZE-ACTIONS. The dataset contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.
Comment: To appear in 2017 IEEE International Conference on Signal and Image Processing Applications
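The idea of discovering temporal cuts by tracking two per-frame parameters inside the gaze region can be sketched as a simple change detector. The parameter names, thresholds, and toy values below are illustrative assumptions; the paper's actual parameters and entropy-based refinement are not reproduced.

```python
def temporal_cuts(displacement, motion_energy, disp_thresh=2.0, energy_thresh=1.0):
    """Mark frame t as a candidate cut when either per-frame parameter
    (ROI displacement or in-ROI motion energy) jumps across its threshold."""
    cuts = []
    for t in range(1, len(displacement)):
        jumped = (abs(displacement[t] - displacement[t - 1]) > disp_thresh or
                  abs(motion_energy[t] - motion_energy[t - 1]) > energy_thresh)
        if jumped and (not cuts or t - cuts[-1] > 1):   # suppress adjacent duplicates
            cuts.append(t)
    return cuts

# Toy per-frame values: the gaze ROI jumps sharply at frames 2 and 5.
disp   = [0.1, 0.2, 5.0, 5.1, 5.0, 0.2]
energy = [0.3, 0.3, 0.4, 0.4, 0.3, 0.3]
print(temporal_cuts(disp, energy))   # -> [2, 5]
```

Each returned index would bound an action segment, shrinking the search space for later matching or recognition.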
Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been explored less. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom he is interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets are collected in laboratory settings. To overcome this limitation, we contribute by collecting two new datasets: (a) EgoYouTubeHands, including egocentric videos containing hands in the wild, and (b) HandOverFace, to analyze the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine the generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by looking at just a single hand pose, which is much better than the chance level (12.5%).
Comment: Accepted at CVPR 2018
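Segmentation quality in evaluations like the one above is typically scored with intersection-over-union between predicted and ground-truth hand masks. A minimal sketch follows; the masks here are toy data, not drawn from any of the datasets mentioned.

```python
import numpy as np

def hand_iou(pred, gt):
    """Intersection-over-union of two binary hand masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0   # empty-vs-empty counts as perfect

gt   = np.zeros((4, 4), int); gt[1:3, 1:3] = 1     # 4-pixel ground-truth hand
pred = np.zeros((4, 4), int); pred[1:3, 1:4] = 1   # over-segments by 2 pixels
print(round(hand_iou(pred, gt), 3))   # -> 0.667
```

Averaging this score over all frames gives the mean IoU commonly reported when comparing off-the-shelf and fine-tuned segmentation models.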
Children, Humanoid Robots and Caregivers
This paper presents developmental learning on a humanoid robot from human-robot interactions. We consider in particular teaching humanoids as children during the child's separation and individuation developmental phase (Mahler, 1979). Cognitive development during this phase is characterized both by the child's dependence on her mother for learning while becoming aware of her own individuality, and by self-exploration of her physical surroundings. We propose a learning framework for a humanoid robot inspired by such cognitive development.
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the first-person perspective. Nowadays, this field is attracting the attention and investments of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, user-machine interaction, and so on. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Learning from Synthetic Humans
Estimating human pose, shape, and motion from images and videos is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth, and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground-truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.
Comment: Appears in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). 9 pages
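The core trick of synthetic training data of this kind is that the mask driving the rendering doubles as a free, pixel-perfect segmentation label. This can be sketched as simple mask-driven compositing; the arrays below are toy placeholders, not the SURREAL rendering pipeline.

```python
import numpy as np

def composite(person_rgb, person_mask, background):
    """Paste a rendered person onto a real background image; return the
    composited training image together with its ground-truth mask."""
    mask = person_mask.astype(bool)[..., None]        # broadcast over RGB channels
    image = np.where(mask, person_rgb, background)    # person where mask=1, else background
    return image, person_mask

h, w = 4, 4
bg     = np.full((h, w, 3), 200, np.uint8)            # toy "real" background
person = np.full((h, w, 3), 50, np.uint8)             # toy rendered person
mask   = np.zeros((h, w), np.uint8); mask[1:3, 1:3] = 1
img, label = composite(person, mask, bg)
print(img[2, 2, 0], img[0, 0, 0])   # -> 50 200
```

Generating millions of such (image, mask) pairs costs only render time, which is what makes training segmentation and depth CNNs on synthetic humans attractive.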