
4,127 research outputs found

Analysis of Hand Segmentation in the Wild

Author: Borji Ali
Khan Aisha Urooj
Publication date: 28/03/2018

A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been less explored. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom he is interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets were collected in laboratory settings. To overcome this limitation, we contribute two new datasets: a) EgoYouTubeHands, which includes egocentric videos containing hands in the wild, and b) HandOverFace, to analyze the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by just looking at a single hand pose, which is much better than the chance level (12.5%). Comment: Accepted at CVPR 201

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Detecting Hands in Egocentric Videos: Towards Action Recognition

Author: A Betancourt
A Cartas
J Zariffa
M Bolaños
M Everingham
O Russakovsky
SM Eshed Ohn-Bar
THC Nguyen
Publication date: 08/09/2017

Recently, there has been growing interest in analyzing human daily activities from data collected by wearable cameras. Since the hands are involved in a vast set of daily tasks, detecting hands in egocentric images is an important step towards the recognition of a variety of egocentric actions. However, even setting aside the extreme illumination changes in egocentric images, hand detection is not a trivial task because of the intrinsically large variability of hand appearance. We propose a hand detector that exploits skin modeling for fast hand proposal generation and Convolutional Neural Networks for hand recognition. We tested our method on the UNIGE-HANDS dataset and showed that the proposed approach achieves competitive hand detection results.

arXiv.org e-Print Archive

Crossref

Summarizing First-Person Videos from Third Persons' Points of Views

Author: A Betancourt
AG Molino del
Elad Hoffer
Ke Zhang
M Bolanos
SJ Pan
VM Patel
YJ Lee
Publication date: 26/07/2018

Video highlighting or summarization is among the interesting topics in computer vision, benefiting a variety of applications such as viewing, searching, and storage. However, most existing studies rely on training data from third-person videos, and the resulting models cannot easily generalize to highlighting first-person ones. With the goal of deriving an effective model to summarize first-person videos, we propose a novel deep neural network architecture for describing and discriminating vital spatiotemporal information across videos with different points of view. Our proposed model is realized in a semi-supervised setting, in which fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person ones are presented during training. In our experiments, qualitative and quantitative evaluations on both benchmarks and our collected first-person video datasets are presented. Comment: 16+10 pages, ECCV 201

arXiv.org e-Print Archive

Crossref

Forecasting Hands and Objects in Future Frames

Author: Fan Chenyou
Lee Jangwon
Ryoo Michael S.
Publication date: 23/08/2018

This paper presents an approach to forecast the future presence and location of human hands and objects. Given an image frame, the goal is to predict what objects will appear in a future frame (e.g., 5 seconds later) and where they will be located, even when they are not visible in the current frame. The key idea is that (1) an intermediate representation of a convolutional object recognition model abstracts scene information in its frame and that (2) we can predict (i.e., regress) such representations corresponding to the future frames based on that of the current frame. We design a new two-stream convolutional neural network (CNN) architecture for videos by extending the state-of-the-art convolutional object detection network, and present a new fully convolutional regression network for predicting future scene representations. Our experiments confirm that combining the regressed future representation with our detection network allows reliable estimation of future hands and objects in videos. We obtain much higher accuracy compared to the state-of-the-art future object presence forecast method on a public dataset.

arXiv.org e-Print Archive

Crossref

Recommended from our members

An interface to virtual environments for people who are blind using Wii technology - mental models and navigation

Author: Battersby S
Brown DJ
Evett L
Ridley A
Publication venue: 'Emerald'
Publication date: 27/08/2011

Accessible games, both for serious and for entertainment purposes, would allow inclusion and participation for those with disabilities. Research into the development of accessible games, and accessible virtual environments, is discussed. Research into accessible Virtual Environments has demonstrated great potential for allowing people who are blind to explore new spaces, reducing their reliance on guides, and aiding development of more efficient spatial maps and strategies. Importantly, Lahav and Mioduser (2005, 2008) have demonstrated that, when exploring virtual spaces, people who are blind use more and different strategies than when exploring real physical spaces, and develop relatively accurate spatial representations of them. The present paper describes the design, development and evaluation of a system in which a virtual environment may be explored by people who are blind using Nintendo Wii devices, with auditory and haptic feedback. The nature of the various types of feedback is considered, with the aim of creating an intuitive and usable system. Using Wii technology has many advantages, not least of which are that it is mainstream, readily available and cheap. The potential of the system for exploration and navigation is demonstrated. Results strongly support the possibilities of the system for facilitating and supporting the construction of cognitive maps and spatial strategies. Intelligent support is discussed. Systems such as the present one will facilitate the development of accessible games, and thus enable Universal Design and accessible interactive technology to become more accepted and widespread.

Nottingham Trent Institutional Repository (IRep)